Last Updated on September 29, 2023

You can solve systems of linear equations in NumPy in parallel using multithreaded implementations of the underlying algorithms.

In this tutorial, you will discover how to solve systems of linear equations using multithreaded numpy functions.

Let’s get started.

## Numpy Matrix Solvers are Multithreaded

Linear algebra is a field of mathematics concerned with linear equations and their representation using arrays and matrices of numbers.

Numpy is a Python library for working with arrays of numbers. As such, it implements many linear algebra functions in the **numpy.linalg** module.

Included are functions for solving systems of linear equations.

Many of the **numpy.linalg** functions for working with matrices call down into the BLAS and LAPACK libraries. These libraries provide efficient implementations of common matrix functions. Some of these efficient implementations are multithreaded.

The NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient low level implementations of standard linear algebra algorithms.

— Linear algebra (numpy.linalg)

Numpy calls down into LAPACK libraries to solve systems of linear equations. The implementations of these functions are multithreaded.

The specific solvers that are multithreaded include:

- Solve linear matrix equation: **numpy.linalg.solve()**, which calls **_gesv** in LAPACK.
- Solve linear least squares: **numpy.linalg.lstsq()**

The BLAS and LAPACK libraries will use all CPU cores by default. Nevertheless, we can configure the libraries to use a specific number of threads, such as one thread for a single-threaded version of a function, or one thread per physical CPU core, which often results in better performance.

This can be achieved using an environment variable, such as **OMP_NUM_THREADS** for OpenBLAS. Check the documentation for your BLAS library to discover which environment variable to set.

You can learn more about BLAS and LAPACK in numpy in the tutorial:

You can learn more about setting the number of BLAS threads in the tutorial:
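As a minimal sketch, the thread count can be pinned before NumPy is imported. Which variable is honored depends on the installed BLAS build, so the lines below set the common candidates (an assumption, not a guarantee for every build):

```python
# set the BLAS thread count before numpy is imported; which variable
# takes effect depends on the installed BLAS library
from os import environ
environ['OMP_NUM_THREADS'] = '1'       # OpenBLAS built with OpenMP
environ['OPENBLAS_NUM_THREADS'] = '1'  # OpenBLAS built with pthreads
environ['MKL_NUM_THREADS'] = '1'       # Intel MKL

import numpy
# report which BLAS/LAPACK libraries this numpy build links against
numpy.show_config()
```

Setting the variables in the environment before launching Python works just as well, and avoids any risk of NumPy being imported first.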

Now that we know that common linear algebra matrix solvers are multithreaded, let’s look at some worked examples.


## Parallel Matrix Solver

In this section, we will explore a parallel implementation of the linear matrix equation solver.

In mathematics, a system of linear equations (or linear system) is a collection of one or more linear equations involving the same variables.

— System of linear equations, Wikipedia.

We can solve a system of linear equations with NumPy via the **numpy.linalg.solve()** function.

Solve a linear matrix equation, or system of linear scalar equations. Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.

— numpy.linalg.solve API
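To make the behavior concrete before timing large problems, a tiny hand-checkable system can be solved and verified (the 2x2 values below are illustrative):

```python
from numpy import array, allclose
from numpy.linalg import solve

# the system: 2x + y = 5 and x + 3y = 10
a = array([[2.0, 1.0],
           [1.0, 3.0]])
b = array([5.0, 10.0])
# solve for x in ax = b
x = solve(a, b)
# verify the solution by substituting it back into ax = b
assert allclose(a @ x, b)
print(x)  # [1. 3.]
```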

Let’s take a closer look at a single-threaded and multithreaded version of the algorithm.

### Single Threaded Matrix Solver

In this example, we will configure BLAS to use a single thread.

We will create a square matrix with the dimensions 8,000 x 8,000 and a vector with 8,000 elements to solve the system of linear equations.

The complete example is listed below.

```python
# example of single threaded matrix solver
from os import environ
environ['OMP_NUM_THREADS'] = '1'
from time import time
from numpy.random import rand
from numpy.linalg import solve
# record the start time
start = time()
# size of arrays
n = 8000
# create matrix
a = rand(n, n)
# create result
b = rand(n, 1)
# solve the system of linear equations
x = solve(a, b)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```

Running the example took about 7.070 seconds on my system.

It may take more or fewer seconds, depending on your hardware and the versions of Python, NumPy, SciPy, and the BLAS library installed.

```
Took 7.070 seconds
```

Next, we can run the same example with a multithreaded algorithm.

### Multithreaded Matrix Solver

We can update the above example to use one thread per physical CPU core in the system.

I have 4 physical CPU cores, so OpenBLAS is configured to use 4 threads. Set the number of threads to match the number of cores in your system accordingly.
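Rather than hard-coding the thread count, it can be derived from the machine. Note that **os.cpu_count()** reports logical cores, so halving it is only a rough proxy for physical cores on systems with 2-way SMT/hyper-threading (an assumption, not a precise rule):

```python
# derive a BLAS thread count from the machine instead of hard-coding it
from os import environ, cpu_count

# cpu_count() reports logical cores; dividing by two is only a rough
# guess at physical cores on machines with 2-way SMT/hyper-threading
threads = max(1, (cpu_count() or 2) // 2)
environ['OMP_NUM_THREADS'] = str(threads)

import numpy  # import numpy only after the variable is set
```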

The complete example is listed below.

```python
# example of multithreaded matrix solver
from os import environ
environ['OMP_NUM_THREADS'] = '4'
from time import time
from numpy.random import rand
from numpy.linalg import solve
# record the start time
start = time()
# size of arrays
n = 8000
# create matrix
a = rand(n, n)
# create result
b = rand(n, 1)
# solve the system of linear equations
x = solve(a, b)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```

Running the example took about 3.098 seconds.

That is about 3.972 seconds faster or a speed-up of about 2.28x.

```
Took 3.098 seconds
```

## Parallel Matrix Least Squares Solver

In this section, we will explore a parallel implementation of the linear least squares solver.

That is, using the least squares method to solve a system of linear equations.

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals.

— Linear least squares, Wikipedia.

We can compute a least squares solution with NumPy via the **numpy.linalg.lstsq()** function.

Return the least-squares solution to a linear matrix equation. Computes the vector x that approximately solves the equation a @ x = b.

— numpy.linalg.lstsq API.
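For intuition, **lstsq()** can recover the slope and intercept of a line from a small system before we time large random problems (the sample points below are illustrative):

```python
from numpy import array, column_stack, ones_like, allclose
from numpy.linalg import lstsq

# points that lie exactly on the line y = 2x + 1
x = array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
# design matrix: one column for the slope, one for the intercept
a = column_stack((x, ones_like(x)))
# solve a @ coef ~= y in the least squares sense
coef, residuals, rank, sv = lstsq(a, y, rcond=None)
# the fitted coefficients recover the slope and intercept
assert allclose(coef, [2.0, 1.0])
print(coef)  # [2. 1.]
```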

Let’s take a closer look at a single-threaded and multithreaded version of the algorithm.

### Single Threaded Matrix Least Squares Solver

In this example, we will configure BLAS to use a single thread.

We will create a square matrix with the dimensions 5,000 x 5,000 and a vector with 5,000 elements to solve the system of linear equations.

The complete example is listed below.

```python
# example of single threaded matrix least squares solution
from os import environ
environ['OMP_NUM_THREADS'] = '1'
from time import time
from numpy.random import rand
from numpy.linalg import lstsq
# record the start time
start = time()
# size of arrays
n = 5000
# create matrix
a = rand(n, n)
# create result
b = rand(n, 1)
# solve least squares equation
x, _, _, _ = lstsq(a, b, rcond=None)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```

Running the example took about 35.347 seconds on my system.

```
Took 35.347 seconds
```

Next, we can run the same example with a multithreaded algorithm.

### Multithreaded Matrix Least Squares Solver

We can update the above example to use one thread per physical CPU core in the system.

I have 4 physical CPU cores, so OpenBLAS is configured to use 4 threads. Set the number of threads to match the number of cores in your system accordingly.

The complete example is listed below.

```python
# example of multithreaded matrix least squares solution
from os import environ
environ['OMP_NUM_THREADS'] = '4'
from time import time
from numpy.random import rand
from numpy.linalg import lstsq
# record the start time
start = time()
# size of arrays
n = 5000
# create matrix
a = rand(n, n)
# create result
b = rand(n, 1)
# solve least squares equation
x, _, _, _ = lstsq(a, b, rcond=None)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```

Running the example took about 29.909 seconds.

That is about 5.438 seconds faster or a speed-up of about 1.18x.

```
Took 29.909 seconds
```


## Further Reading

This section provides additional resources that you may find helpful.

**Books**

- Concurrent NumPy in Python, Jason Brownlee (**my book!**)

**Guides**

- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes

**Documentation**

- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API


**Concurrency APIs**

- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks


## Takeaways

You now know how to solve systems of linear equations using multithreaded numpy functions.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Photo by Pascal Bernardon on Unsplash
