NumPy Parallel Matrix-Vector Multiplication

Last Updated on September 29, 2023

You can multiply a matrix by a vector in parallel with numpy.

Matrix-vector multiplication can be achieved in numpy using the numpy.dot() function, the ‘@’ operator, and the numpy.matmul() function. All three approaches call down into the BLAS library, which implements the operation in parallel using native threads.

This means that matrix-vector multiplication runs in parallel using multiple threads by default.

In this tutorial, you will discover how to perform matrix-vector multiplication in parallel using threads in numpy.

Let’s get started.


Need Parallel Matrix-Vector Multiplication

Linear algebra is a field of mathematics concerned with linear equations and their representation using vectors and matrices of numbers.

Numpy is a Python library for working with arrays of numbers. As such, it implements many linear algebra functions.

One important linear algebra operation is multiplying a vector by a matrix.

For example:

[1, 1, 1]   [1]   [3]
[1, 1, 1] * [1] = [3]
[1, 1, 1]   [1]   [3]

This is a fundamental mathematical operation.
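The worked example above can be checked directly in numpy. This is a minimal sketch that uses a 3×3 matrix of ones and a 3-element vector of ones; the variable names are chosen for illustration only.

```python
# worked example: multiply a 3x3 matrix of ones by a vector of ones
from numpy import ones

# create the 3x3 matrix and the 3-element vector
matrix = ones((3, 3))
vector = ones(3)
# each output element is the sum of one row of the matrix: 1 + 1 + 1
result = matrix.dot(vector)
print(result)
```

Each element of the result is 3.0, matching the example.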

You can learn more about matrix-matrix and matrix-vector multiplication here:

Matrix multiplication, Wikipedia.

This operation can be slow when multiplying a large matrix and vector. The time taken grows as the size of the matrix and vector increases.

As such, we need a way to make use of modern multiple CPU core systems to speed up the matrix-vector multiplication by executing it in parallel.

How can we execute matrix-vector multiplication in parallel with numpy?


Matrix-Vector Multiplication is Parallel in Numpy

Matrix-vector multiplication is multithreaded in numpy.

There are perhaps three common ways to multiply a matrix by a vector in numpy. They are:

The numpy.dot() function and ndarray.dot() method.
The ‘@’ operator.
The numpy.matmul() function.
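As a quick illustration, the approaches listed above can be applied to the same small inputs to confirm they agree. This is a sketch only; the 2×3 matrix and 3-element vector are arbitrary values chosen for the demonstration.

```python
# compare the common ways to multiply a matrix by a vector
from numpy import arange, array_equal, dot, matmul

# small example inputs: a 2x3 matrix and a 3-element vector
matrix = arange(6).reshape(2, 3)
vector = arange(3)
# the same multiplication expressed four ways
r_dot_function = dot(matrix, vector)
r_dot_method = matrix.dot(vector)
r_operator = matrix @ vector
r_matmul = matmul(matrix, vector)
# confirm that all approaches give the same answer
print(r_dot_function)
print(array_equal(r_dot_function, r_dot_method))
print(array_equal(r_dot_function, r_operator))
print(array_equal(r_dot_function, r_matmul))
```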

All three of these approaches to multiplying a matrix by a vector use the underlying BLAS library.

BLAS is an acronym that stands for Basic Linear Algebra Subprograms. It is a specification for matrix and vector mathematical operations that is implemented efficiently by third-party libraries such as OpenBLAS and MKL.

You can learn more about BLAS in numpy in the tutorial:

What is BLAS and LAPACK in NumPy

Most BLAS implementations used by numpy are multithreaded for basic matrix and vector operations, such as matrix-vector multiplication.

BLAS libraries use native operating system threads (such as pthreads on POSIX systems), independent of Python threads. This means BLAS threads offer true parallelism and speed up many linear algebra operations in numpy via multithreading behind the scenes.
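If you are unsure which BLAS library your numpy installation was built against, the numpy.show_config() function prints the build configuration. This is a minimal sketch; the format of the report differs between numpy versions, so no particular output is assumed here.

```python
# report the build configuration of the installed numpy, including BLAS details
import numpy

# prints details such as the BLAS and LAPACK libraries numpy was built with
numpy.show_config()
```

Look for names such as OpenBLAS or MKL in the report.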

Now that we know that matrix-vector multiplication is multithreaded in numpy, let’s look at some worked examples.


Example of Single-Threaded Matrix-Vector Multiplication

Before we explore multithreaded matrix-vector multiplication, let’s develop some single-threaded versions.

We can control the number of threads used by BLAS operations in numpy using an environment variable. The specific environment variable depends on the BLAS library installed, although the OMP_NUM_THREADS variable works in many cases.

We can set the OMP_NUM_THREADS variable to the number of threads to use in BLAS function calls, such as 1 thread if we want to perform single-threaded matrix-vector multiplication.

This can be achieved before starting the Python program, or within the Python program prior to importing numpy, via the os.environ mapping.

For example:

...
from os import environ
environ['OMP_NUM_THREADS'] = '1'
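Because the variable that takes effect depends on the installed BLAS library, one cautious option is to set the common variables together. The sketch below assumes your installation honors at least one of OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, or MKL_NUM_THREADS; other BLAS libraries may use different names.

```python
# limit BLAS to one thread, covering several common BLAS libraries
from os import environ

# assumption: the installed BLAS library reads at least one of these variables
environ['OMP_NUM_THREADS'] = '1'       # OpenMP-based builds
environ['OPENBLAS_NUM_THREADS'] = '1'  # OpenBLAS
environ['MKL_NUM_THREADS'] = '1'       # Intel MKL

# numpy must be imported only after the variables are set
import numpy
```

Setting a variable that the installed library does not read has no effect, so setting all three is harmless.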

You can learn more about setting the number of threads used by BLAS in the tutorial:

How to Configure the Number of NumPy BLAS Threads

We will benchmark matrix-vector multiplication with a single thread using three common approaches: the dot() method, the ‘@’ operator, and the numpy.matmul() function.

Matrix-Vector Multiplication with dot()

We can benchmark single-threaded matrix-vector multiplication with the ndarray.dot() method.

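Firstly, we will define a task function that creates a matrix and vector with a fixed size. In this case, the matrix will have the dimensions 50,000×50,000, and the vector will have 50,000 elements. Note that a 50,000×50,000 array of 64-bit floats occupies about 20 GB of memory, so use a smaller size if your system has less memory available.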

Each array will be created using the numpy.ones() function.

Once created, the matrix and vector are multiplied together using the dot() method.

...
# perform the multiplication
result = matrix.dot(vector)
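The shape of the vector affects the shape of the result. The sketch below uses a small n for illustration and compares a column vector with shape (n, 1), as used in this tutorial, against a one-dimensional vector with shape (n,).

```python
# compare result shapes for a column vector and a one-dimensional vector
from numpy import ones

n = 4
matrix = ones((n, n))
# column vector with shape (n, 1), as used in this tutorial
column = ones((n, 1))
# one-dimensional vector with shape (n,)
flat = ones(n)
# the result keeps the dimensionality of the vector
print(matrix.dot(column).shape)
print(matrix.dot(flat).shape)
```

The first result has shape (4, 1) and the second has shape (4,); the values are the same in both cases.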

The task() function below implements this, creating the arrays, performing the operation, and returning the time taken in seconds. Note that the timed region includes creating the arrays as well as the multiplication.

# multiply a matrix by a vector
def task(n=50000):
    # record the start time
    start = time()
    # create a new matrix and fill with 1s
    matrix = ones((n,n))
    # create the new vector filled with 1s
    vector = ones((n,1))
    # perform the multiplication
    result = matrix.dot(vector)
    # calculate and report duration
    duration = time() - start
    # return duration
    return duration

Running the function a single time may report a time that is not representative of the time to complete the operation. This may be due to loading Python libraries or the operating system performing background tasks.

Therefore, we can run the task many times and report the average time to complete the operation.

The experiment() function below implements this, performing the task 3 times and returning the average time over the three trials.

# experiment that averages duration of task function
def experiment(repeats=3):
    # repeat the experiment and gather results
    results = [task() for _ in range(repeats)]
    # return the average of the results
    return sum(results) / repeats

Finally, we can call the experiment() function and report the average time to multiply the matrix by the vector.

...
# run the experiment and report the result
duration = experiment()
print(f'Took {duration:.3f} seconds')

Tying this together, the complete example is listed below.

# single-threaded matrix-vector multiplication with dot()
from os import environ
environ['OMP_NUM_THREADS'] = '1'
from time import time
from numpy import ones
# multiply a matrix by a vector
def task(n=50000):
    # record the start time
    start = time()
    # create a new matrix and fill with 1s
    matrix = ones((n,n))
    # create the new vector filled with 1s
    vector = ones((n,1))
    # perform the multiplication
    result = matrix.dot(vector)
    # calculate and report duration
    duration = time() - start
    # return duration
    return duration
# experiment that averages duration of task function
def experiment(repeats=3):
    # repeat the experiment and gather results
    results = [task() for _ in range(repeats)]
    # return the average of the results
    return sum(results) / repeats
# run the experiment and report the result
duration = experiment()
print(f'Took {duration:.3f} seconds')

Running the example creates a matrix, a vector, then multiplies them together.

This task is repeated three times and the average time is reported in seconds.

In this case, the example took about 7.142 seconds on my system.

It may take more or fewer seconds to complete on your system depending on the speed of your hardware.

Took 7.142 seconds
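Because the task() function times array creation together with the multiplication, it can be helpful to measure the two steps separately. The following is a sketch only: the timed_task() name and the small default size of n=2000 are assumptions made for illustration, and it uses time.perf_counter(), which is intended for measuring short durations.

```python
# time array creation and multiplication separately
from time import perf_counter
from numpy import ones

# hypothetical variant of task() that returns two durations
def timed_task(n=2000):
    # record the start time
    start = perf_counter()
    # create the matrix and vector
    matrix = ones((n, n))
    vector = ones((n, 1))
    # record the time once the arrays exist
    created = perf_counter()
    # perform the multiplication
    result = matrix.dot(vector)
    # record the finish time
    finished = perf_counter()
    # return the creation time and the multiplication time
    return created - start, finished - created

# run once and report both durations
create_time, multiply_time = timed_task()
print(f'Creation: {create_time:.6f} seconds')
print(f'Multiplication: {multiply_time:.6f} seconds')
```

Only the multiplication step is executed by BLAS, so only that portion is expected to change with the number of BLAS threads.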

Next, let’s explore the same experiment with the ‘@’ operator.

Matrix-Vector Multiplication with @

We can re-run the same single-threaded matrix-vector multiplication experiment using the ‘@’ operator.

This can be achieved by changing the call to the dot() method to instead use the ‘@’ operator.

For example:

...
# perform the multiplication
result = matrix @ vector

We expect this to be functionally equivalent to the dot() method and to take about the same time.

The complete example with this change is listed below.

# single-threaded matrix-vector multiplication with @
from os import environ
environ['OMP_NUM_THREADS'] = '1'
from time import time
from numpy import ones
# multiply a matrix by a vector
def task(n=50000):
    # record the start time
    start = time()
    # create a new matrix and fill with 1s
    matrix = ones((n,n))
    # create the new vector filled with 1s
    vector = ones((n,1))
    # perform the multiplication
    result = matrix @ vector
    # calculate and report duration
    duration = time() - start
    # return duration
    return duration
# experiment that averages duration of task function
def experiment(repeats=3):
    # repeat the experiment and gather results
    results = [task() for _ in range(repeats)]
    # return the average of the results
    return sum(results) / repeats
# run the experiment and report the result
duration = experiment()
print(f'Took {duration:.3f} seconds')

Running the example took about 7.001 seconds on my system.

This is approximately equivalent to the dot() method which took about 7.142 seconds.

Took 7.001 seconds

Next, let’s explore the same experiment using the numpy.matmul() function.

Matrix-Vector Multiplication with numpy.matmul()

We can re-run the same single-threaded matrix-vector multiplication experiment using the numpy.matmul() function.

This can be achieved by changing the call to the dot() method to instead use matmul().

For example:

...
# perform the multiplication
result = matmul(matrix, vector)

We expect this to be functionally equivalent to the dot() method and the ‘@’ operator and to take about the same time.

The complete example with this change is listed below.

# single-threaded matrix-vector multiplication with matmul()
from os import environ
environ['OMP_NUM_THREADS'] = '1'
from time import time
from numpy import ones
from numpy import matmul
# multiply a matrix by a vector
def task(n=50000):
    # record the start time
    start = time()
    # create a new matrix and fill with 1s
    matrix = ones((n,n))
    # create the new vector filled with 1s
    vector = ones((n,1))
    # perform the multiplication
    result = matmul(matrix, vector)
    # calculate and report duration
    duration = time() - start
    # return duration
    return duration
# experiment that averages duration of task function
def experiment(repeats=3):
    # repeat the experiment and gather results
    results = [task() for _ in range(repeats)]
    # return the average of the results
    return sum(results) / repeats
# run the experiment and report the result
duration = experiment()
print(f'Took {duration:.3f} seconds')

Running the example took about 7.028 seconds on my system.

This is approximately equivalent to the dot() method which took about 7.142 seconds and the ‘@’ operator which took about 7.001 seconds.

Took 7.028 seconds

Next, let’s explore multithreaded versions of the matrix-vector multiplication.


Example of Multithreaded Matrix-Vector Multiplication

We can explore multithreaded matrix-vector multiplication.

The previous section showed that all three common approaches to matrix-vector multiplication used in Numpy have similar performance characteristics. We can choose one in this case, such as the dot() method, and benchmark its performance using multiple threads.

The dot() method in numpy is multithreaded via the installed BLAS library.

We can configure the number of threads used by the BLAS library via the appropriate environment variable.

In this case, we will set the number of threads to be equal to the number of physical CPU cores in the system.

We can achieve better performance with CPU-bound operations in some cases by setting the number of threads to be equal to the number of physical CPU cores.

I have 4 physical CPU cores in my system, so this example will use 4 BLAS threads. If you have more or fewer physical CPU cores, adjust the configuration accordingly.

...
from os import environ
environ['OMP_NUM_THREADS'] = '4'
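If you do not know how many cores your system has, os.cpu_count() is a starting point. Note that it reports logical CPUs, which may be more than the number of physical cores on systems with simultaneous multithreading; the standard library does not report physical cores directly. The sketch below therefore uses the logical count as a placeholder that you should replace with your physical core count if you know it.

```python
# configure the number of BLAS threads from the reported CPU count
from os import cpu_count, environ

# number of logical CPUs (falls back to 1 if it cannot be determined)
logical_cpus = cpu_count() or 1
# assumption: use the logical count; replace with your physical core count if known
threads = logical_cpus
# must be set before numpy is imported
environ['OMP_NUM_THREADS'] = str(threads)
print(f'Logical CPUs: {logical_cpus}, BLAS threads: {threads}')
```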

Tying this together, the complete example is listed below.

# multithreaded matrix-vector multiplication
from os import environ
environ['OMP_NUM_THREADS'] = '4'
from time import time
from numpy import ones
# multiply a matrix by a vector
def task(n=50000):
    # record the start time
    start = time()
    # create a new matrix and fill with 1s
    matrix = ones((n,n))
    # create the new vector filled with 1s
    vector = ones((n,1))
    # perform the multiplication
    result = matrix.dot(vector)
    # calculate and report duration
    duration = time() - start
    # return duration
    return duration
# experiment that averages duration of task function
def experiment(repeats=3):
    # repeat the experiment and gather results
    results = [task() for _ in range(repeats)]
    # return the average of the results
    return sum(results) / repeats
# run the experiment and report the result
duration = experiment()
print(f'Took {duration:.3f} seconds')

Running the example multiplies the matrix by the vector using multiple threads and reports the duration averaged over three repeats.

In this case, the example takes about 6.646 seconds to complete.

This is about 0.496 seconds (or 496 milliseconds) faster than the single-threaded version, or a speed-up of about 1.07x.

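This is not a massive speed-up. One likely reason is that the timed task includes creating the arrays, which is not sped up by BLAS threads. Another is that matrix-vector multiplication performs little arithmetic for each value it reads, so it tends to be limited by memory bandwidth rather than by the number of CPU cores.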

Took 6.646 seconds
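The difference and speed-up figures quoted above follow from the two reported timings. The short calculation below reproduces them using the 7.142 and 6.646 second results from my system; substitute your own timings.

```python
# calculate the difference and speed-up from the reported timings
single_threaded = 7.142
multithreaded = 6.646
# absolute difference in seconds
difference = single_threaded - multithreaded
# speed-up factor: old time divided by new time
speedup = single_threaded / multithreaded
print(f'Difference: {difference:.3f} seconds')
print(f'Speed-up: {speedup:.2f}x')
```

This prints a difference of 0.496 seconds and a speed-up of 1.07x.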

Can you achieve better performance?

For example, you can repeat the experiment with different configurations, such as:

BLAS configured with one fewer than the number of physical CPU cores (e.g. 3).
BLAS configured with double the number of physical CPU cores (e.g. 8).
BLAS configured with an arbitrary number of threads (e.g. 5).
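Because the environment variable must be set before numpy is imported, one way to compare several configurations in a single run is to launch each benchmark in a fresh Python process. The sketch below is one possible approach, not the method used in this tutorial: the run_with_threads() helper, the small matrix size, and the list of thread counts are assumptions for illustration, and it assumes your BLAS library honors OMP_NUM_THREADS.

```python
# benchmark matrix-vector multiplication with different numbers of BLAS threads
import subprocess
import sys
from os import environ

# benchmark program run in a child process (small n so it finishes quickly)
BENCHMARK = '''
from time import perf_counter
from numpy import ones
n = 2000
matrix = ones((n, n))
vector = ones((n, 1))
start = perf_counter()
for _ in range(10):
    matrix.dot(vector)
print((perf_counter() - start) / 10)
'''

# hypothetical helper: run the benchmark with a given number of BLAS threads
def run_with_threads(threads):
    # copy the current environment and override the thread count
    env = dict(environ, OMP_NUM_THREADS=str(threads))
    # run the benchmark in a new interpreter so numpy sees the variable at import
    completed = subprocess.run([sys.executable, '-c', BENCHMARK],
        env=env, capture_output=True, text=True, check=True)
    # the child prints the average duration in seconds
    return float(completed.stdout.strip())

# try each configuration and report the average duration
results = {threads: run_with_threads(threads) for threads in (1, 3, 4, 5, 8)}
for threads, duration in results.items():
    print(f'{threads} threads: {duration:.6f} seconds')
```

Increase n in the benchmark program to better match the size used in this tutorial, memory permitting.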

If you try any of these configurations, let me know how you go.


Takeaways

You now know how to perform matrix-vector multiplication in parallel using threads in numpy.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Photo by wu yi on Unsplash


About Jason Brownlee
