Numpy Parallel Random Numbers with Multiprocessing (up to 28x slower)
You can create and populate a NumPy vector of random numbers in parallel using Python multiprocessing.
Although possible, this is not recommended.
Using multiprocessing to generate random numbers in parallel can result in a SLOW DOWN of between 8.43x and 28.37x compared to the single-process version, depending on the approach chosen.
This is a good exercise to learn the limitations of multiprocessing and why it should not be combined with NumPy in this way.
In this tutorial, you will discover how to create a numpy vector of random numbers in parallel using multiprocessing.
Let's get started.
Need a Large Array of Random Numbers in Parallel
A common problem is the need to create a large numpy array of random numbers.
Generating a random number is a relatively slow operation. This slowness is compounded if we need to generate a large vector of numbers, such as 100 million or 1 billion items.
Given that we have multiple CPU cores in our system, we expect that we can speed up the operation by executing it in parallel.
How can we create a large numpy vector of random numbers in parallel using all CPU cores?
How to Generate Random Numbers in Parallel with Multiprocessing
Numpy does not provide an API for creating an array of random numbers in parallel.
Instead, we must develop a solution ourselves.
One approach is to use process-based concurrency with the multiprocessing module.
The multiprocessing module is provided in the Python standard library and offers parallelism via process-based concurrency. This is unlike Python threads that are limited to running one at a time due to thread-safety issues with the Python interpreter.
The multiprocessing.Pool class provides a pool of worker processes that can be created once and reused to execute multiple tasks.
For example, we create a Pool and specify the number of workers to create, one for each CPU core in our system.
...
# create the process pool
with Pool(8) as pool:
    # ...
Once created, we can call the map() or starmap() methods to call a function with one or multiple arguments.
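For example, map() calls the target function with a single argument per item, whereas starmap() unpacks a tuple of arguments per item. A minimal sketch (the task() function here is hypothetical, used only for illustration):

```python
from multiprocessing.pool import Pool

# hypothetical task that takes two arguments
def task(value, power):
    return value ** power

if __name__ == '__main__':
    # create the process pool
    with Pool(4) as pool:
        # map() passes a single argument to each call
        lengths = pool.map(len, ['a', 'bb', 'ccc'])
        # starmap() unpacks a tuple of arguments for each call
        powers = pool.starmap(task, [(2, 3), (3, 2)])
    print(lengths, powers)
```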
If you are new to the multiprocessing Pool class, you can learn more about it here:
There are two approaches we could explore for creating large arrays of random numbers in parallel, they are:
- Create the arrays in parallel.
- Populate the arrays in parallel.
Let's take a closer look at each approach.
Create NumPy Arrays of Random Values in Parallel
One approach to creating a large vector of random numbers is to create multiple small vectors of random numbers in parallel, then combine them together into one large vector.
This can be achieved by defining a function that creates a random number generator with a unique seed via the numpy.random.default_rng() function, then calling the random() method to create an array of a given size.
For example:
# create and return a vector of random numbers
def populate(seed, size):
    # create random number generator with the seed
    rand = default_rng(seed)
    # create vector of random floats
    return rand.random(size=size)
We can then call this function once for each CPU core in our system. It requires that we know how many CPU cores we have, then determine how large each subarray needs to be.
For example, if we required a vector of 1,000,000,000 (one billion) numbers and we had 8 CPU cores available, then each sub-array would be 1,000,000,000 / 8 or 125,000,000 items in length.
We could issue these tasks to the process pool via the starmap() method that allows the target function to take more than one argument.
For example:
...
# create the arguments
args = [(i+1, 125000000) for i in range(8)]
# create sub arrays
result_list = pool.starmap(populate, args)
We can then combine the list of arrays together into a single vector via the numpy.concatenate() function.
For example:
...
# convert arrays into one large array
result = concatenate(result_list)
Populate NumPy Array With Random Values in Parallel
Another approach is to create a very large numpy array, then populate portions of it with different workers.
We can create a large empty array via the numpy.empty() function.
For example:
...
# create array
array = empty(1000000000)
Next, we can define a function that will populate a portion of a provided array with random numbers.
Again, we can create a random number generator via the numpy.random.default_rng() function with a given random seed, then call the random() method and specify the portion of the array to populate.
# populate a subsequence of a large array
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # populate a subsequence of the large vector
    rand.random(out=vector[ix_start:ix_end])
We can partition the indexes of our large array based on the number of workers in the pool.
This requires first calculating the size of each partition, such as 125,000,000 items if we had a vector of 1 billion items and 8 workers.
...
# determine the size of each subsequence
size = 125000000
We can then determine the beginning and end indexes of the array for each worker to populate.
These can be prepared as arguments to be passed to the starmap() method on the Pool.
For example:
...
# prepare arguments for each call to populate()
args = [(i, array, i*size, (i+1)*size) for i in range(8)]
# populate each subsequence
result_list = pool.starmap(populate, args)
Note on Seeds of Random Number Generators
Each worker requires its own random number generator.
It is important that each random number generator uses a different seed so that the sequence of generated numbers does not overlap with any other subsequence.
This can be achieved by first creating a generator for random number seeds via the numpy.random.SeedSequence class.
For example:
...
# create generator for random number generator seeds
seed_seq = SeedSequence(1)
This can then be used to generate a sequence of unique seeds to pass to each child worker to seed their random number generator.
This can be achieved by calling the spawn() method and specifying the number of seeds to generate.
For example:
...
# create seeds to use for random number generators
seeds = seed_seq.spawn(8)
The seed can then be passed to the random number generator when it is created.
For example:
...
# create random number generator with the seed
rand = default_rng(seed)
The random number generator can then be used to create random numbers with the required distribution, such as a uniform distribution.
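For example, a quick sketch of drawing from a few common distributions with a seeded generator:

```python
from numpy.random import default_rng

# create a seeded random number generator
rand = default_rng(seed=1)
# uniform floats in [0, 1)
uniform = rand.random(size=5)
# gaussian floats with mean 0 and standard deviation 1
gaussian = rand.standard_normal(size=5)
# random integers in [0, 10)
integers = rand.integers(low=0, high=10, size=5)
print(uniform.shape, gaussian.shape, integers.shape)
```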
Now that we know how to generate a large vector of random numbers in parallel with multiprocessing, let's look at some worked examples.
Create a Large Vector of Random Numbers (sequential)
Firstly, we can explore how we might create one large vector of random numbers sequentially.
We can call the numpy.random.rand() function to create a vector of random numbers with a given size.
...
# size of the vector
n = 1000000000
# create the array
array = rand(n)
We will time how long this takes to run as a point of comparison.
The complete example is listed below.
# create a large vector of random numbers
from time import time
from numpy.random import rand
# record start time
start = time()
# size of the vector
n = 1000000000
# create the array
array = rand(n)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
Running the example takes about 7.745 seconds on my system.
It may take more or fewer seconds, depending on the speed of your hardware, Python version, and numpy version.
Took 7.745 seconds
Next, let's look at how we can create and then populate a large array with random numbers.
Populate a Large Vector of Random Numbers (sequential)
An alternative approach to creating a large vector of random numbers is to first create an empty vector, then populate it with random numbers, again using a single process.
...
# size of the vector
n = 1000000000
# create the array
array = empty(n)
# create random number generator with the seed
rand = default_rng(seed=1)
# populate the array
rand.random(out=array)
The complete example is listed below.
# populate a large vector with random numbers
from time import time
from numpy.random import default_rng
from numpy import empty
# record start time
start = time()
# size of the vector
n = 1000000000
# create the array
array = empty(n)
# create random number generator with the seed
rand = default_rng(seed=1)
# populate the array
rand.random(out=array)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
Running the example, populating an empty array is faster than having the numpy.random.rand() function create the array for us.
On my system, this example completed in about 4.907 seconds.
This is surprising. It is a difference of about 2.838 seconds, or about 1.58x faster.
Took 4.907 seconds
Next, let's look at creating a vector of random numbers in parallel using multiprocessing.
Create Random Numbers Vectors in Parallel with Processes
We can create subvectors of random numbers in parallel using processes.
These vectors can then be combined into one large vector.
Firstly, we must define a function that will take a random number seed and a size of a vector to create, then returns a vector of random numbers of the given size.
# create and return a vector of random numbers
def populate(seed, size):
    # create random number generator with the seed
    rand = default_rng(seed)
    # create vector of random floats
    return rand.random(size=size)
We can then create the process pool.
In this case, we will use 4 worker processes, one for each physical CPU core in my system. Update this to match the number of cores in your system, or experiment to find a configuration that works best for your hardware.
...
# create a pool of workers
n_workers = 4
with Pool(n_workers) as pool:
    # ...
We can then prepare the seeds for the random number generators used in each child worker.
...
# create seeds for child processes
seed_seq = SeedSequence(1)
seeds = seed_seq.spawn(n_workers)
We can then automatically determine the size of each sub-array, based on the overall vector size and the number of workers we have available.
...
# determine the size of each sub-array
size = int(ceil(n / n_workers))
Next, we can prepare the arguments for each task and issue them to the process pool.
...
# create arguments
args = [(seed, size) for seed in seeds]
# create sub arrays
result_list = pool.starmap(populate, args)
Finally, we can concatenate the subarrays into one large vector.
...
# convert arrays into one large array
result = concatenate(result_list)
Tying this together, the complete example is listed below.
# example of creating vectors of random values in parallel
from time import time
from numpy import concatenate
from numpy import ceil
from numpy.random import SeedSequence
from numpy.random import default_rng
from multiprocessing.pool import Pool

# create and return a vector of random numbers
def populate(seed, size):
    # create random number generator with the seed
    rand = default_rng(seed)
    # create vector of random floats
    return rand.random(size=size)

# protect the entry point
if __name__ == '__main__':
    # record start time
    start = time()
    # size of the vector
    n = 1000000000
    # create a pool of workers
    n_workers = 4
    with Pool(n_workers) as pool:
        # create seeds for child processes
        seed_seq = SeedSequence(1)
        seeds = seed_seq.spawn(n_workers)
        # determine the size of each sub-array
        size = int(ceil(n / n_workers))
        # create arguments
        args = [(seed, size) for seed in seeds]
        # create sub arrays
        result_list = pool.starmap(populate, args)
        # convert arrays into one large array
        result = concatenate(result_list)
    # calculate and report duration
    duration = time() - start
    print(f'Took {duration:.3f} seconds')
Running the example took about 65.267 seconds on my system. That's just over one minute.
That is about 57.522 seconds SLOWER than the sequential (non-parallel) version of the code, or 8.43x slower.
Why is it slower?
The reason is that each array created in a worker process must be transmitted back to the main process. This requires that the array be serialized (pickled) and sent via inter-process communication, then unpickled.
This adds a huge overhead.
Took 65.267 seconds
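We can get a rough sense of this overhead by timing a pickle round trip of a sub-array directly, as the process pool does for every task result. A minimal sketch (using a smaller array than the tutorial's so it runs quickly; absolute times will vary):

```python
import pickle
from time import time
from numpy.random import default_rng

# create one sub-array, smaller than the tutorial's for a quick demo
array = default_rng(1).random(size=1000000)
# time a pickle/unpickle round trip, as performed by the process pool
start = time()
data = pickle.dumps(array)
copy = pickle.loads(data)
duration = time() - start
print(f'Round trip took {duration:.3f} seconds')
```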
Next, let's look at how to develop a multiprocessing version of populating a large vector.
Populate Vectors with Random Numbers in Parallel
We can use multiprocessing to populate a large numpy vector with random numbers in parallel using worker processes.
Firstly, we must define a task function used to populate a portion of the large vector.
The function takes the seed for the random number generator, the vector itself, and the start and end indexes. It creates the random number generator and then specifies the subarray to populate.
# populate a subsequence of a large array
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # populate a subsequence of the large vector
    rand.random(out=vector[ix_start:ix_end])
Next, we can create the process pool with one worker per physical CPU core in our system. Update to match the number of CPUs in your system.
...
# create the pool of workers
n_workers = 4
with Pool(n_workers) as pool:
    # ...
We can then create the sequence of seeds for the random number generators.
...
# create seeds for child processes
seed_seq = SeedSequence(1)
seeds = seed_seq.spawn(n_workers)
We can also automatically determine the size of each subsequence based on the size of the array and the number of workers we have available.
...
# determine the size of each subsequence
size = int(ceil(n / n_workers))
Finally, we can prepare the arguments for each task and issue the tasks to the process pool to be executed in parallel.
...
# prepare arguments for each call to populate()
args = [(seeds[i], array, i*size, (i+1)*size) for i in range(n_workers)]
# populate each subsequence
result_list = pool.starmap(populate, args)
Tying this together, the complete example is listed below.
# example of populating a large vector in parallel using processes
from time import time
from numpy import ceil
from numpy import empty
from numpy.random import SeedSequence
from numpy.random import default_rng
from multiprocessing.pool import Pool

# populate a subsequence of a large array
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # populate a subsequence of the large vector
    rand.random(out=vector[ix_start:ix_end])

# protect the entry point
if __name__ == '__main__':
    # record start time
    start = time()
    # size of the vector
    n = 1000000000
    # create array
    array = empty(n)
    # create the pool of workers
    n_workers = 4
    with Pool(n_workers) as pool:
        # create seeds for child processes
        seed_seq = SeedSequence(1)
        seeds = seed_seq.spawn(n_workers)
        # determine the size of each subsequence
        size = int(ceil(n / n_workers))
        # prepare arguments for each call to populate()
        args = [(seeds[i], array, i*size, (i+1)*size) for i in range(n_workers)]
        # populate each subsequence
        result_list = pool.starmap(populate, args)
    # calculate and report duration
    duration = time() - start
    print(f'Took {duration:.3f} seconds')
Running this example took about 139.228 seconds to complete on my system. That is about 2.3 minutes.
That is about 134.321 seconds SLOWER than the sequential (single-process) version of the code for populating the array or about 28.37x slower.
Again, we can reason why.
Copying the large array to each worker process requires inter-process communication, which is very slow. Worse still, each worker populates its own copy of the array, so the array in the main process is never actually filled with random numbers.
Took 139.228 seconds
Results and Recommendations
The results clearly show that using multiprocessing to create arrays of random numbers is slower than the single-process version.
The main reason is that processes do not have shared memory. Instead, data must be sent from the main process to the child process, and back again.
The table below summarizes the results.
Approach | Time (sec)
----------------------------------------
Create Array (sequential) | 7.745
Populate Array (sequential) | 4.907
Create Array (parallel) | 65.267
Populate Array (parallel) | 139.228
We saw that using multiprocessing can result in up to 28x worse performance.
Do not use multiprocessing to create random numbers in parallel.
There may be a workaround.
Python does offer mechanisms to share memory between processes. Perhaps one of these mechanisms could improve performance back to, or even beyond, the single-process versions.
Examples of techniques to explore include:
- Using shared ctypes such as multiprocessing.Array.
- Using a hosted object via a multiprocessing.Manager.
- Using shared memory via multiprocessing.shared_memory.
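As one illustration, a minimal sketch of the multiprocessing.shared_memory approach is shown below. The main process allocates a named shared memory block, and each worker attaches to it by name and fills its own slice in place, so no array data is pickled. This is a sketch only, using a smaller array than the tutorial's, and a populate() signature that differs from the one above; it is not benchmarked here:

```python
from multiprocessing.pool import Pool
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
from numpy import float64
from numpy.random import SeedSequence
from numpy.random import default_rng

# populate a slice of the shared array in place
def populate(seed, shm_name, n, ix_start, ix_end):
    # attach to the existing shared memory block by name
    shm = SharedMemory(name=shm_name)
    # view the shared buffer as a numpy array
    array = ndarray((n,), dtype=float64, buffer=shm.buf)
    # fill our slice with random floats
    default_rng(seed).random(out=array[ix_start:ix_end])
    # release the view before closing our handle
    del array
    shm.close()

if __name__ == '__main__':
    # smaller size than the tutorial's for a quick demonstration
    n = 10000000
    n_workers = 4
    # allocate shared memory for the full float64 array
    shm = SharedMemory(create=True, size=n * 8)
    array = ndarray((n,), dtype=float64, buffer=shm.buf)
    # prepare seeds and partition indexes
    seeds = SeedSequence(1).spawn(n_workers)
    size = n // n_workers
    args = [(seeds[i], shm.name, n, i*size, (i+1)*size) for i in range(n_workers)]
    # populate each slice in a worker process
    with Pool(n_workers) as pool:
        pool.starmap(populate, args)
    print(array.min() >= 0.0 and array.max() < 1.0)
    # release the view, then close and remove the shared block
    del array
    shm.close()
    shm.unlink()
```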
Alternatively, we can achieve a speed-up by creating random numbers using threads. This is because numpy releases the global interpreter lock (GIL) when calling into C code, such as when generating random numbers.
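For example, a sketch of a thread-based version of the populate approach, using multiprocessing.pool.ThreadPool in place of Pool. Threads share memory with the main process, so no copying is needed (a smaller array than the tutorial's is used here for a quick demonstration):

```python
from multiprocessing.pool import ThreadPool
from numpy import empty
from numpy.random import SeedSequence
from numpy.random import default_rng

# populate a subsequence of a large array in place
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # numpy releases the GIL while filling the slice
    rand.random(out=vector[ix_start:ix_end])

# smaller size than the tutorial's for a quick demonstration
n = 10000000
n_workers = 4
# create the empty array, shared by all threads
array = empty(n)
# prepare seeds and partition indexes
seeds = SeedSequence(1).spawn(n_workers)
size = n // n_workers
args = [(seeds[i], array, i*size, (i+1)*size) for i in range(n_workers)]
# populate each subsequence in a worker thread
with ThreadPool(n_workers) as pool:
    pool.starmap(populate, args)
print(array.min() >= 0.0 and array.max() < 1.0)
```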
You can learn more about NumPy releasing the GIL in the tutorial:
Generating random numbers in parallel using multiprocessing is not recommended. Instead, generating random numbers in parallel using threads is the recommended approach.
Takeaways
You now know how to create a numpy vector of random numbers in parallel using multiprocessing.
If you enjoyed this tutorial, you will love my book: Concurrent NumPy in Python. It covers everything you need to master the topic with hands-on examples and clear explanations.