Last Updated on September 29, 2023
You can create and populate a vector of random numbers in parallel using Python threads.
This can offer a speed-up from 1.81x to 4.05x over the single-threaded version, depending on the approach chosen.
In this tutorial, you will discover how to create a numpy vector of random numbers in parallel using threads.
Let’s get started.
Need a Large Array of Random Numbers in Parallel
A common problem is the need to create a large numpy array of random numbers.
Generating a pseudo-random number is a relatively slow operation because it must calculate a mathematical function. This slowness is compounded if we need to generate a large vector of numbers, such as 100 million or one billion items.
Given that we have multiple CPU cores in our system, we expect that we can speed up the operation by executing it in parallel.
How can we create a large numpy vector of random numbers in parallel using all CPU cores?
Create Large NumPy Arrays in Parallel Using Threads
NumPy does not provide an API for creating an array of random numbers in parallel.
Instead, we must develop a solution ourselves using threads.
The CPython interpreter is not thread-safe, so it enforces a global interpreter lock (GIL) that allows only one Python thread to execute bytecode at a time. In turn, threads cannot be used for parallelism in Python, in most cases.
One of the exceptions is when Python calls into a C library that explicitly releases the GIL, allowing other threads to run.
Most numpy functions are implemented in C and will release this lock, including functions for generating random numbers in the numpy.random module.
This means that we can use threads to generate random numbers in parallel.
The exceptions are few but important: while a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL.
— Parallel Programming with numpy and scipy
You can learn more about NumPy releasing the GIL in the tutorial:
Python provides the multiprocessing.pool.ThreadPool class that we can use to create a pool of worker threads to execute arbitrary tasks.
For example, we can create a ThreadPool and specify the number of workers to create, one for each CPU core in our system.
```python
...
# create the thread pool
with ThreadPool(8) as pool:
    # ...
```
Once created, we can call the map() or starmap() methods to issue tasks to workers in the thread pool.
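If these methods are new to you, a minimal sketch of issuing tasks via map() might look like the following; the task() function here is a hypothetical stand-in for any work you want done:

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function: square a number
def task(value):
    return value * value

# create a small pool and issue one task per item
with ThreadPool(4) as pool:
    # map() passes a single argument per task; starmap() unpacks tuples
    results = pool.map(task, [1, 2, 3, 4])
print(results)  # prints [1, 4, 9, 16]
```

The call to map() blocks until all tasks are complete and returns the results in the order the items were issued.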
If you are new to the ThreadPool class, you can learn more in the guide:
Note, we prefer threads over process-based concurrency via the multiprocessing module because threads share memory directly. Processes must transmit data via inter-process communication, which is significantly slower.
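To illustrate why shared memory matters here, a small sketch (with a hypothetical fill() task) shows worker threads writing into the same NumPy array in place, with no data copied between workers:

```python
from multiprocessing.pool import ThreadPool
from numpy import zeros

# create a small array visible to all threads
data = zeros(4)

# hypothetical task: write one element of the shared array in place
def fill(i):
    data[i] = i + 1

# each worker writes its own slot; the main thread sees the writes directly
with ThreadPool(4) as pool:
    pool.map(fill, range(4))
print(data)  # prints [1. 2. 3. 4.]
```

With processes, each worker would instead receive a copy of the array and the main process would never see the writes without explicit inter-process communication.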
There are two approaches we could explore for creating large arrays of random numbers in parallel:
- Create the arrays in parallel.
- Populate the arrays in parallel.
Let’s take a closer look at each approach.
Create NumPy Arrays of Random Values in Parallel
One approach to creating a large vector of random numbers is to create multiple small vectors of random numbers in parallel, then combine them together into one large vector.
This can be achieved by defining a function that creates a random number generator with a unique seed via the numpy.random.default_rng() function, then calling the random() method to create an array of a given size.
For example:
```python
# create and return a vector of random numbers
def populate(seed, size):
    # create random number generator with the seed
    rand = default_rng(seed)
    # create vector of random floats
    return rand.random(size=size)
```
We can then call this function once for each CPU core in our system. It requires that we know how many CPU cores we have, then determine how large each subarray needs to be.
For example, if we required a vector of 1,000,000,000 (one billion) numbers and we had 8 CPU cores available, then each sub-array would be 1,000,000,000 / 8 or 125,000,000 items in length.
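This arithmetic can be sketched directly; math.ceil() handles vector sizes that do not divide evenly across workers, with the final sub-array simply being smaller:

```python
from math import ceil

# total vector size and number of workers, taken from the example above
n = 1_000_000_000
n_workers = 8

# size of each sub-array, rounding up for uneven divisions
size = ceil(n / n_workers)
print(size)  # prints 125000000

# the sub-array sizes together cover the whole vector exactly
sizes = [min(size, n - i * size) for i in range(n_workers)]
print(sum(sizes))  # prints 1000000000
```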
We could issue these tasks to the thread pool via the starmap() method that allows the target function to take more than one argument.
For example:
```python
...
# create the arguments
args = [(i+1, 125000000) for i in range(8)]
# create sub arrays
result_list = pool.starmap(populate, args)
```
We can then combine the list of arrays together into a single vector via the numpy.concatenate() function.
For example:
```python
...
# convert arrays into one large array
result = concatenate(result_list)
```
Populate NumPy Array With Random Values in Parallel
Another approach is to create a very large numpy array, then populate portions of it with different threads.
We can create a large empty array via the numpy.empty() function.
For example:
```python
...
# create array
array = empty(1000000000)
```
Next, we can define a function that will populate a portion of a provided array with random numbers.
Again, we can create a random number generator via the numpy.random.default_rng() function with a given random seed, then call the random() method and specify the portion of the array to populate.
```python
# populate a subsequence of a large array
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # populate a subsequence of the large vector
    rand.random(out=vector[ix_start:ix_end])
```
We can then partition the indexes of our large array based on the number of thread workers in the pool.
This requires first calculating the size of each partition, such as 125,000,000 items if we had a vector of one billion items and 8 thread workers.
```python
...
# determine the size of each subsequence
size = 125000000
```
We can then determine the beginning and end indexes of the array for each thread worker to populate.
These can be prepared as arguments to be passed to the starmap() method on the ThreadPool.
For example:
```python
...
# prepare arguments for each call to populate()
args = [(i, array, i*size, (i+1)*size) for i in range(8)]
# populate each subsequence
result_list = pool.starmap(populate, args)
```
Note on Seeds of Random Number Generators
Each thread worker requires its own random number generator.
It is important that each random number generator uses a different seed so that the sequence of generated numbers does not overlap with any other subsequence.
This can be achieved by first creating a generator for random number seeds via the numpy.random.SeedSequence class.
For example:
```python
...
# create a generator for random number generator seeds
seed_seq = SeedSequence(1)
```
This can then be used to generate a sequence of unique seeds to pass to each child worker to seed their random number generator.
This can be achieved by calling the spawn() method and specifying the number of seeds to generate.
For example:
```python
...
# create seeds to use for random number generators
seeds = seed_seq.spawn(8)
```
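As a quick sanity check, generators created from different spawned seeds produce different streams of numbers, while re-creating a generator from the same seed reproduces its stream. A minimal sketch:

```python
from numpy.random import SeedSequence, default_rng

# spawn two child seeds from one parent seed
seed_seq = SeedSequence(1)
seeds = seed_seq.spawn(2)

# generators seeded differently produce different streams
a = default_rng(seeds[0]).random(3)
b = default_rng(seeds[1]).random(3)
# a generator re-created with the same seed reproduces its stream
c = default_rng(seeds[0]).random(3)
print(a, b, c)
```

This is why spawned seeds are safe to hand out to worker threads: each worker gets an independent, reproducible stream.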
The seed can then be passed to the random number generator when it is created.
For example:
```python
...
# create random number generator with the seed
rand = default_rng(seed)
```
The random number generator can then be used to create random numbers with the required distribution, such as a uniform distribution.
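For example, the same generator object exposes distributions beyond uniform floats; a small sketch:

```python
from numpy.random import default_rng

# create a seeded generator
rng = default_rng(1)
# uniform floats in [0, 1)
uniform = rng.random(5)
# Gaussian (standard normal) values from the same generator
gaussian = rng.standard_normal(5)
print(uniform)
print(gaussian)
```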
Now that we know how to generate a large vector of random numbers in parallel with threads, let’s look at some worked examples.
Example of Creating a Large Vector of Random Numbers (sequential)
Firstly, we can explore how we might create one large vector of random numbers without threads (sequentially).
We can call the numpy.random.rand() function to create a vector of random numbers with a given size.
```python
...
# size of the vector
n = 1000000000
# create the array
array = rand(n)
```
We will time how long this takes to run as a point of comparison.
The complete example is listed below.
```python
# create a large vector of random numbers
from time import time
from numpy.random import rand

# record start time
start = time()
# size of the vector
n = 1000000000
# create the array
array = rand(n)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```
Running the example takes about 7.815 seconds on my system.
It may take more or fewer seconds, depending on the speed of your hardware, Python version and numpy version.
```
Took 7.815 seconds
```
Example of Populating a Large Vector of Random Numbers (sequential)
An alternative approach to creating a large vector of random numbers is to populate an existing vector, still using a single thread.
That is, we can create the vector first, then populate it with random numbers.
```python
...
# size of the vector
n = 1000000000
# create the array
array = empty(n)
# create random number generator with the seed
rand = default_rng(seed=1)
# populate the array
rand.random(out=array)
```
The complete example is listed below.
```python
# populate a large vector with random numbers
from time import time
from numpy.random import default_rng
from numpy import empty

# record start time
start = time()
# size of the vector
n = 1000000000
# create the array
array = empty(n)
# create random number generator with the seed
rand = default_rng(seed=1)
# populate the array
rand.random(out=array)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```
Running the example shows that this approach is faster than having the numpy.random.rand() function create the vector for us.
On my system, this example was completed in about 4.835 seconds.
This is surprising. The difference is about 2.980 seconds, making this approach about 1.62x faster.
```
Took 4.835 seconds
```
Next, let’s look at creating a vector of random numbers in parallel using threads.
Example of Creating Random Number Vectors in Parallel
We can create subvectors of random numbers in parallel using threads.
These vectors can then be combined into one large vector.
Firstly, we must define a function that will take a random number seed and a size of a vector to create, then returns a vector of random numbers of the given size.
```python
# create and return a vector of random numbers
def populate(seed, size):
    # create random number generator with the seed
    rand = default_rng(seed)
    # create vector of random floats
    return rand.random(size=size)
```
We can then create the thread pool.
In this case, we will use 8 worker threads, one for each logical CPU core in my system. Update this to match the number of cores in your system, or experiment to find a configuration that works best for you.
```python
...
# create a pool of workers
n_workers = 8
with ThreadPool(n_workers) as pool:
    # ...
```
We can then prepare the seeds for the random number generators used in each child worker.
```python
...
# create seeds for child workers
seed_seq = SeedSequence(1)
seeds = seed_seq.spawn(n_workers)
```
We can then automatically determine the size of each sub-array, based on the overall vector size and the number of workers we have available.
```python
...
# determine the size of each sub-array
size = int(ceil(n / n_workers))
```
Next, we can prepare the arguments for each task and issue them to the thread pool.
```python
...
# create arguments
args = [(seed, size) for seed in seeds]
# create sub arrays
result_list = pool.starmap(populate, args)
```
Finally, we can concatenate the subarrays into one large vector.
```python
...
# convert arrays into one large array
result = concatenate(result_list)
```
Tying this together, the complete example is listed below.
```python
# example of creating vectors of random values in parallel
from time import time
from numpy import concatenate
from numpy import ceil
from numpy.random import SeedSequence
from numpy.random import default_rng
from multiprocessing.pool import ThreadPool

# create and return a vector of random numbers
def populate(seed, size):
    # create random number generator with the seed
    rand = default_rng(seed)
    # create vector of random floats
    return rand.random(size=size)

# record start time
start = time()
# size of the vector
n = 1000000000
# create a pool of workers
n_workers = 8
with ThreadPool(n_workers) as pool:
    # create seeds for child workers
    seed_seq = SeedSequence(1)
    seeds = seed_seq.spawn(n_workers)
    # determine the size of each sub-array
    size = int(ceil(n / n_workers))
    # create arguments
    args = [(seed, size) for seed in seeds]
    # create sub arrays
    result_list = pool.starmap(populate, args)
    # convert arrays into one large array
    result = concatenate(result_list)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```
Running the example took about 4.329 seconds on my system.
That is about 3.486 seconds faster than the sequential (non-parallel) version of the code or 1.81x faster.
It is also 0.506 seconds faster than the single-threaded populate example.
```
Took 4.329 seconds
```
Next, let’s look at how to develop a multi-threaded version of populating a large vector with random numbers.
Example of Populating Vectors with Random Numbers in Parallel
We can use threads to populate a large NumPy vector with random numbers in parallel.
Firstly, we must define a task function used to populate a portion of the large vector.
The function takes the seed for the random number generator, the vector itself, and the start and end indexes. It creates the random number generator and then specifies the subarray to populate.
```python
# populate a subsequence of a large array
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # populate a subsequence of the large vector
    rand.random(out=vector[ix_start:ix_end])
```
Next, we can create the thread pool with one worker per logical CPU core in our system. Update to match the number of CPUs in your system.
```python
...
# create the pool of workers
n_workers = 8
with ThreadPool(n_workers) as pool:
    # ...
```
We can then create the sequence of seeds for the random number generators.
```python
...
# create seeds for child workers
seed_seq = SeedSequence(1)
seeds = seed_seq.spawn(n_workers)
```
We can also automatically determine the size of each subsequence based on the size of the array and the number of thread workers we have available.
```python
...
# determine the size of each subsequence
size = int(ceil(n / n_workers))
```
Finally, we can prepare the arguments for each task and issue the tasks to the thread pool to be executed in parallel.
```python
...
# prepare arguments for each call to populate()
args = [(seeds[i], array, i*size, (i+1)*size) for i in range(n_workers)]
# populate each subsequence
result_list = pool.starmap(populate, args)
```
Tying this together, the complete example is listed below.
```python
# example of populating a large vector in parallel using threads
from time import time
from numpy import ceil
from numpy import empty
from numpy.random import SeedSequence
from numpy.random import default_rng
from multiprocessing.pool import ThreadPool

# populate a subsequence of a large array
def populate(seed, vector, ix_start, ix_end):
    # create random number generator with the seed
    rand = default_rng(seed)
    # populate a subsequence of the large vector
    rand.random(out=vector[ix_start:ix_end])

# record start time
start = time()
# size of the vector
n = 1000000000
# create array
array = empty(n)
# create the pool of workers
n_workers = 8
with ThreadPool(n_workers) as pool:
    # create seeds for child workers
    seed_seq = SeedSequence(1)
    seeds = seed_seq.spawn(n_workers)
    # determine the size of each subsequence
    size = int(ceil(n / n_workers))
    # prepare arguments for each call to populate()
    args = [(seeds[i], array, i*size, (i+1)*size) for i in range(n_workers)]
    # populate each subsequence
    result_list = pool.starmap(populate, args)
# calculate and report duration
duration = time() - start
print(f'Took {duration:.3f} seconds')
```
Running this example took about 1.068 seconds to complete on my system.
That is about 3.767 seconds faster than the sequential (single-threaded) version of the code for populating the array, and about 4.05x faster than the parallel array-creation example.
```
Took 1.068 seconds
```
Review of Results
We can take a moment to review the results.
The table below summarizes the time taken to run each example on my system. Your results may vary.
```
Approach                    | Time (sec)
----------------------------|-----------
Create Array (sequential)   | 7.815
Populate Array (sequential) | 4.835
Create Array (parallel)     | 4.329
Populate Array (parallel)   | 1.068
```
Firstly, the results show that creating a vector and then having it populated with random values is faster than having the numpy.random module create the vector for us.
This is surprising to me, but good to know.
Next, we can clearly see that using threads to create or populate an array in parallel offers a lift in performance.
In the case of creating arrays, we saw a 1.81x lift in performance, whereas when populating arrays we saw an even greater 4.05x lift in performance.
If you’re able to choose your implementation, create the array and have it populated in parallel. It is by far the fastest approach.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Takeaways
You now know how to create a numpy vector of random numbers in parallel using threads.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by topcools tee on Unsplash