Last Updated on September 29, 2023
You can share numpy arrays between processes using function arguments.
Numpy arrays can be returned from child processes by simulating a return value with a pipe or queue data structure.
In this tutorial, you will discover how to share numpy arrays between processes using function arguments and simulated return values.
Let’s get started.
Need to Share Numpy Array Between Processes
Python offers process-based concurrency via the multiprocessing module.
Process-based concurrency is appropriate for those tasks that are CPU-bound, as opposed to thread-based concurrency in Python which is generally suited to IO-bound tasks given the presence of the Global Interpreter Lock (GIL).
You can learn more about process-based concurrency and the multiprocessing module in the tutorial:
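As a refresher, the basic pattern of running a function in a child process looks like the sketch below. This is a minimal illustration using a toy CPU-bound task() function (summing integers) that is not part of this tutorial's examples.

```python
from multiprocessing import Process

# a toy CPU-bound task: sum the first n integers
def task(n):
    total = sum(range(n))
    print(f'sum of first {n} integers: {total}')

# protect the entry point
if __name__ == '__main__':
    # create and configure a child process to run the task
    child = Process(target=task, args=(1_000_000,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
```

The same create/start/join pattern is used throughout the examples below, with numpy arrays as the data being shared.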
Consider the situation where we need to share numpy arrays between processes.
This may be for many reasons, such as:
- Data is loaded as an array in one process and analyzed differently in different subprocesses.
- Many child processes load small data as arrays that are sent to a parent process for handling.
- Data arrays are loaded in the parent process and processed in a suite of child processes.
Sharing Python objects and data between processes is slow.
This is because any data, like numpy arrays, shared between processes must be transmitted using inter-process communication (IPC), which requires that the data first be pickled by the sender and then unpickled by the receiver.
You can learn more about this in the tutorial:
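We can see the pickle round trip that IPC performs for us directly. The sketch below is illustrative only: it pickles a modest numpy array, reports the size of the pickled payload, and unpickles it again, which is exactly the work added every time an array crosses a process boundary.

```python
import pickle
from numpy import ones

# create a modest numpy array (1,000 x 1,000 of float64, about 8 MB)
data = ones((1000, 1000))
# pickle the array, as the sending process would
payload = pickle.dumps(data)
# report the size of the pickled payload in megabytes
print(f'pickled size: {len(payload) / 1e6:.1f} MB')
# unpickle the array, as the receiving process would
restored = pickle.loads(payload)
# confirm the round trip preserved the data
print((restored == data).all())
```

Larger arrays make this payload, and the time spent copying it, proportionally larger.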
This means that if we share numpy arrays between processes, we expect to receive some benefit, such as a speedup, that outweighs the slow speed of data transmission.
For example, it may be the case that the arrays are relatively small and fast to transmit, whereas the computation performed on each array is slow and can benefit from being performed in separate processes.
Alternatively, preparing the array may be computationally expensive and benefit from being performed in a separate process, and once prepared, the arrays are small and fast to transmit to another process that requires them.
Given these situations, how can we share numpy arrays between processes in Python?
How to Share Numpy Array Via Function Argument and Return Value
We can share numpy arrays with child processes via function arguments.
Recall that when we start a child process, we can configure it to run a target function via the “target” argument. We can also pass arguments to the target function via the “args” keyword.
As such, we can define a target function that takes a numpy array as an argument and execute it in a child process.
For example:
```python
...
# create a child process
child = Process(target=task, args=(data,))
```
You can learn more about running a function in a child process in the tutorial:
Passing a numpy array to a target function executed in the child process is slow.
It requires that the numpy array be pickled in the parent process then unpickled in the child process. This means it may only be appropriate for modestly sized arrays that are fast to pickle and unpickle.
Nevertheless, it is a simple and effective approach.
What if we need to retrieve a numpy array from a target function executed in a child process?
We cannot return an array directly from the target function.
Instead, we must use a Queue or a Pipe.
A Pipe is a simple data structure for sharing data between processes. We can create a pipe, which will create a send and receive connection.
```python
...
# create the shared pipe
conn1, conn2 = Pipe()
```
We can then share the sending connection with the child process and the receiving connection with the parent process and use them to send an array back from the child to the parent process.
```python
...
# send the array via a pipe
pipe.send(data)
```
Alternatively, we can use a shared queue. The queue can be created in the parent process, then shared with the child process which can put the array on the queue once it has been prepared.
```python
...
# create the shared queue
queue = SimpleQueue()
```
In both cases, using a pipe or a queue requires that the array be pickled and unpickled in order to be transmitted between the processes. This is slow, given the added computational overhead.
You can learn more about simulating return values from processes using pipes and queues in the tutorial:
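As a quick illustration of the pickle round trip a pipe performs, both connections can be kept in a single process and used to send an array to ourselves. This toy sketch is for demonstration only; in the examples below, the connections are split between parent and child processes.

```python
from multiprocessing import Pipe
from numpy import ones

# create the pipe, which returns two duplex connections
conn1, conn2 = Pipe()
# create a small numpy array
data = ones((10, 10))
# send the array via one connection (pickles the array)
conn2.send(data)
# receive the array via the other connection (unpickles it)
received = conn1.recv()
# confirm the received array matches the original, reports True
print((received == data).all())
```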
Example Sharing Numpy Array Via Function Argument
We can explore the case of sharing a numpy array with a child process via a function argument.
In this example, we will create an array in the parent process and then pass it to a task executed in a child process. The array will be copied and transmitted to the child process and as such changes to it won’t be reflected in the parent process.
Firstly, we can define a task function to be executed in the child process.
The function will take a numpy array as an argument. It will report on some of the content of the array, change all values to zero, then report that the contents of the array were changed.
The task() function below implements this.
```python
# task executed in a child process
def task(data):
    # check some data in the array
    print(data[:5,:5])
    # change data in the array
    data.fill(0.0)
    # confirm the data was changed
    print(data[:5,:5])
```
Next, in the main process, we can create an array with a modest size, filled with ones.
```python
...
# define the size of the numpy array
n = 10000
# create the numpy array
data = ones((n,n))
```
We can then create and configure a child process to execute our task() function and pass the numpy array as an argument.
```python
...
# create a child process
child = Process(target=task, args=(data,))
```
We can then start the child process and block until it has completed.
```python
...
# start the child process
child.start()
# wait for the child process to complete
child.join()
```
Finally, we can check the contents of the array.
```python
...
# check some data in the array
print(data[:5,:5])
```
Tying this together, the complete example is listed below.
```python
# share numpy array via function argument
from multiprocessing import Process
from numpy import ones

# task executed in a child process
def task(data):
    # check some data in the array
    print(data[:5,:5])
    # change data in the array
    data.fill(0.0)
    # confirm the data was changed
    print(data[:5,:5])

# protect the entry point
if __name__ == '__main__':
    # define the size of the numpy array
    n = 10000
    # create the numpy array
    data = ones((n,n))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # check some data in the array
    print(data[:5,:5])
```
Running the example first creates the numpy array populated with one values.
The child process is then created, configured, and started. The main process then blocks until the child process is terminated.
The child process runs, confirming the array was populated with one values. It then fills the array with zero values and confirms the array’s contents were changed.
The child process terminates and the main process resumes. It confirms that the contents of the array in the parent process have not changed.
This highlights how we can share a copy of an array with a child process and how changes to the copy of the array in the child process are not reflected in the parent process.
```
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```
Example Sharing Numpy Array Via Pipe (simulated return value)
We can explore the case of simulating the return of a numpy array from a child process using a pipe.
In this example, we will create a pipe and share the sending connection with the child process and keep the receive connection in the parent process. The child process will then create a numpy array and send it to the parent process via the pipe.
Firstly, we can define a task function that takes a pipe connection as an argument and creates and transmits the array.
The function first creates an array initialized with one values. It then confirms that the array contains one values and sends it via the pipe.
The task() function below implements this.
```python
# task executed in a child process
def task(pipe):
    # define the size of the numpy array
    n = 10000
    # create the numpy array
    data = ones((n,n))
    # check some data in the array
    print(data[:5,:5])
    # send the array via a pipe
    pipe.send(data)
```
Next, the main process will create the pipe.
```python
...
# create the shared pipe
conn1, conn2 = Pipe()
```
It then creates the child process and configures it to execute our task function, passing it the sending connection as an argument.
```python
...
# create a child process
child = Process(target=task, args=(conn2,))
```
The child process is then started and the main process blocks waiting to receive the array via the pipe.
```python
...
# start the child process
child.start()
# read the data from the pipe
data = conn1.recv()
```
Finally, the main process confirms the contents of the received numpy array.
```python
...
# check some data in the array
print(data[:5,:5])
```
Tying this together, the complete example is listed below.
```python
# share numpy array via pipe
from multiprocessing import Process
from multiprocessing import Pipe
from numpy import ones

# task executed in a child process
def task(pipe):
    # define the size of the numpy array
    n = 10000
    # create the numpy array
    data = ones((n,n))
    # check some data in the array
    print(data[:5,:5])
    # send the array via a pipe
    pipe.send(data)

# protect the entry point
if __name__ == '__main__':
    # create the shared pipe
    conn1, conn2 = Pipe()
    # create a child process
    child = Process(target=task, args=(conn2,))
    # start the child process
    child.start()
    # read the data from the pipe
    data = conn1.recv()
    # check some data in the array
    print(data[:5,:5])
```
Running the example first creates the pipe with the sending and receiving connections.
Next, the child process is created, configured, and started.
The main process then blocks, waiting to read the numpy array from the pipe.
The child process runs, first creating a numpy array initialized with one values.
It then confirms the contents of the array and transmits it to the parent process via the pipe. The child process then terminates.
The main process reads the numpy array from the pipe and confirms it contains one values.
```
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```
Example Sharing Numpy Array Via Queue (simulated return value)
We can explore the case of simulating the return of a numpy array from a child process using a queue.
In this example, we will update the example from the previous section to use a SimpleQueue instead of a Pipe.
This requires first updating the task() function to take the shared queue as an argument and then putting the created array on the queue via the put() function.
The updated task() function with these changes is listed below.
```python
# task executed in a child process
def task(queue):
    # define the size of the numpy array
    n = 1000
    # create the numpy array
    data = ones((n,n))
    # check some data in the array
    print(data[:5,:5])
    # send the array via a queue
    queue.put(data)
```
Next, the main process must create the shared queue and pass it as an argument to the child process.
```python
...
# create the shared queue
queue = SimpleQueue()
# create a child process
child = Process(target=task, args=(queue,))
```
It then must read the array from the shared queue via the get() method.
```python
...
# read the data from the queue
data = queue.get()
```
The complete example with all of these changes is listed below.
```python
# share numpy array via queue
from multiprocessing import Process
from multiprocessing import SimpleQueue
from numpy import ones

# task executed in a child process
def task(queue):
    # define the size of the numpy array
    n = 1000
    # create the numpy array
    data = ones((n,n))
    # check some data in the array
    print(data[:5,:5])
    # send the array via a queue
    queue.put(data)

# protect the entry point
if __name__ == '__main__':
    # create the shared queue
    queue = SimpleQueue()
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # read the data from the queue
    data = queue.get()
    # check some data in the array
    print(data[:5,:5])
```
Running the example first creates the shared queue.
Next, the child process is created, configured, and started.
The main process then blocks, waiting to read the numpy array from the queue.
The child process runs, first creating a numpy array initialized with one values.
It then confirms the contents of the array and transmits it to the parent process via the queue. The child process then terminates.
The main process reads the numpy array from the queue and confirms it contains one values.
```
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```
Example Sharing Numpy Array Via Argument and Queue (simulated return value)
We can explore the case of sharing a numpy array with a child process as an argument and then returning the array to the parent process via a queue.
In this example, we will create an array in the parent process and then pass it to a child process. The child process will then modify the array and transmit it back to the parent process via a shared queue.
This will simulate sharing a numpy array with another process and returning the updated array.
The downside of this approach is that sharing the array from the parent to the child, and from the child back to the parent, involves making copies of the array that are pickled and unpickled. This adds a lot of computational overhead and may not provide a benefit unless the computation performed by the child process is sufficiently intensive.
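We can get a rough sense of this overhead by timing the two pickle round trips directly, one for the parent-to-child copy and one for the child-to-parent copy. The sketch below is illustrative only; the measured time will vary by machine, and real IPC adds transmission cost on top of pickling.

```python
import pickle
import time
from numpy import ones

# create an array of the size used in this example
data = ones((1000, 1000))
# time two pickle/unpickle round trips, mirroring
# the parent-to-child and child-to-parent copies
start = time.perf_counter()
for _ in range(2):
    data = pickle.loads(pickle.dumps(data))
duration = time.perf_counter() - start
print(f'two round trips took {duration:.4f} seconds')
```

If the child's computation takes less time than this, sharing the array both ways is unlikely to pay off.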
First, we can define the task function to be executed in the child process.
The function takes a numpy array and shared queue as arguments. It then confirms the array’s contents, fills it with zero values, confirms the array’s contents were changed, and puts it on the shared queue.
The task() function below implements this.
```python
# task executed in a child process
def task(data, queue):
    # check some data in the array
    print(data[:5,:5])
    # change data in the array
    data.fill(0.0)
    # confirm the data was changed
    print(data[:5,:5])
    # send the array via a queue
    queue.put(data)
```
Next, the main process will create an array filled with one values and a shared queue.
```python
...
# define the size of the numpy array
n = 1000
# create the numpy array
data = ones((n,n))
# create the shared queue
queue = SimpleQueue()
```
The child process is then created and configured to execute the task() function and is passed the numpy array and shared queue.
```python
...
# create a child process
child = Process(target=task, args=(data,queue))
```
The child process is then started and the main process then blocks, waiting for the updated array to be transmitted via the shared queue.
```python
...
# start the child process
child.start()
# read the data from the queue
data = queue.get()
```
Finally, the contents of the updated array are reported.
```python
...
# check some data in the array
print(data[:5,:5])
```
Tying this together, the complete example is listed below.
```python
# share numpy array via function argument and queue
from multiprocessing import Process
from multiprocessing import SimpleQueue
from numpy import ones

# task executed in a child process
def task(data, queue):
    # check some data in the array
    print(data[:5,:5])
    # change data in the array
    data.fill(0.0)
    # confirm the data was changed
    print(data[:5,:5])
    # send the array via a queue
    queue.put(data)

# protect the entry point
if __name__ == '__main__':
    # define the size of the numpy array
    n = 1000
    # create the numpy array
    data = ones((n,n))
    # create the shared queue
    queue = SimpleQueue()
    # create a child process
    child = Process(target=task, args=(data,queue))
    # start the child process
    child.start()
    # read the data from the queue
    data = queue.get()
    # check some data in the array
    print(data[:5,:5])
```
Running the example first creates a numpy array filled with one values and a shared queue.
The child process is created and configured, then started.
The main process blocks, waiting to read the updated array from the shared queue.
The child process runs, first confirming that the array contains one values. It then fills the array with zero values and puts it on the shared queue before terminating.
The main process resumes and reads the updated numpy array.
It then confirms that the updated array contains zero values, as set by the child process.
```
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
```
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
NumPy APIs
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Takeaways
You now know how to share numpy arrays between processes using function arguments and simulated return values.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by MARIOLA GROBELSKA on Unsplash