Share Numpy Array Between Processes With Shared ctypes
You can share a numpy array between processes by copying it into a shared ctype array.
In this tutorial, you will discover how to share a numpy array between processes using a ctype array.
Let's get started.
Need to Share Numpy Array Between Processes
Python offers process-based concurrency via the multiprocessing module.
Process-based concurrency is appropriate for those tasks that are CPU-bound, as opposed to thread-based concurrency in Python which is generally suited to IO-bound tasks given the presence of the Global Interpreter Lock (GIL).
Consider the situation where we need to share numpy arrays between processes.
This may be for many reasons, such as:
- Data is loaded as an array in one process and analyzed differently in different subprocesses.
- Many child processes load small data as arrays that are sent to a parent process for handling.
- Data arrays are loaded in the parent process and processed in a suite of child processes.
Sharing Python objects and data between processes is slow.
This is because any data, like numpy arrays, shared between processes must be transmitted using inter-process communication (IPC), which requires that the data first be pickled by the sender and then unpickled by the receiver.
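To make this cost concrete, the sketch below pickles and unpickles a numpy array by hand, roughly what happens when an array is sent through a queue or pipe. The array size is an arbitrary choice for illustration, and the names payload and restored are hypothetical.

```python
# sketch: the pickle round trip behind sending an array to another process
import pickle
import numpy as np

# an array of 10,000 floats (size chosen only for illustration)
data = np.ones((10000,))
# the sender serializes the whole array into bytes
payload = pickle.dumps(data)
# the receiver rebuilds a separate copy from those bytes
restored = pickle.loads(payload)
# the payload is at least as large as the raw array data
print(len(payload), data.nbytes)
# the receiver holds a copy, not the original array
print(restored is data)
```

The serialized payload carries every element of the array, so the transmission cost grows with the size of the array.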
This means that sharing numpy arrays between processes only makes sense if we receive some benefit, such as a speedup, that outweighs the cost of transmitting the data.
For example, it may be the case that the arrays are relatively small and fast to transmit, whereas the computation performed on each array is slow and can benefit from being performed in separate processes.
Alternatively, preparing the array may be computationally expensive and benefit from being performed in a separate process, and once prepared, the arrays are small and fast to transmit to another process that requires them.
Given these situations, how can we share a numpy array between processes in Python?
Share a Numpy Array As a Shared ctype
One way to share a numpy array between processes is via shared ctypes.
The ctypes module provides tools for working with C data types.
Python provides the capability to share ctypes between processes on one system.
This is primarily achieved via the following classes:
- multiprocessing.Value: manage a shared value.
- multiprocessing.Array: manage an array of shared values.
A single Value or Array instance can be created in memory and shared among processes without duplicating the data.
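As a minimal sketch of these two classes, the snippet below creates one shared value and one shared array and updates them in the creating process. The names counter and samples are hypothetical, and no child process is started here.

```python
# sketch: create a shared value and a shared array
from ctypes import c_double, c_int
from multiprocessing import Array, Value

# a single shared integer, initialized to zero
counter = Value(c_int, 0)
# a shared array of five doubles, zero-initialized when given a size
samples = Array(c_double, 5)
# update the shared value while holding its lock
with counter.get_lock():
    counter.value += 1
# write one element of the shared array
samples[0] = 2.5
print(counter.value, samples[:])
```

Both objects can then be passed as arguments to a child process, which would operate on the same underlying memory.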
We can copy a numpy array into a multiprocessing.Array and share it among multiple processes that can then read and write the same data.
A multiprocessing.Array can be created by specifying the data type and initial values.
For example:
...
# create an integer array
data = multiprocessing.Array(ctypes.c_int, (1, 2, 3, 4, 5))
We can use the numpy.ctypeslib.as_ctypes() function to get a ctypes representation of a given numpy array. The ctype of its elements is then available via the _type_ attribute.
For example:
...
# get ctype for our array
ctype = as_ctypes(data)
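A small sketch of what as_ctypes() returns may help here. It assumes a one-dimensional array of the default float type, for which the element type is expected to be ctypes.c_double.

```python
# sketch: inspect the ctypes object returned by as_ctypes()
import ctypes
import numpy as np
from numpy.ctypeslib import as_ctypes

# a small array of the default float64 type
data = np.ones((5,))
# a ctypes array that represents the numpy array
ctype = as_ctypes(data)
# the element type is stored on the _type_ attribute
print(ctype._type_ is ctypes.c_double, len(ctype))
```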
We can then use the element type, available via the _type_ attribute, along with the array data to initialize a new multiprocessing.Array.
For example:
...
# create ctype array initialized from our array
array = Array(ctype._type_, data)
This will make a copy of the data in the numpy array into the multiprocessing.Array.
Note that the shared ctype Array only supports one-dimensional arrays. This means that if you have an array with two or more dimensions, you must flatten it before you copy it into the multiprocessing.Array, e.g. via the flatten() method.
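The flattening step can be sketched as follows for a small two-dimensional array. Restoring the shape with numpy.frombuffer() and reshape() is an addition for illustration and is not part of the approach described above. The names matrix, shared, and view are hypothetical.

```python
# sketch: flatten a 2D array into a shared ctype array, then view it with its shape
import numpy as np
from multiprocessing.sharedctypes import Array
from numpy.ctypeslib import as_ctypes

# a small two-dimensional array
matrix = np.arange(6, dtype=np.float64).reshape(2, 3)
# flatten to one dimension before copying
flat = matrix.flatten()
# copy the flat data into a shared ctype array without a lock
ctype = as_ctypes(flat)
shared = Array(ctype._type_, flat, lock=False)
# view the shared memory as a numpy array with the original shape (no copy)
view = np.frombuffer(shared, dtype=matrix.dtype).reshape(matrix.shape)
# a write through the view changes the shared array
view[0, 0] = 42.0
print(shared[0], view.shape)
```

Because the view is backed by the shared array's memory, the shape only needs to be known on each side. The data itself is not copied again.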
If we know that two processes will not access the same array at the same time, process safety is not required and we can configure the Array to not use a mutex lock by setting lock=False. This can offer some performance benefits.
For example:
...
# create ctype array initialized from our array
array = Array(ctype._type_, data, lock=False)
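For comparison, the sketch below keeps the default lock and holds it during a read-modify-write update, which is one way to protect the array if processes might access it at the same time. The names safe and raw are hypothetical, and the update runs in a single process to keep the example short.

```python
# sketch: a locked shared array versus an unlocked one
from ctypes import c_double
from multiprocessing.sharedctypes import Array

# lock=True is the default and returns a synchronized wrapper
safe = Array(c_double, [1.0, 2.0, 3.0])
# lock=False returns the raw ctypes array with no lock attached
raw = Array(c_double, [1.0, 2.0, 3.0], lock=False)
# hold the lock for the whole read-modify-write sequence
with safe.get_lock():
    for i in range(len(safe)):
        safe[i] = safe[i] * 2.0
print(safe[:], hasattr(raw, 'get_lock'))
```

The default lock is re-entrant, so indexing the array while already holding the lock does not deadlock.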
Elements in the array can then be read and written using normal Python array indexes and slices.
For example:
...
# report the first 10 values in the array
print(array[:10])
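Writes work the same way as reads. The sketch below, using a small array of made-up values, assigns to a single index and to a slice. Note that a slice assignment must supply exactly as many values as the slice covers.

```python
# sketch: read and write a shared ctype array with indexes and slices
from ctypes import c_double
from multiprocessing.sharedctypes import Array

# a shared array of eight doubles, all set to one
array = Array(c_double, [1.0] * 8, lock=False)
# write a single element
array[0] = 5.0
# write a slice; the number of values must match the slice length
array[1:4] = [0.0, 0.0, 0.0]
# read a slice, which returns a regular Python list
print(array[:5], len(array))
```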
Now that we know how to share a numpy array between processes using a shared ctype Array, let's look at a worked example.
Example of Sharing an Array Via a Shared Ctype
We can explore the case of sharing a numpy array between processes using a shared ctype array.
In this example, we will create a one-dimensional numpy array initialized to all one values. We will then copy it into a shared ctype Array and share the array with a child process. The child process will then change the contents of the array. The parent process will then confirm that the contents of the shared ctype array were changed.
First, we will define a function to execute in a child process.
The function will take the shared ctype array as an argument.
Firstly, it will check that the contents of the array match what was expected, e.g. what was passed in from the parent process. It will then change the content of the array to all zero values and confirm that the content of the array was changed.
The task() function listed below implements this.
# task executed in a child process
def task(array):
    # check some data in the array
    print(array[:10], len(array))
    # change data in the array, one element at a time
    for i in range(len(array)):
        array[i] = 0.0
    # confirm the data was changed
    print(array[:10], len(array))
Next, the main process will create a new array with a modest size, initialized to all one values.
It then reports the contents of the array, to confirm it indeed contains all one values.
...
# define the size of the numpy array
n = 10000
# create the numpy array
data = ones((n,))
print(data[:10], data.shape)
Next, the main process gets the ctype equivalent for the array type and uses this type, along with the content of the array, to create a new shared ctype Array without a mutex lock.
A lock is not required in this case as we know that the two processes will not be modifying the array at the same time.
...
# get ctype for our array
ctype = as_ctypes(data)
# create ctype array initialized from our array
array = Array(ctype._type_, data, lock=False)
We can then confirm that the new shared ctype Array contains the same data as the numpy array.
...
# confirm the contents of the shared array
print(array[:10], len(array))
The parent process then creates a new child process, configured to execute our task() function, and passes it the shared ctype Array as an argument.
The child process is started and the main process blocks until the child process terminates.
...
# create a child process
child = Process(target=task, args=(array,))
# start the child process
child.start()
# wait for the child process to complete
child.join()
Finally, the parent process checks the contents of the array.
...
# check some data in the shared array
print(array[:10], len(array))
Tying this together, the complete example is listed below.
# share numpy array via a shared ctype
from multiprocessing import Process
from multiprocessing.sharedctypes import Array
from numpy import ones
from numpy.ctypeslib import as_ctypes
# task executed in a child process
def task(array):
    # check some data in the array
    print(array[:10], len(array))
    # change data in the array, one element at a time
    for i in range(len(array)):
        array[i] = 0.0
    # confirm the data was changed
    print(array[:10], len(array))
# protect the entry point
if __name__ == '__main__':
    # define the size of the numpy array
    n = 10000
    # create the numpy array
    data = ones((n,))
    print(data[:10], data.shape)
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # confirm contents of the shared array
    print(array[:10], len(array))
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # check some data in the shared array
    print(array[:10], len(array))
Running the example first creates a numpy array with 10,000 elements, initialized to one values.
The content of the numpy array is then confirmed.
Next, the ctype of the array is determined and is used along with the numpy array itself to create a new shared ctype Array.
The content of the numpy array is copied into the shared ctype array and we confirm that the shared ctype array contains all one values.
Next, the child process is created and started and the parent process blocks.
The child process runs. It first confirms that the shared ctype Array contains all one values.
It then updates the shared array to contain all zero values and confirms the array's contents were changed.
The child process terminates and the parent process resumes.
The parent process checks the contents of the shared array and confirms that it was changed by the child process.
This highlights that both parent and child processes operated upon the same single array in memory.
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] (10000,)
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 10000
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] 10000
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 10000
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] 10000
Takeaways
You now know how to share a numpy array between processes using a ctype array.