Last Updated on September 29, 2023
You can share a numpy array between processes by hosting it in a manager server process and sharing proxy objects for working with the hosted array.
In this tutorial, you will discover how to share an array between processes using a manager.
Let’s get started.
Need to Share Numpy Array Between Processes
Python offers process-based concurrency via the multiprocessing module.
Process-based concurrency is appropriate for those tasks that are CPU-bound, as opposed to thread-based concurrency in Python which is generally suited to IO-bound tasks given the presence of the Global Interpreter Lock (GIL).
You can learn more about process-based concurrency and the multiprocessing module in the tutorial:
Consider the situation where we need to share numpy arrays between processes.
This may be for many reasons, such as:
- Data is loaded as an array in one process and analyzed differently in different subprocesses.
- Many child processes load small data as arrays that are sent to a parent process for handling.
- Data arrays are loaded in the parent process and processed in a suite of child processes.
Sharing Python objects and data between processes is slow.
This is because any data shared between processes, like numpy arrays, must be transmitted using inter-process communication (IPC), requiring the data first be pickled by the sender and then unpickled by the receiver.
You can learn more about this in the tutorial:
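To make the cost concrete, the pickle round-trip that IPC performs can be sketched directly. This is a minimal sketch in a single process; the "sender" and "receiver" comments stand in for two processes:

```python
# sketch of the pickle round-trip performed when sending an array via IPC
from pickle import dumps, loads
from numpy import ones

data = ones((1000000,))    # array held by the "sender" process
payload = dumps(data)      # sender serializes the array to bytes
restored = loads(payload)  # receiver deserializes the bytes into a new array
# the receiver gets a full copy, not a view of the sender's memory
assert (restored == data).all()
```

Both the serialization itself and the transmission of the resulting bytes take time, and that time grows with the size of the array.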
This means that sharing numpy arrays between processes only makes sense if we receive some benefit, such as a speedup, that outweighs the slow speed of data transmission.
For example, it may be the case that the arrays are relatively small and fast to transmit, whereas the computation performed on each array is slow and can benefit from being performed in separate processes.
Alternatively, preparing the array may be computationally expensive and benefit from being performed in a separate process, and once prepared, the arrays are small and fast to transmit to another process that requires them.
Given these situations, how can we share data between processes in Python?
How to Share a Numpy Array Using a Manager
One way to share a numpy array efficiently between processes is to use a manager.
Multiprocessing Manager provides a way of creating centralized Python objects that can be shared safely among processes.
Manager objects create a server process that is used to host Python objects. Managers then return proxy objects used to interact with the hosted objects.
You can learn more about multiprocessing Managers in the tutorial:
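The host-and-proxy pattern can be seen with one of the manager's built-in hosted types, such as a list. The same mechanics apply when we host a numpy array below; this is a minimal sketch:

```python
# host a list in a manager server process and mutate it via its proxy
from multiprocessing import Manager

if __name__ == '__main__':
    with Manager() as manager:
        # the list itself lives in the manager's server process
        shared = manager.list([1, 2, 3])
        # method calls on the proxy are forwarded to the hosted list
        shared.append(4)
        # _getvalue() retrieves a copy of the hosted object
        print(list(shared))
```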
A numpy array can be hosted by defining a custom Manager and configuring it to support numpy arrays.
This requires first defining a custom Manager that extends the BaseManager.
For example:
```python
# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass
```
We can then register our numpy array with the custom manager via the register() function.
One approach is to register a numpy function used to create a numpy array on the server process, such as the numpy.ones() function.
For example:
```python
...
# register a function for creating numpy arrays on the manager
CustomManager.register('shared_array', ones)
```
We can then create the custom manager to start the server process.
```python
...
# create and start the custom manager
with CustomManager() as manager:
    # ...
```
The numpy array can then be created in the server process by calling the registered function, after which a proxy object is returned. The hosted numpy array can then be interacted with via the proxy object.
For example:
```python
...
# create a shared numpy array
data_proxy = manager.shared_array((10, 10))
```
The proxy object can then be passed between processes, allowing multiple processes to manipulate the same hosted numpy array.
You can learn more about hosting custom objects in manager processes in the tutorial:
Now that we know how to share a numpy array using a manager, let’s look at some worked examples.
Example of Hosting a NumPy Array With a Manager
We can explore an example of hosting a numpy array in a manager server process.
In this example, we will create a numpy array and host it in a manager process. We will then perform functions on the array and time how long it takes. We will then make a copy of the array and transmit it back to the parent process and perform the same operation. Finally, we will compare the time it takes to perform the operation on the copy of the data vs the same operation on the server.
Firstly, we can define the custom manager class that will allow us to register the numpy array.
```python
# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass
```
Next, we can register the numpy array on the custom manager.
In this case, we will register the numpy.ones() function used to create a numpy array on the server with the given shape. We will call this function “shared_array“.
```python
...
# register a function for creating numpy arrays on the manager
CustomManager.register('shared_array', ones)
```
We can then create and start the custom manager and create the array on the server, returning a proxy object for interacting with the array.
```python
...
# create and start the custom manager
with CustomManager() as manager:
    # define the size of the numpy array
    n = 50000000
    # create a shared numpy array
    data_proxy = manager.shared_array((n,))
    print(f'Array created on host: {data_proxy}')
```
We will then calculate the sum of the values in the array.
Given that the array has 50,000,000 elements, all filled with ones, we expect the sum to be equal to the number of elements.
This operation is performed on the server and will report the time of this operation.
```python
...
# time sum operation on array in server process
start = time()
result = data_proxy.sum()
duration = time() - start
print(f'Sum on hosted array took {duration:.3f} seconds')
```
Next, we will make a copy of the array and transmit it back to the parent process.
This can be achieved by calling the _getvalue() function on the proxy object that returns the hosted object.
This will transmit the entire array from the manager process to the parent process using inter-process communication. As such it will be very slow.
We will retrieve the array and calculate the sum, as before, and report the time taken.
```python
...
# time copy array and sum operation
start = time()
result = data_proxy._getvalue().sum()
duration = time() - start
print(f'Sum on copied array took {duration:.3f} seconds')
```
Tying this together, the complete example is listed below.
```python
# host a numpy array on the server
from time import time
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 50000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        print(f'Array created on host: {data_proxy}')
        # time sum operation on array in server process
        start = time()
        result = data_proxy.sum()
        duration = time() - start
        print(f'Sum on hosted array took {duration:.3f} seconds')
        # time copy array and sum operation
        start = time()
        result = data_proxy._getvalue().sum()
        duration = time() - start
        print(f'Sum on copied array took {duration:.3f} seconds')
```
Running the example first registers the numpy array on the custom manager class.
The custom manager is then created and started.
Next, a numpy array with 50 million elements is created on the manager’s server process and a proxy object is returned.
Printing the proxy object confirms the array is filled with ones.
Next, the proxy object is used to calculate the sum of the array on the server process. The time taken to perform this operation is reported as about 26 milliseconds.
Next, the same operation is performed, this time on a copy of the array transmitted from the server process to the parent process.
This takes more than 100 seconds to complete. It is expected to take much longer as the array has to be pickled and transmitted between processes before the operation can be performed.
This provides a contrast between the approach of performing operations on the array in a hosted process compared to transmitting the array in order to perform operations upon it.
```
Array created on host: array([1., 1., 1., ..., 1., 1., 1.])
Sum on hosted array took 0.026 seconds
Sum on copied array took 101.411 seconds
```
Example of Sharing a Numpy Array Using a Manager
We can explore the case of sharing a numpy array hosted in a manager between processes.
In this example, we will create an array hosted in a manager process and report the sum of the values in the array. We will then start a child process and pass it the proxy objects for the array and have it perform the same sum operation on the array.
This will highlight how easy it is for multiple processes to operate on the same array efficiently via proxy objects.
Firstly, we will define the custom manager class so that we can register numpy arrays.
```python
# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass
```
Next, we will define a function to execute in a child process. The function will take the proxy object for the numpy array and calculate the sum of values in the array.
```python
# task executed in a child process
def task(data_proxy):
    # report details of the array
    print(f'Array sum (in child): {data_proxy.sum()}')
```
Next, in the main process, we will register a function for creating the hosted numpy array with the custom manager.
```python
...
# register a function for creating numpy arrays on the manager
CustomManager.register('shared_array', ones)
```
Next, we will create and start the custom manager and create the hosted numpy array with 100,000,000 elements.
```python
...
# create and start the custom manager
with CustomManager() as manager:
    # define the size of the numpy array
    n = 100000000
    # create a shared numpy array
    data_proxy = manager.shared_array((n,))
    print(f'Array created on host: {data_proxy}')
```
We will then calculate the sum of the values in the array on the server process, which we expect to equal 100,000,000, as all values equal one.
```python
...
# confirm content
print(f'Array sum: {data_proxy.sum()}')
```
Finally, we will create and configure a child process, configured to execute our task() function, then start it and wait for it to terminate.
```python
...
# start a child process
process = Process(target=task, args=(data_proxy,))
process.start()
process.join()
```
Tying this together, the complete example is listed below.
```python
# share a numpy array between processes using a manager
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# task executed in a child process
def task(data_proxy):
    # report details of the array
    print(f'Array sum (in child): {data_proxy.sum()}')

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # start a child process
        process = Process(target=task, args=(data_proxy,))
        process.start()
        process.join()
```
Running the example first registers the numpy array with the custom manager.
Next, the custom manager is created and started.
The numpy array is created on the manager’s server process and a proxy object is returned. The proxy object is printed, providing a string representation of the hosted array, confirming it is filled with ones.
The parent process then reports the sum of the values in the array. This is calculated on the server process and the value is then reported as 100 million, matching our expectations.
Next, a child process is configured to execute our task() function and passed the proxy object. The parent process then blocks until the child process terminates.
The child process runs and executes the task() function. The sum of the array is calculated on the server process and reported via the child process.
This highlights that both processes are easily able to operate directly upon the same hosted array.
```
Array created on host: array([1., 1., 1., ..., 1., 1., 1.])
Array sum: 100000000.0
Array sum (in child): 100000000.0
```
Example Accessing Attributes on Hosted Numpy Array
A limitation of proxy objects is that we cannot access the attributes of a hosted object.
This may be a problem if we wish to access attributes on our numpy array, such as the “shape“, “size“, or other attributes of the numpy.ndarray class.
In this section, we will explore how we might overcome this limitation.
Example of Accessing Attribute Directly (failure)
Firstly, we can create a numpy array hosted in the server process, as before.
We will then attempt to access the “shape” attribute via the proxy object.
```python
...
# access shape of hosted array
print(data_proxy.shape)
```
Tying this together, the complete example is listed below.
```python
# access attribute of hosted numpy array
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # access shape of hosted array
        print(data_proxy.shape)
```
Running the example creates and starts the manager as before.
The array is created and its contents are reported correctly.
We attempt to access the shape of the array.
In this case, an AttributeError exception is raised, reporting that the hosted object has no attribute “shape“.
The error is misleading: the proxy is in fact attempting to execute a shape() method on the hosted object, and has no concept of a “shape” attribute.
```
Array created on host: array([1., 1., 1., ..., 1., 1., 1.])
Array sum: 100000000.0
Traceback (most recent call last):
...
AttributeError: 'AutoProxy[shared_array]' object has no attribute 'shape'. Did you mean: 'reshape'?
```
Example of Accessing Attribute via __getattr__() (failure)
Attribute access on a Python object can fall back to the __getattr__() special method.
Perhaps we can call this method on the hosted object and pass it the “shape” attribute as an argument.
We can call a method directly on the hosted object via the _callmethod() method on the proxy object.
For example:
```python
...
# access shape of hosted array
print(data_proxy._callmethod('__getattr__', args=('shape',)))
```
Tying this together, the complete example is listed below.
```python
# access attribute of hosted numpy array
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # access shape of hosted array
        print(data_proxy._callmethod('__getattr__', args=('shape',)))
```
Running the example creates and starts the manager as before.
The array is created and its contents are reported correctly.
We attempt to access the shape of the array via the __getattr__() method.
This fails with a RemoteError exception, highlighting that this method does not exist on the numpy.ndarray.
```
Array created on host: array([1., 1., 1., ..., 1., 1., 1.])
Array sum: 100000000.0
Traceback (most recent call last):
...
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File ".../multiprocessing/managers.py", line 265, in serve_client
    raise AttributeError(
AttributeError: method '__getattr__' of <class 'numpy.ndarray'> object is not in exposed={'partition', 'dump', 'max', 'tofile', 'all', 'trace', 'newbyteorder', 'dumps', 'ravel', 'conjugate', 'take', 'resize', 'sort', 'diagonal', 'conj', 'argmax', 'argsort', 'itemset', 'searchsorted', 'tostring', 'cumsum', 'setfield', 'clip', 'std', 'sum', 'var', 'reshape', 'swapaxes', 'any', 'repeat', 'mean', 'view', 'astype', 'argpartition', 'round', 'argmin', 'choose', 'tobytes', 'ptp', 'byteswap', 'fill', 'tolist', 'cumprod', 'min', 'nonzero', 'prod', 'squeeze', 'transpose', 'setflags', 'copy', 'dot', 'put', 'item', 'flatten', 'getfield', 'compress'}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
KeyError: '__getattr__'
---------------------------------------------------------------------------
```
Example of Accessing Attribute via Wrapper Class
Another approach is to define a new class that wraps our numpy array.
For example, we can define a class that creates the array when it is instantiated, then provides a method called attribute() to call the getattr() built-in function on the array.
A downside of this approach is that any methods we want to call on the numpy array directly will need to be passed on from the custom object to the internal numpy array. One approach is to use a direct wrapper method.
For example:
```python
# helper wrapping a numpy array
class ArrayHelper():
    def __init__(self, dim):
        self.array = ones(dim)

    # access attributes on the numpy array
    def attribute(self, attr):
        return getattr(self.array, attr)

    # call functions on the numpy array
    def sum(self):
        return self.array.sum()
```
We can then register the custom class with our custom manager class.
```python
...
# register the python class with the custom manager
CustomManager.register('ArrayHelper', ArrayHelper)
```
Finally, we can then call the attribute() method on our custom class directly and pass it the name of the attribute on the numpy array we wish to access.
```python
...
# access shape of hosted array
print(data_proxy.attribute('shape'))
```
Tying this together, the complete example is listed below.
```python
# access attribute of hosted numpy array
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# helper wrapping a numpy array
class ArrayHelper():
    def __init__(self, dim):
        self.array = ones(dim)

    # access attributes on the numpy array
    def attribute(self, attr):
        return getattr(self.array, attr)

    # call functions on the numpy array
    def sum(self):
        return self.array.sum()

# protect the entry point
if __name__ == '__main__':
    # register the python class with the custom manager
    CustomManager.register('ArrayHelper', ArrayHelper)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.ArrayHelper((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # access shape of hosted array
        print(data_proxy.attribute('shape'))
```
Running the example creates and starts the manager as before.
The array is created and its contents are reported correctly.
Finally, the shape attribute is accessed via the helper method on the custom object. The shape is returned and reported correctly.
Although this approach achieves the desired effect, it requires using an alternate API on the wrapper object, rather than the ndarray API directly.
```
Array created on host: <__mp_main__.ArrayHelper object at 0x10b6835e0>
Array sum: 100000000.0
(100000000,)
```
There may be even better solutions to exposing attributes on the hosted array, such as defining a custom Proxy object that supports attributes on hosted objects.
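As a sketch of that last idea: a custom proxy class can forward attribute access to the server by exposing only __getattribute__ on the hosted object. The ArrayProxy class and its details here are illustrative assumptions, not part of this tutorial, and only plain (picklable) attributes like “shape” or “size” will come back cleanly:

```python
# sketch: a custom proxy that forwards attribute access to the hosted array
from multiprocessing.managers import BaseManager, BaseProxy
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    pass

# hypothetical proxy: unknown attributes are fetched from the hosted object
class ArrayProxy(BaseProxy):
    _exposed_ = ('__getattribute__',)
    def __getattr__(self, name):
        # leave the proxy's own internal machinery alone
        if name.startswith('_'):
            raise AttributeError(name)
        return self._callmethod('__getattribute__', (name,))

if __name__ == '__main__':
    # register the factory function together with the custom proxy type
    CustomManager.register('shared_array', ones, proxytype=ArrayProxy)
    with CustomManager() as manager:
        data_proxy = manager.shared_array((100,))
        # attribute access is now resolved on the hosted array
        print(data_proxy.shape)
```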
Example Using Array Indices on Hosted Numpy Array
Another limitation of a hosted numpy array is that we cannot easily access array values via array indices.
In this section, we will explore how to access data in a hosted numpy array.
Example of Accessing Numpy Data Directly (failure)
Proxy objects do not support the square-bracket [:] array index syntax.
We can demonstrate this with a worked example.
The example below creates a hosted numpy array as before. It then attempts to access the first 10 values of the hosted array via the array index syntax on the proxy object.
```python
...
# access data via array index
print(data_proxy[:10])
```
Tying this together, the complete example is listed below.
```python
# access data via array index in hosted numpy array
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # access data via array index
        print(data_proxy[:10])
```
Running the example creates and starts the manager as before.
The array is created and its contents are reported correctly.
Next, we try to retrieve a copy of the first 10 values in the array using the array syntax with a slice.
This fails with a TypeError exception, highlighting that the proxy object does not support the subscript syntax for the hosted object.
```
Array created on host: array([1., 1., 1., ..., 1., 1., 1.])
Array sum: 100000000.0
Traceback (most recent call last):
...
TypeError: 'AutoProxy[shared_array]' object is not subscriptable
```
Example of Accessing Numpy Data via __getitem__() (failure)
We can try and work around this error.
We can attempt to call the __getitem__() method on the hosted object and pass it a slice object.
The _callmethod() can be called on the proxy object and passed the name of the “__getitem__” method along with a slice object as an argument.
Recall, we can construct a slice directly via the slice() built-in function.
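As a reminder, slice(0, 10) is simply the object form of the literal 0:10 index, as this small standalone sketch shows:

```python
# slice(0, 10) is equivalent to the literal 0:10 index syntax
from numpy import arange

data = arange(20)
# both forms select the first 10 elements
assert (data[slice(0, 10)] == data[0:10]).all()
print(data[slice(0, 10)])
```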
For example:
```python
...
# access data via array index
print(data_proxy._callmethod('__getitem__', args=(slice(0, 10),)))
```
Tying this together, the complete example is listed below.
```python
# access data via array index in hosted numpy array
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # access data via array index
        print(data_proxy._callmethod('__getitem__', args=(slice(0, 10),)))
```
Running the example creates and starts the manager as before.
The array is created and its contents are reported correctly.
Next, the __getitem__() method is called on the hosted array with the given slice.
This fails with a RemoteError exception, highlighting that the __getitem__() method is not exposed on the hosted numpy.ndarray.
```
Array created on host: array([1., 1., 1., ..., 1., 1., 1.])
Array sum: 100000000.0
Traceback (most recent call last):
...
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File ".../multiprocessing/managers.py", line 265, in serve_client
    raise AttributeError(
AttributeError: method '__getitem__' of <class 'numpy.ndarray'> object is not in exposed={'view', 'trace', 'choose', 'tofile', 'diagonal', 'put', 'dot', 'transpose', 'all', 'copy', 'ravel', 'std', 'tolist', 'dumps', 'newbyteorder', 'getfield', 'max', 'mean', 'argmin', 'round', 'byteswap', 'setflags', 'take', 'argmax', 'argpartition', 'clip', 'repeat', 'tobytes', 'fill', 'searchsorted', 'resize', 'var', 'reshape', 'any', 'partition', 'nonzero', 'cumsum', 'cumprod', 'min', 'conjugate', 'conj', 'itemset', 'setfield', 'argsort', 'sort', 'ptp', 'compress', 'prod', 'squeeze', 'item', 'tostring', 'dump', 'sum', 'swapaxes', 'astype', 'flatten'}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
KeyError: '__getitem__'
---------------------------------------------------------------------------
```
Example of Accessing Numpy Data With Wrapper Class
We can solve this issue as we did with the need to access attributes on the hosted object.
That is, we can define a custom class that wraps the array and provide a method that offers the desired capability.
In this case, we can define a method called getdata() that takes a slice argument and returns data at that slice.
For example:
```python
# helper wrapping a numpy array
class ArrayHelper():
    def __init__(self, dim):
        self.array = ones(dim)

    # access array data by slice
    def getdata(self, array_slice):
        return self.array[array_slice]

    # call functions on the numpy array
    def sum(self):
        return self.array.sum()
```
Tying this together, the complete example is listed below.
```python
# access data via array index in hosted numpy array
from multiprocessing.managers import BaseManager
from numpy import ones

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# helper wrapping a numpy array
class ArrayHelper():
    def __init__(self, dim):
        self.array = ones(dim)

    # access array data by slice
    def getdata(self, array_slice):
        return self.array[array_slice]

    # call functions on the numpy array
    def sum(self):
        return self.array.sum()

# protect the entry point
if __name__ == '__main__':
    # register the python class with the custom manager
    CustomManager.register('ArrayHelper', ArrayHelper)
    # create and start the custom manager
    with CustomManager() as manager:
        # define the size of the numpy array
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.ArrayHelper((n,))
        print(f'Array created on host: {data_proxy}')
        # confirm content
        print(f'Array sum: {data_proxy.sum()}')
        # access data in the array
        print(data_proxy.getdata(slice(0, 10)))
```
Running the example creates and starts the manager as before.
The array is created and its contents are reported correctly.
Finally, the custom getdata() method is accessed on the hosted object. This returns a copy of the data at the given slice, which is a sub-array that contains the first 10 values.
```
Array created on host: <__mp_main__.ArrayHelper object at 0x10945f640>
Array sum: 100000000.0
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
```
Recommendations
Hosting a numpy array and working across multiple processes via proxy objects is efficient.
We do not need to copy the array from process to process, which saves a lot of time.
Nevertheless, the functionality on the hosted array is limited via the proxy object. This includes accessing attributes on the array and data via array indexes.
A compromise might be to clearly define the use cases for accessing and using the array across the processes, then define a custom class to wrap the array and perform the desired operations.
This may also allow the use of mutex locks to ensure changes to the hosted array are also process-safe.
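For example, the wrapper class from the examples above could guard its methods with a lock. All method calls run inside the single manager server process, which serves each client connection on its own thread, so a threading.Lock should be sufficient there. This is a hedged sketch; the SafeArrayHelper name and its fill() method are made up for illustration:

```python
# sketch: wrapper that serializes access to the hosted array with a lock
from threading import Lock
from numpy import ones

class SafeArrayHelper():
    def __init__(self, dim):
        self.array = ones(dim)
        # guards concurrent calls from multiple proxy clients
        self.lock = Lock()

    # overwrite all values while holding the lock
    def fill(self, value):
        with self.lock:
            self.array.fill(value)

    # read the sum while holding the lock
    def sum(self):
        with self.lock:
            return self.array.sum()
```

It would be registered with the custom manager in the same way as the ArrayHelper class above, and each hosted method would then run atomically with respect to the others.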
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
NumPy APIs
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Takeaways
You now know how to share an array between processes using a manager.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by logojackmowo Yao on Unsplash