Last Updated on October 19, 2023
You can share a NumPy array between processes in Python using a suite of different techniques.
Generally, if the shared array can be read-only, then the fastest method to share the array between processes is via inheritance.
If multiple processes need to read and write the shared NumPy array, then a memory-mapped file or a shared memory-backed array is fastest.
Sharing NumPy arrays using queues and pipes is nearly two orders of magnitude slower than the other methods.
In this tutorial, you will discover the speed of different methods to share a NumPy array between processes in Python.
Let’s get started.
9 Ways to Share a NumPy Array Between Processes
There are many ways to share a NumPy array between processes.
Nevertheless, there are perhaps 9 main approaches that we can implement using the Python standard library.
They are:
- Share NumPy Array via Inheritance
- Share NumPy Array via Function Argument
- Share NumPy Array via Pipe
- Share NumPy Array via Queue
- Share NumPy Array via ctype Array
- Share NumPy Array via ctype RawArray
- Share NumPy Array via SharedMemory
- Share NumPy Array via Memory-Mapped File
- Share NumPy Array via Manager
You can learn more about most of these ways to share NumPy arrays in separate tutorials.
Are there other methods you would like me to benchmark?
Let me know in the comments below.
Next, let’s consider the setup for benchmarking each approach.
How to Benchmark NumPy Array Sharing
Once we know the different ways to share a NumPy array between processes and when to use each, the question becomes:
- What is the fastest approach to sharing a NumPy array between processes?
We will explore this question in this tutorial.
Firstly, we can define a benchmark test setup so that it is consistent in each case.
Each approach will be used to share a NumPy array of 100,000,000 floating-point values, initialized with numpy.ones().
For example:
...
# define the global variable
n = 100000000
data = ones((n,))
Each task executed in the new child process will simply report the shape of the shared NumPy array, in order to confirm that the array is accessible.
For example:
# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')
All of the preparation of the NumPy array, the creation and starting of the new child process, and the sharing of the array with the child process will occur within a main() function.
For example:
# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
The benchmark will be repeated 3 times and the average time will be reported as the method's result.
This involves recording the start and end time with the time.perf_counter() function (preferred over time.time() for benchmarking), calling the main() function and reporting the result of each run, then calculating and reporting the average across all runs.
For example:
# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
You can learn more about benchmarking with the time.perf_counter() function in a separate tutorial.
Let's dive into the methods.
Benchmark Sharing NumPy Array Via Inheritance
We can explore the benchmark of how to share a NumPy array between processes using inheritance.
This requires first setting the start method to 'fork' before any processes are created; in the example below it is the first line within the entry point.
...
# ensure we are using fork start method
set_start_method('fork')
Next, in the main() method, we can explicitly declare the NumPy array as a global variable, and then define it.
...
# declare the global variable
global data
# define the global variable
n = 100000000
data = ones((n,))
Finally, in the task() function we can also explicitly declare the NumPy array global variable.
...
# declare the global variable
global data
Declaring global variables explicitly is not required, but I think it is good practice in some cases, especially here, where inheriting the variable is the focus of the test.
You can learn more about sharing a NumPy array between processes via inheritance in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using inheritance
from multiprocessing import set_start_method
from multiprocessing import Process
from numpy import ones
from time import perf_counter

# task executed in a child process
def task():
    # declare the global variable
    global data
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # declare the global variable
    global data
    # define the global variable
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task)
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # ensure we are using fork start method
    set_start_method('fork')
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using inheritance.
The array is created and shared 3 times.
Note: The ‘fork’ start method used in this example is not available on all systems (e.g. Windows). This means you may not be able to run this example.
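If you want the program to degrade gracefully on such systems, you can check which start methods are supported before selecting one. Below is a minimal sketch using multiprocessing.get_all_start_methods(); the fallback message is just an illustration:

...
# check that the 'fork' start method is supported on this platform
from multiprocessing import get_all_start_methods, set_start_method
if 'fork' in get_all_start_methods():
    set_start_method('fork')
else:
    # hypothetical fallback, e.g. share the array via a function argument instead
    print('fork is not available, consider another sharing method')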
In this case, this method takes about 0.250 seconds (250 milliseconds) on average.
>child has data: (100000000,)
>0 took 0.240 seconds
>child has data: (100000000,)
>1 took 0.258 seconds
>child has data: (100000000,)
>2 took 0.252 seconds
Average Time: 0.250 seconds
Benchmark Sharing NumPy Array Via Function Argument
We can explore the benchmark of how to share a NumPy array between processes using a function argument.
This requires updating the task() function to take the NumPy array as an argument.
# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to share the NumPy array with the child process as a function argument.
...
# create a child process
child = Process(target=task, args=(data,))
You can learn more about sharing a NumPy array between processes via a function argument in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using function argument
from multiprocessing import Process
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a function argument.
The array is created and shared 3 times.
In this case, this method takes about 1.413 seconds on average.
>child has data: (100000000,)
>0 took 1.559 seconds
>child has data: (100000000,)
>1 took 1.334 seconds
>child has data: (100000000,)
>2 took 1.346 seconds
Average Time: 1.413 seconds
Benchmark Sharing NumPy Array Via Pipe
We can explore the benchmark of how to share a NumPy array between processes using a pipe data structure.
This requires updating the task() function to take the receiving connection of a Pipe as an argument and to call the recv() method to read the NumPy array.
# task executed in a child process
def task(conn1):
    # receive the data
    data = conn1.recv()
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to create the receive and send connections of the pipe, pass the receiving connection to the child process as an argument, and then send the NumPy array through the pipe.
Notice that the array is sent after the child process has started. This ensures the child is consuming bytes as they arrive, so the pipe's internal buffer is not overwhelmed before the child begins reading.
# create a child process and share a numpy array
def main():
    # create the pipe
    conn1, conn2 = Pipe()
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(conn1,))
    # start the child process
    child.start()
    # share the data
    conn2.send(data)
    # wait for the child process to complete
    child.join()
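As an aside, Pipe() returns a duplex (two-way) connection pair by default. Because this example only sends data in one direction, a one-way pipe could be used instead; a minimal variation of the line above:

...
# create a one-way pipe: conn1 can only receive, conn2 can only send
conn1, conn2 = Pipe(duplex=False)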
You can see a fuller example of sharing a NumPy array between processes via a pipe in a separate tutorial.
You can also learn more about how to use the multiprocessing Pipe in a dedicated tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using pipe
from multiprocessing import Process
from multiprocessing import Pipe
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(conn1):
    # receive the data
    data = conn1.recv()
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the pipe
    conn1, conn2 = Pipe()
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(conn1,))
    # start the child process
    child.start()
    # share the data
    conn2.send(data)
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a pipe.
The array is created and shared 3 times.
In this case, this method takes about 129.228 seconds on average.
>child has data: (100000000,)
>0 took 131.518 seconds
>child has data: (100000000,)
>1 took 133.835 seconds
>child has data: (100000000,)
>2 took 122.330 seconds
Average Time: 129.228 seconds
Benchmark Sharing NumPy Array Via Queue
We can explore the benchmark of how to share a NumPy array between processes using a queue data structure.
This requires updating the task() function to take a Queue as an argument and retrieve the NumPy array via the get() method.
# task executed in a child process
def task(queue):
    # get the data
    data = queue.get()
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to create the shared queue and put the NumPy array on it. Unlike the pipe example above, the array can be put on the queue before the child process starts, because multiprocessing.Queue buffers data internally via a feeder thread.
The queue itself is then shared with the child process via a function argument.
# create a child process and share a numpy array
def main():
    # create the shared queue
    queue = Queue()
    # define the data
    n = 100000000
    data = ones((n,))
    # put the data in the queue
    queue.put(data)
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
You can see a fuller example of sharing a NumPy array between processes using a queue in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using queue
from multiprocessing import Process
from multiprocessing import Queue
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(queue):
    # get the data
    data = queue.get()
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the shared queue
    queue = Queue()
    # define the data
    n = 100000000
    data = ones((n,))
    # put the data in the queue
    queue.put(data)
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a queue.
The array is created and shared 3 times.
In this case, this method takes about 11.285 seconds on average.
>child has data: (100000000,)
>0 took 11.251 seconds
>child has data: (100000000,)
>1 took 11.330 seconds
>child has data: (100000000,)
>2 took 11.273 seconds
Average Time: 11.285 seconds
Benchmark Share NumPy Array using ctype Array
We can explore the benchmark of how to share a NumPy array between processes using a shared ctype Array.
This requires that we share the ctype Array with the child process, for example via an inherited global variable or a function argument.
We will use a function argument in this case for portability, as the 'fork' start method required for inheritance is not available on all systems (e.g. Windows).
# task executed in a child process
def task(array):
    # report details of data
    print(f'>child has data: {len(array)}')
This also requires updating the main() function to create the shared ctype array with the correct data type, and then populating it with the content of our NumPy array.
# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
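Note that with lock=False the child receives a raw ctypes array rather than a NumPy array. If the child needs NumPy operations, it can wrap the shared buffer without copying. This is a small sketch, not part of the benchmark, assuming the array argument of the task() function above:

...
# wrap the shared ctypes array in a numpy array without copying
from numpy import frombuffer, double
view = frombuffer(array, dtype=double)
print(f'>child view: {view.shape}')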
You can learn more about sharing a NumPy array between processes using a shared ctype Array in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using a ctype array
from multiprocessing import Process
from multiprocessing.sharedctypes import Array
from numpy import ones
from numpy.ctypeslib import as_ctypes
from time import perf_counter

# task executed in a child process
def task(array):
    # report details of data
    print(f'>child has data: {len(array)}')

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a shared ctype Array.
The array is created and shared 3 times.
In this case, this method takes about 15.013 seconds on average.
>child has data: 100000000
>0 took 15.021 seconds
>child has data: 100000000
>1 took 15.049 seconds
>child has data: 100000000
>2 took 14.969 seconds
Average Time: 15.013 seconds
Benchmark Share NumPy Array using ctype RawArray
We can explore the benchmark of how to share a NumPy array between processes using a shared ctype RawArray.
This requires that we share the ctype RawArray with the child process, for example via an inherited global variable or a function argument.
We will use a function argument in this case for portability, as the 'fork' start method required for inheritance is not available on all systems (e.g. Windows).
# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')
We must also update the main() function to first create the RawArray and allocate memory for it, then use it as a buffer to create the NumPy array.
The RawArray is then passed to the new child process as a function argument.
# create a child process and share a numpy array
def main():
    # create the raw array
    n = 100000000
    array = RawArray('d', n)
    # create numpy array from raw array buffer
    data = frombuffer(array, dtype=double, count=len(array))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
You can learn more about sharing a NumPy array between processes using a shared ctype RawArray in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using a ctype RawArray
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray
from numpy import frombuffer
from numpy import double
from time import perf_counter

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the raw array
    n = 100000000
    array = RawArray('d', n)
    # create numpy array from raw array buffer
    data = frombuffer(array, dtype=double, count=len(array))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a shared ctype RawArray.
The array is created and shared 3 times.
In this case, this method takes about 1.642 seconds on average.
>child has data: (100000000,)
>0 took 1.798 seconds
>child has data: (100000000,)
>1 took 1.578 seconds
>child has data: (100000000,)
>2 took 1.551 seconds
Average Time: 1.642 seconds
Benchmark Share NumPy Array using SharedMemory
We can explore the benchmark of how to share a NumPy array between processes using SharedMemory.
This requires first updating the task() function to attach to the named shared memory and then create a NumPy array that uses the shared memory as its buffer.
The size of the SharedMemory is provided to the function as an argument.
# task executed in a child process
def task(n):
    # attach to the existing shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # report details of data
    print(f'>child has data: {data.shape}')
    # close the shared memory
    sm.close()
It also requires updating the main() function to define the shared memory and use it to create a NumPy array that is then initialized.
Once the program is finished with the shared memory, it is then unlinked.
# create a child process and share a numpy array
def main():
    # define the size of the array
    n = 100000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(n,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # release the shared memory
    sm.unlink()
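Strictly speaking, the parent could also close its own handle to the shared memory before unlinking it, which avoids resource warnings on some platforms; a minimal (optional) addition to the end of main():

...
# close the parent's handle, then release the shared memory
sm.close()
sm.unlink()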
You can learn more about sharing NumPy arrays between processes using SharedMemory in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array with sharedmemory
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
from numpy import double
from time import perf_counter

# task executed in a child process
def task(n):
    # attach to the existing shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # report details of data
    print(f'>child has data: {data.shape}')
    # close the shared memory
    sm.close()

# create a child process and share a numpy array
def main():
    # define the size of the array
    n = 100000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(n,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # release the shared memory
    sm.unlink()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using SharedMemory.
The array is created and shared 3 times.
In this case, this method takes about 0.372 seconds (372 milliseconds) on average.
>child has data: (100000000,)
>0 took 0.370 seconds
>child has data: (100000000,)
>1 took 0.379 seconds
>child has data: (100000000,)
>2 took 0.367 seconds
Average Time: 0.372 seconds
Benchmark Share NumPy Array using Memory-Mapped File
We can explore the benchmark of how to share a NumPy array between processes using a memory-mapped NumPy array file.
This requires updating the task() function to open the existing memory-mapped file as a NumPy array via the numpy.memmap() function.
The name of the memory-mapped file and the size are provided as arguments to the function.
# task executed in a child process
def task(filename, n):
    # load the memory-mapped file
    data = memmap(filename, dtype='float32', mode='r+', shape=(n,))
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to define the memory-mapped NumPy array file and initialize the array.
The filename and size of the array are provided to the child process as arguments.
# create a child process and share a numpy array
def main():
    # define the size of the data
    n = 100000000
    # define the filename
    filename = 'data.np'
    # create the memory-mapped file
    data = memmap(filename, dtype='float32', mode='w+', shape=(n,))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(filename, n))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
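If a process also modifies the array, the changes can be forced to disk explicitly via the flush() method on the memmap object; a minimal sketch, continuing the example above:

...
# modify the array and flush the changes through to the file on disk
data[:10] = 2.0
data.flush()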
You can learn more about sharing a NumPy array between processes using a memory-mapped file in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using memory-mapped file
from multiprocessing import Process
from numpy import memmap
from time import perf_counter

# task executed in a child process
def task(filename, n):
    # load the memory-mapped file
    data = memmap(filename, dtype='float32', mode='r+', shape=(n,))
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # define the size of the data
    n = 100000000
    # define the filename
    filename = 'data.np'
    # create the memory-mapped file
    data = memmap(filename, dtype='float32', mode='w+', shape=(n,))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(filename, n))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a memory-mapped NumPy array.
The array is created and shared 3 times.
In this case, this method takes about 0.334 seconds (334 milliseconds) on average.
>child has data: (100000000,)
>0 took 0.342 seconds
>child has data: (100000000,)
>1 took 0.334 seconds
>child has data: (100000000,)
>2 took 0.325 seconds
Average Time: 0.334 seconds
Benchmark Share NumPy Array using Manager
We can explore the benchmark of how to share a NumPy array between processes using a Manager.
This requires first defining a custom manager that allows custom types to be defined and hosted.
# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass
Next, the proxy object for the hosted array needs to be shared with the child process.
In this case, we will share it using a function argument as this approach will work on all platforms.
The “shape” attribute is not directly accessible via the proxy object, so we will call a method on the array instead, in this case sum(), which we expect to return 100 million.
# task executed in a child process
def task(data_proxy):
    # report details of data
    print(f'>child has data: {data_proxy.sum()}')
Next, the main() function needs to be updated to create and start the CustomManager via the context manager interface, then create the hosted array, which returns a proxy object that can be passed to the child process.
# create a child process and share a numpy array
def main():
    # create and start the custom manager
    with CustomManager() as manager:
        # define the data
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        # create a child process
        child = Process(target=task, args=(data_proxy,))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()
Finally, at the start of the program (the first line within the entry point), we can register our custom function for creating a NumPy array of ones with the manager.
...
# register a function for creating numpy arrays on the manager
CustomManager.register('shared_array', ones)
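Because the automatically generated proxy forwards public method calls (but not attribute access), any public method of the hosted array can be invoked via the proxy. A small illustrative sketch, reusing data_proxy from above:

...
# any public method of the hosted array can be called on the proxy
print(data_proxy.min(), data_proxy.max(), data_proxy.sum())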
You can learn more about sharing a NumPy array between processes using a manager in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using manager
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from numpy import ones
from time import perf_counter

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# task executed in a child process
def task(data_proxy):
    # report details of data
    print(f'>child has data: {data_proxy.sum()}')

# create a child process and share a numpy array
def main():
    # create and start the custom manager
    with CustomManager() as manager:
        # define the data
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        # create a child process
        child = Process(target=task, args=(data_proxy,))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a manager.
This does not actually share the array itself, but rather a proxy object. Nevertheless, there is setup time required to create the manager process and the hosted array.
The array is created and shared 3 times.
In this case, this method takes about 0.574 seconds (574 milliseconds) on average.
>child has data: 100000000.0
>0 took 0.586 seconds
>child has data: 100000000.0
>1 took 0.570 seconds
>child has data: 100000000.0
>2 took 0.566 seconds
Average Time: 0.574 seconds
Comparison of Results
Now that we have benchmarked all of the popular methods for sharing a NumPy array between processes, we can compare the results.
The table below summarizes the performance of each method using the same benchmark methodology.
It also notes whether each method copies the array, that is, whether the child process works with its own copy of the data.
Method             | Copy? | Time (sec)
---------------------------------------
Inherited          | Yes   | 0.250 (*)
Function Argument  | Yes   | 1.413
Queue              | Yes   | 11.285
Pipe               | Yes   | 129.228
ctype Array        | No    | 15.013
ctype RawArray     | No    | 1.642
SharedMemory       | No    | 0.372
Memory-Mapped File | No    | 0.334 (*)
Manager            | No    | 0.574
We can see that, of the methods that create a copy of the array, inheriting the array is the fastest. A limitation of this approach is that it requires the 'fork' start method for child processes, which is not available on all systems (e.g. Windows).
Passing the array as a function argument is also fast, and a good alternative that will work on all platforms.
Of the methods that do not copy the array, the memory-mapped file is the fastest, followed very closely by SharedMemory. Either approach provides an effective way to share an array between processes with both read and write access to the data. A mutex may also be needed to avoid possible race conditions when writing.
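For example, writes to a SharedMemory-backed array could be guarded with a multiprocessing.Lock shared with each process as a function argument. The sketch below is a minimal illustration, not part of the benchmark; it reuses the 'MyMemory' block name from the SharedMemory example above:

# sketch: guard writes to a shared-memory backed array with a lock
from multiprocessing import Lock
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
from numpy import double

# task that safely increments the shared array
def safe_increment(lock, n):
    # attach to the existing shared memory block
    sm = SharedMemory('MyMemory')
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # perform the read-modify-write under the lock
    with lock:
        data += 1.0
    # close this process's handle
    sm.close()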
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Takeaways
You now know the speed of different methods to share a NumPy array between processes in Python.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any additional tips?
I’d love to hear about them!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.