Last Updated on October 19, 2023
You can share a NumPy array between processes in Python using a suite of different techniques.
Generally, if the shared array can be read-only, then the fastest method to share the array between processes is via inheritance.
If multiple processes need to read and write the shared NumPy array, then a memory-mapped file or a shared memory-backed array is fastest.
Sharing NumPy arrays using queues and pipes is nearly two orders of magnitude slower than the other methods.
In this tutorial, you will discover the speed of different methods to share a NumPy array between processes in Python.
Let’s get started.
9 Ways to Share a NumPy Array Between Processes
There are many ways to share a NumPy array between processes.
Nevertheless, there are perhaps 9 main approaches that we can implement using the Python standard library.
They are:
- Share NumPy Array via Inheritance
- Share NumPy Array via Function Argument
- Share NumPy Array via Pipe
- Share NumPy Array via Queue
- Share NumPy Array via ctype Array
- Share NumPy Array via ctype RawArray
- Share NumPy Array via SharedMemory
- Share NumPy Array via Memory-Mapped File
- Share NumPy Array via Manager
You can learn more about most of these ways to share NumPy arrays in separate tutorials.
Are there other methods you would like me to benchmark?
Let me know in the comments below.
Next, let’s consider the setup for benchmarking each approach.
How to Benchmark NumPy Array Sharing
Once we know the different ways to share a NumPy array between processes and when to use each, the question becomes:
- What is the fastest approach to sharing a NumPy array between processes?
We will explore this question in this tutorial.
Firstly, we can define a benchmark test setup so that it is consistent in each case.
Each approach will be used to share a NumPy array of 100,000,000 floating-point values, initialized with numpy.ones().
For example:
...
# define the global variable
n = 100000000
data = ones((n,))
Each task executed in the new child process will simply report the shape of the shared NumPy array, in order to confirm that the array is accessible.
For example:
# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')
All of the preparation of the NumPy array, the creation and starting of the new child process, and the sharing of the array with the child process will occur within a main() function.
For example:
# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
The benchmark will be repeated 3 times and the average time will be reported as the method's result.
This involves recording the start and end time with the time.perf_counter() function (preferred over time.time() for benchmarking), calling the main() function and reporting the result of each run, then calculating and reporting the average across all runs.
For example:
# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
You can learn more about benchmarking with the time.perf_counter() function in a separate tutorial.
Let's dive into the methods.
Benchmark Sharing NumPy Array Via Inheritance
We can explore the benchmark of how to share a NumPy array between processes using inheritance.
This requires first setting the start method to 'fork' before any processes are created; in the example below it is the first line within the entry point.
...
# ensure we are using fork start method
set_start_method('fork')
Next, in the main() method, we can explicitly declare the NumPy array as a global variable, and then define it.
...
# declare the global variable
global data
# define the global variable
n = 100000000
data = ones((n,))
Finally, in the task() function we can also explicitly declare the NumPy array global variable.
...
# declare the global variable
global data
Declaring global variables explicitly is not required, but I think it is good practice in some cases, especially here, where inheriting the variable is the focus of the test.
You can learn more about sharing a NumPy array between processes via inheritance in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using inheritance
from multiprocessing import set_start_method
from multiprocessing import Process
from numpy import ones
from time import perf_counter

# task executed in a child process
def task():
    # declare the global variable
    global data
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # declare the global variable
    global data
    # define the global variable
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task)
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # ensure we are using fork start method
    set_start_method('fork')
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using inheritance.
The array is created and shared 3 times.
Note: The ‘fork’ start method used in this example is not available on all systems (e.g. Windows). This means you may not be able to run this example.
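If you want the program to degrade gracefully on such systems, you can check which start methods are supported before selecting one. Below is a minimal sketch using multiprocessing.get_all_start_methods(); the fallback message is just an illustration:

...
# check that the 'fork' start method is supported on this platform
from multiprocessing import get_all_start_methods, set_start_method
if 'fork' in get_all_start_methods():
    set_start_method('fork')
else:
    # hypothetical fallback, e.g. share the array via a function argument instead
    print('fork is not available, consider another sharing method')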
In this case, this method takes about 0.250 seconds (250 milliseconds) on average.
>child has data: (100000000,)
>0 took 0.240 seconds
>child has data: (100000000,)
>1 took 0.258 seconds
>child has data: (100000000,)
>2 took 0.252 seconds
Average Time: 0.250 seconds
Benchmark Sharing NumPy Array Via Function Argument
We can explore the benchmark of how to share a NumPy array between processes using a function argument.
This requires updating the task() function to take the NumPy array as an argument.
# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to share the NumPy array with the child process as a function argument.
...
# create a child process
child = Process(target=task, args=(data,))
You can learn more about sharing a NumPy array between processes via a function argument in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using function argument
from multiprocessing import Process
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a function argument.
The array is created and shared 3 times.
In this case, this method takes about 1.413 seconds on average.
>child has data: (100000000,)
>0 took 1.559 seconds
>child has data: (100000000,)
>1 took 1.334 seconds
>child has data: (100000000,)
>2 took 1.346 seconds
Average Time: 1.413 seconds
Benchmark Sharing NumPy Array Via Pipe
We can explore the benchmark of how to share a NumPy array between processes using a pipe data structure.
This requires updating the task() function to take the receiving connection of a Pipe as an argument and to call the recv() method to read the NumPy array.
# task executed in a child process
def task(conn1):
    # receive the data
    data = conn1.recv()
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to create the receive and send connections of the pipe, pass the receiving connection to the child process as an argument, and then send the NumPy array through the pipe.
Notice that the array is sent after the child process has started. This ensures the child is consuming bytes as they arrive, so the pipe's internal buffer is not overwhelmed before the child begins reading.
# create a child process and share a numpy array
def main():
    # create the pipe
    conn1, conn2 = Pipe()
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(conn1,))
    # start the child process
    child.start()
    # share the data
    conn2.send(data)
    # wait for the child process to complete
    child.join()
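As an aside, Pipe() returns a duplex (two-way) connection pair by default. Because this example only sends data in one direction, a one-way pipe could be used instead; a minimal variation of the line above:

...
# create a one-way pipe: conn1 can only receive, conn2 can only send
conn1, conn2 = Pipe(duplex=False)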
You can see a fuller example of sharing a NumPy array between processes via a pipe in a separate tutorial.
You can also learn more about how to use the multiprocessing Pipe in a dedicated tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using pipe
from multiprocessing import Process
from multiprocessing import Pipe
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(conn1):
    # receive the data
    data = conn1.recv()
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the pipe
    conn1, conn2 = Pipe()
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(conn1,))
    # start the child process
    child.start()
    # share the data
    conn2.send(data)
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a pipe.
The array is created and shared 3 times.
In this case, this method takes about 129.228 seconds on average.
>child has data: (100000000,)
>0 took 131.518 seconds
>child has data: (100000000,)
>1 took 133.835 seconds
>child has data: (100000000,)
>2 took 122.330 seconds
Average Time: 129.228 seconds
Benchmark Sharing NumPy Array Via Queue
We can explore the benchmark of how to share a NumPy array between processes using a queue data structure.
This requires updating the task() function to take a Queue as an argument and retrieve the NumPy array via the get() method.
# task executed in a child process
def task(queue):
    # get the data
    data = queue.get()
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to create the shared queue and put the NumPy array on it. Unlike the pipe example above, the array can be put on the queue before the child process starts, because multiprocessing.Queue buffers data internally via a feeder thread.
The queue itself is then shared with the child process via a function argument.
# create a child process and share a numpy array
def main():
    # create the shared queue
    queue = Queue()
    # define the data
    n = 100000000
    data = ones((n,))
    # put the data in the queue
    queue.put(data)
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
You can see a fuller example of sharing a NumPy array between processes using a queue in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using queue
from multiprocessing import Process
from multiprocessing import Queue
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(queue):
    # get the data
    data = queue.get()
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the shared queue
    queue = Queue()
    # define the data
    n = 100000000
    data = ones((n,))
    # put the data in the queue
    queue.put(data)
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a queue.
The array is created and shared 3 times.
In this case, this method takes about 11.285 seconds on average.
>child has data: (100000000,)
>0 took 11.251 seconds
>child has data: (100000000,)
>1 took 11.330 seconds
>child has data: (100000000,)
>2 took 11.273 seconds
Average Time: 11.285 seconds
Benchmark Share NumPy Array using ctype Array
We can explore the benchmark of how to share a NumPy array between processes using a shared ctype Array.
This requires that we share the ctype Array with the child process, for example via an inherited global variable or a function argument.
We will use a function argument in this case for portability, as the 'fork' start method required for inheritance is not available on all systems (e.g. Windows).
# task executed in a child process
def task(array):
    # report details of data
    print(f'>child has data: {len(array)}')
This also requires updating the main() function to create the shared ctype array with the correct data type, and then populating it with the content of our NumPy array.
# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
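Note that with lock=False the child receives a raw ctypes array rather than a NumPy array. If the child needs NumPy operations, it can wrap the shared buffer without copying. This is a small sketch, not part of the benchmark, assuming the array argument of the task() function above:

...
# wrap the shared ctypes array in a numpy array without copying
from numpy import frombuffer, double
view = frombuffer(array, dtype=double)
print(f'>child view: {view.shape}')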
You can learn more about sharing a NumPy array between processes using a shared ctype Array in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using a ctype array
from multiprocessing import Process
from multiprocessing.sharedctypes import Array
from numpy import ones
from numpy.ctypeslib import as_ctypes
from time import perf_counter

# task executed in a child process
def task(array):
    # report details of data
    print(f'>child has data: {len(array)}')

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a shared ctype Array.
The array is created and shared 3 times.
In this case, this method takes about 15.013 seconds on average.
>child has data: 100000000
>0 took 15.021 seconds
>child has data: 100000000
>1 took 15.049 seconds
>child has data: 100000000
>2 took 14.969 seconds
Average Time: 15.013 seconds
Benchmark Share NumPy Array using ctype RawArray
We can explore the benchmark of how to share a NumPy array between processes using a shared ctype RawArray.
This requires that we share the ctype RawArray with the child process, for example via an inherited global variable or a function argument.
We will use a function argument in this case for portability, as the 'fork' start method required for inheritance is not available on all systems (e.g. Windows).
# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')
We must also update the main() function to first create the RawArray and allocate memory for it, then use it as a buffer to create the NumPy array.
The RawArray is then passed to the new child process as a function argument.
# create a child process and share a numpy array
def main():
    # create the raw array
    n = 100000000
    array = RawArray('d', n)
    # create numpy array from raw array buffer
    data = frombuffer(array, dtype=double, count=len(array))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
You can learn more about sharing a NumPy array between processes using a shared ctype RawArray in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using a ctype RawArray
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray
from numpy import frombuffer
from numpy import double
from time import perf_counter

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the raw array
    n = 100000000
    array = RawArray('d', n)
    # create numpy array from raw array buffer
    data = frombuffer(array, dtype=double, count=len(array))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a shared ctype RawArray.
The array is created and shared 3 times.
In this case, this method takes about 1.642 seconds on average.
>child has data: (100000000,)
>0 took 1.798 seconds
>child has data: (100000000,)
>1 took 1.578 seconds
>child has data: (100000000,)
>2 took 1.551 seconds
Average Time: 1.642 seconds
Benchmark Share NumPy Array using SharedMemory
We can explore the benchmark of how to share a NumPy array between processes using SharedMemory.
This requires first updating the task() function to attach to the named shared memory and then create a NumPy array that uses the shared memory as its buffer.
The size of the SharedMemory is provided to the function as an argument.
# task executed in a child process
def task(n):
    # attach to the existing shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # report details of data
    print(f'>child has data: {data.shape}')
    # close the shared memory
    sm.close()
It also requires updating the main() function to define the shared memory and use it to create a NumPy array that is then initialized.
Once the program is finished with the shared memory, it is then unlinked.
# create a child process and share a numpy array
def main():
    # define the size of the array
    n = 100000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(n,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # release the shared memory
    sm.unlink()
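Strictly speaking, the parent could also close its own handle to the shared memory before unlinking it, which avoids resource warnings on some platforms; a minimal (optional) addition to the end of main():

...
# close the parent's handle, then release the shared memory
sm.close()
sm.unlink()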
You can learn more about sharing NumPy arrays between processes using SharedMemory in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array with sharedmemory
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
from numpy import double
from time import perf_counter

# task executed in a child process
def task(n):
    # attach to the existing shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # report details of data
    print(f'>child has data: {data.shape}')
    # close the shared memory
    sm.close()

# create a child process and share a numpy array
def main():
    # define the size of the array
    n = 100000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(n,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # release the shared memory
    sm.unlink()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using SharedMemory.
The array is created and shared 3 times.
In this case, this method takes about 0.372 seconds (372 milliseconds) on average.
>child has data: (100000000,)
>0 took 0.370 seconds
>child has data: (100000000,)
>1 took 0.379 seconds
>child has data: (100000000,)
>2 took 0.367 seconds
Average Time: 0.372 seconds
Benchmark Share NumPy Array using Memory-Mapped File
We can explore the benchmark of how to share a NumPy array between processes using a memory-mapped NumPy array file.
This requires updating the task() function to open the existing memory-mapped file as a NumPy array via the numpy.memmap() function.
The name of the memory-mapped file and the size are provided as arguments to the function.
# task executed in a child process
def task(filename, n):
    # load the memory-mapped file
    data = memmap(filename, dtype='float32', mode='r+', shape=(n,))
    # report details of data
    print(f'>child has data: {data.shape}')
It also requires updating the main() function to define the memory-mapped NumPy array file and initialize the array.
The filename and size of the array are provided to the child process as arguments.
# create a child process and share a numpy array
def main():
    # define the size of the data
    n = 100000000
    # define the filename
    filename = 'data.np'
    # create the memory-mapped file
    data = memmap(filename, dtype='float32', mode='w+', shape=(n,))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(filename, n))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
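If a process also modifies the array, the changes can be forced to disk explicitly via the flush() method on the memmap object; a minimal sketch, continuing the example above:

...
# modify the array and flush the changes through to the file on disk
data[:10] = 2.0
data.flush()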
You can learn more about sharing a NumPy array between processes using a memory-mapped file in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using memory-mapped file
from multiprocessing import Process
from numpy import memmap
from time import perf_counter

# task executed in a child process
def task(filename, n):
    # load the memory-mapped file
    data = memmap(filename, dtype='float32', mode='r+', shape=(n,))
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # define the size of the data
    n = 100000000
    # define the filename
    filename = 'data.np'
    # create the memory-mapped file
    data = memmap(filename, dtype='float32', mode='w+', shape=(n,))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(filename, n))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a memory-mapped NumPy array.
The array is created and shared 3 times.
In this case, this method takes about 0.334 seconds (334 milliseconds) on average.
>child has data: (100000000,)
>0 took 0.342 seconds
>child has data: (100000000,)
>1 took 0.334 seconds
>child has data: (100000000,)
>2 took 0.325 seconds
Average Time: 0.334 seconds
Benchmark Share NumPy Array using Manager
We can explore the benchmark of how to share a NumPy array between processes using a Manager.
This requires first defining a custom manager that allows custom types to be defined and hosted.
# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass
Next, the proxy object for the hosted array needs to be shared with the child process.
In this case, we will share it using a function argument as this approach will work on all platforms.
The “shape” attribute is not directly accessible via the proxy object, so we will call a method on the array instead, in this case sum(), which we expect to return 100 million.
# task executed in a child process
def task(data_proxy):
    # report details of data
    print(f'>child has data: {data_proxy.sum()}')
Next, the main() function needs to be updated to create and start the CustomManager via the context manager interface, then create the hosted array, which returns a proxy object that can be passed to the child process.
# create a child process and share a numpy array
def main():
    # create and start the custom manager
    with CustomManager() as manager:
        # define the data
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        # create a child process
        child = Process(target=task, args=(data_proxy,))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()
Finally, at the start of the program (the first line within the entry point), we can register our custom function for creating a NumPy array of ones with the manager.
...
# register a function for creating numpy arrays on the manager
CustomManager.register('shared_array', ones)
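Because the automatically generated proxy forwards public method calls (but not attribute access), any public method of the hosted array can be invoked via the proxy. A small illustrative sketch, reusing data_proxy from above:

...
# any public method of the hosted array can be called on the proxy
print(data_proxy.min(), data_proxy.max(), data_proxy.sum())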
You can learn more about sharing a NumPy array between processes using a manager in a separate tutorial.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark of sharing a numpy array using manager
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from numpy import ones
from time import perf_counter

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# task executed in a child process
def task(data_proxy):
    # report details of data
    print(f'>child has data: {data_proxy.sum()}')

# create a child process and share a numpy array
def main():
    # create and start the custom manager
    with CustomManager() as manager:
        # define the data
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        # create a child process
        child = Process(target=task, args=(data_proxy,))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')
Running the example runs the benchmarking of sharing the NumPy array between processes using a manager.
This does not actually share the array itself, but rather a proxy object. Nevertheless, there is setup time required to create the manager process and the hosted array.
The array is created and shared 3 times.
In this case, this method takes about 0.574 seconds (574 milliseconds) on average.
>child has data: 100000000.0
>0 took 0.586 seconds
>child has data: 100000000.0
>1 took 0.570 seconds
>child has data: 100000000.0
>2 took 0.566 seconds
Average Time: 0.574 seconds
Comparison of Results
Now that we have benchmarked all of the popular methods for sharing a NumPy array between processes, we can compare the results.
The table below summarizes the performance of each method using the same benchmark methodology.
It also notes whether each method copies the array, that is, whether the child process works with its own copy of the data.
Method             | Copy? | Time (sec)
---------------------------------------
Inherited          | Yes   | 0.250 (*)
Function Argument  | Yes   | 1.413
Queue              | Yes   | 11.285
Pipe               | Yes   | 129.228
ctype Array        | No    | 15.013
ctype RawArray     | No    | 1.642
SharedMemory       | No    | 0.372
Memory-Mapped File | No    | 0.334 (*)
Manager            | No    | 0.574
We can see that, of the methods that create a copy of the array, inheriting the array is the fastest. A limitation of this approach is that it requires the 'fork' start method for child processes, which is not available on all systems (e.g. Windows).
Passing the array as a function argument is also fast, and a good alternative that will work on all platforms.
Of the methods that do not copy the array, the memory-mapped file is the fastest, followed very closely by SharedMemory. Either approach provides an effective way to share an array between processes with both read and write access to the data. A mutex may also be needed to avoid possible race conditions when writing.
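For example, writes to a SharedMemory-backed array could be guarded with a multiprocessing.Lock shared with each process as a function argument. The sketch below is a minimal illustration, not part of the benchmark; it reuses the 'MyMemory' block name from the SharedMemory example above:

# sketch: guard writes to a shared-memory backed array with a lock
from multiprocessing import Lock
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
from numpy import double

# task that safely increments the shared array
def safe_increment(lock, n):
    # attach to the existing shared memory block
    sm = SharedMemory('MyMemory')
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # perform the read-modify-write under the lock
    with lock:
        data += 1.0
    # close this process's handle
    sm.close()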
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Takeaways
You now know the speed of different methods to share a NumPy array between processes in Python.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any additional tips?
I’d love to hear about them!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.