Fastest Way To Share NumPy Array Between Processes

August 31, 2023 Concurrent NumPy

You can share a NumPy array between processes in Python using a suite of different techniques.

Generally, if the shared array can be read-only, then the fastest method to share the array between processes is via inheritance.

If multiple processes need to read and write the shared NumPy array then a memory-mapped file or shared memory-backed array is the fastest.

Sharing NumPy arrays using queues and pipes is slower by nearly two orders of magnitude than other methods.

In this tutorial, you will discover the speed of different methods to share a NumPy array between processes in Python.

Let's get started.

9 Ways to Share a NumPy Array Between Processes

There are many ways to share a NumPy array between processes.

Nevertheless, there are perhaps 9 main approaches that we can implement using the Python standard library.

They are:

Share NumPy Array via Inheritance
Share NumPy Array via Function Argument
Share NumPy Array Via Pipe
Share NumPy Array Via Queue
Share NumPy Array via ctype Array
Share NumPy Array via ctype RawArray
Share NumPy Array via SharedMemory
Share NumPy Array via Memory Mapped File
Share NumPy Array via Manager

You can learn more about most ways to share NumPy arrays in the tutorial:

9 Ways to Share a NumPy Array Between Processes

Are there other methods you want me to benchmark consistently?
Let me know in the comments below.

Next, let's consider the setup for benchmarking each approach.

How to Benchmark NumPy Array Sharing

Once we know the different ways to share a NumPy array between processes and what approach to use when the question becomes:

What is the fastest approach to sharing a NumPy array between processes?

We will explore this question in this tutorial.

Firstly, we can define a benchmark test setup so that it is consistent in each case.

Each approach will explore sharing a NumPy array with 100,000,000 floating point values, internalized with numpy.ones().

For example:

...
# define the global variable
n = 100000000
data = ones((n,))

Each task executed in the new child process will simply report the shape of the shared NumPy array, in order to confirm that the array is accessible.

For example:

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

All of the preparation of the NumPy array, the creation and starting of the new child process, and the sharing of the array with the child process will occur within a main() function.

For example:

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

The benchmarking will be repeated 3 times and the average time will be reported as the methods benchmark results.

This will involve recording the start and end time using the time.perf_counter() function (preferred over time.time()), calling the main() function and reporting the results for each run, as well as calculating and reporting the average after all runs.

For example:

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

You can learn more about benchmarking with the time.perf_counter() function in the tutorial:

Benchmark Python with time.perf_counter()

Let's dive into the methods

Benchmark Sharing NumPy Array Via Inheritance

We can explore the benchmark of how to share a NumPy array between processes using inheritance.

This requires first setting the start method to the 'fork' method as the very first line of the program.

...
# ensure we are using fork start method
set_start_method('fork')

Next, in the main() method, we can explicitly declare the NumPy array as a global variable, and then define it.

...
# declare the global variable
global data
# define the global variable
n = 100000000
data = ones((n,))

Finally, in the task() function we can also explicitly declare the NumPy array global variable.

...
# declare the global variable
global data

Declaring global variables explicitly is not required, but I think it's a good practice in some cases, especially here as it is the focus of the test.

You can learn more about sharing a NumPy array between processes via inheritance in the tutorial:

Share Numpy Array Between Processes Using Global Variable

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array using inheritance
from multiprocessing import set_start_method
from multiprocessing import Process
from numpy import ones
from time import perf_counter

# task executed in a child process
def task():
    # declare the global variable
    global data
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # declare the global variable
    global data
    # define the global variable
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task)
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # ensure we are using fork start method
    set_start_method('fork')
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using inheritance.

The array is created and shared 3 times.

Note: The 'fork' start method used in this example is not available on all systems (e.g. Windows). This means you may not be able to run this example.

In this case, this method takes about 0.250 seconds (250 milliseconds) on average.

>child has data: (100000000,)
>0 took 0.240 seconds
>child has data: (100000000,)
>1 took 0.258 seconds
>child has data: (100000000,)
>2 took 0.252 seconds
Average Time: 0.250 seconds

Benchmark Sharing NumPy Array Via Function Argument

We can explore the benchmark of how to share a NumPy array between processes using a function argument.

This requires updating the task() function to take the NumPy array as an argument.

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

It also requires updating the main() function to share the NumPy array with the child process as a function argument.

...
# create a child process
child = Process(target=task, args=(data,))

You can learn more about sharing a NumPy array between processes via a function argument in the tutorial:

Share Numpy Array Between Processes Using Function Arguments and Return Values

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array using function argument
from multiprocessing import Process
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a function argument.

The array is created and shared 3 times.

In this case, this method takes about 1.413 seconds on average.

>child has data: (100000000,)
>0 took 1.559 seconds
>child has data: (100000000,)
>1 took 1.334 seconds
>child has data: (100000000,)
>2 took 1.346 seconds
Average Time: 1.413 seconds

Benchmark Sharing NumPy Array Via Pipe

We can explore the benchmark of how to share a NumPy array between processes using a pipe data structure.

This requires updating the task() function to take a Pipe reception connection as an argument and calling the recv() method to read the NumPy array.

# task executed in a child process
def task(conn1):
    # receive the data
    data = conn1.recv()
    # report details of data
    print(f'>child has data: {data.shape}')

It also requires updating the main() method to create the pipe send and receive connections, then sending the NumPy array on the pipe and passing the receiving connection to the child process as an argument.

Notice that we send the array after the process is running to ensure the bytes are read and the internal buffers of the pipe are not overwhelmed before the child process starts reading.

# create a child process and share a numpy array
def main():
    # create the shared queue
    conn1, conn2 = Pipe()
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(conn1,))
    # start the child process
    child.start()
    # share the data
    conn2.send(data)
    # wait for the child process to complete
    child.join()

You can see a fuller example of sharing a NumPy array between processes in the tutorial:

7 Ways to Share a Numpy Array Between Processes

You can learn more about how to use the multiprocessing Pipe in the tutorial:

Multiprocessing Pipe in Python

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array using pipe
from multiprocessing import Process
from multiprocessing import Pipe
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(conn1):
    # receive the data
    data = conn1.recv()
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the shared queue
    conn1, conn2 = Pipe()
    # define the data
    n = 100000000
    data = ones((n,))
    # create a child process
    child = Process(target=task, args=(conn1,))
    # start the child process
    child.start()
    # share the data
    conn2.send(data)
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a pipe.

The array is created and shared 3 times.

In this case, this method takes about 129.228 seconds on average.

>child has data: (100000000,)
>0 took 131.518 seconds
>child has data: (100000000,)
>1 took 133.835 seconds
>child has data: (100000000,)
>2 took 122.330 seconds
Average Time: 129.228 seconds

Benchmark Sharing NumPy Array Via Queue

We can explore the benchmark of how to share a NumPy array between processes using a queue data structure.

This requires updating the task() function to take a Queue as an argument and retrieve the NumPy array via the get() method.

# task executed in a child process
def task(queue):
    # get the data
    data = queue.get()
    # report details of data
    print(f'>child has data: {data.shape}')

It also requires updating the main() method to create the shared queue and putting the NumPy array on the queue.

The queue then needs to be shared with the child process via a function argument.

# create a child process and share a numpy array
def main():
    # create the shared queue
    queue = Queue()
    # define the data
    n = 100000000
    data = ones((n,))
    # put the data in the queue
    queue.put(data)
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

You can see a fuller example of sharing a NumPy array between processes using a queue in the tutorial:

Share Numpy Array Between Processes Using a Queue

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array using queue
from multiprocessing import Process
from multiprocessing import Queue
from numpy import ones
from time import perf_counter

# task executed in a child process
def task(queue):
    # get the data
    data = queue.get()
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the shared queue
    queue = Queue()
    # define the data
    n = 100000000
    data = ones((n,))
    # put the data in the queue
    queue.put(data)
    # create a child process
    child = Process(target=task, args=(queue,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a queue.

The array is created and shared 3 times.

In this case, this method takes about 11.285 seconds on average.

>child has data: (100000000,)
>0 took 11.251 seconds
>child has data: (100000000,)
>1 took 11.330 seconds
>child has data: (100000000,)
>2 took 11.273 seconds
Average Time: 11.285 seconds

Benchmark Share NumPy Array using ctype Array

We can explore the benchmark of how to share a NumPy array between processes using a shared ctype Array.

This requires that we share the ctype Array with the child process, such as an inherited global variable or function argument.

We will use a function argument in this case for simplicity (e.g. the 'fork' start method is not available on all systems).

# task executed in a child process
def task(array):
    # report details of data
    print(f'>child has data: {len(array)}')

This also requires updating the main() function to create the shared ctype array with the correct data type, and then populating it with the content of our NumPy array.

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

You can learn more about sharing a NumPy array between processes using a shared ctype array in the tutorial:

Share Numpy Array Between Processes With Shared ctypes

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array ctype array
from multiprocessing import Process
from multiprocessing.sharedctypes import Array
from numpy import ones
from numpy.ctypeslib import as_ctypes
from time import perf_counter

# task executed in a child process
def task(array):
    # report details of data
    print(f'>child has data: {len(array)}')

# create a child process and share a numpy array
def main():
    # define the data
    n = 100000000
    data = ones((n,))
    # get ctype for our array
    ctype = as_ctypes(data)
    # create ctype array initialized from our array
    array = Array(ctype._type_, data, lock=False)
    # create a child process
    child = Process(target=task, args=(array,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a shared ctype Array.

The array is created and shared 3 times.

In this case, this method takes about 15.013 seconds on average.

>child has data: 100000000
>0 took 15.021 seconds
>child has data: 100000000
>1 took 15.049 seconds
>child has data: 100000000
>2 took 14.969 seconds
Average Time: 15.013 seconds

Benchmark Share NumPy Array using ctype RawArray

We can explore the benchmark of how to share a NumPy array between processes using a shared ctype RawArray.

This requires that we share the ctype RawArray with the child process, such as an inherited global variable or function argument.

We will use a function argument in this case for simplicity (e.g. the 'fork' start method is not available on all systems).

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

We must also update the main() function to first create the RawArray and allocate memory for it, then use it as a buffer to create the NumPy array.

The RawArray is then passed to the new child process as a function argument.

# create a child process and share a numpy array
def main():
    # create the raw array
    n = 100000000
    array = RawArray('d', n)
    # create numpy array from raw array buffer
    data = frombuffer(array, dtype=double, count=len(array))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

You can learn more about sharing a NumPy array between processes using a shared ctype RawArray in the tutorial:

Share a Numpy Array Between Processes Backed By RawArray

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array ctype rawarray
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray
from numpy import frombuffer
from numpy import double
from time import perf_counter

# task executed in a child process
def task(data):
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # create the raw array
    n = 100000000
    array = RawArray('d', n)
    # create numpy array from raw array buffer
    data = frombuffer(array, dtype=double, count=len(array))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(data,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a shared ctype RawArray.

The array is created and shared 3 times.

In this case, this method takes about 1.642 seconds on average.

>child has data: (100000000,)
>0 took 1.798 seconds
>child has data: (100000000,)
>1 took 1.578 seconds
>child has data: (100000000,)
>2 took 1.551 seconds
Average Time: 1.642 seconds

Benchmark Share NumPy Array using SharedMemory

We can explore the benchmark of how to share a NumPy array between processes using SharedMemory.

This requires first updating the task() function to connect to the named shared memory, and then creating a NumPy array using the shared memory as a buffer.

The size of the SharedMemory is provided to the function as an argument.

# task executed in a child process
def task(n):
    # attach another shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # report details of data
    print(f'>child has data: {data.shape}')
    # close the shared memory
    sm.close()

It also requires updating the main() function to define the shared memory and use it to create a NumPy array that is then initialized.

Once the program is finished with the shared memory, it is then unlinked.

# create a child process and share a numpy array
def main():
    # define the size of the array
    n = 100000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(n,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # release the shared memory
    sm.unlink()

You can learn more about sharing NumPy arrays between processes using SharedMemory in the tutorial:

How to Share Numpy Array Using SharedMemory

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array with sharedmemory
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
from numpy import double
from time import perf_counter

# task executed in a child process
def task(n):
    # attach another shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # report details of data
    print(f'>child has data: {data.shape}')
    # close the shared memory
    sm.close()

# create a child process and share a numpy array
def main():
    # define the size of the array
    n = 100000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(n,))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # release the shared memory
    sm.unlink()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using SharedMemory.

The array is created and shared 3 times.

In this case, this method takes about 0.372 seconds (372 milliseconds) on average.

>child has data: (100000000,)
>0 took 0.370 seconds
>child has data: (100000000,)
>1 took 0.379 seconds
>child has data: (100000000,)
>2 took 0.367 seconds
Average Time: 0.372 seconds

Benchmark Share NumPy Array using Memory-Mapped File

We can explore the benchmark of how to share a NumPy array between processes using a memory-mapped NumPy array file.

This requires updating the task() function to define and access the existing memory-mapped NumPy array file using the numpy.memmap() function.

The name of the memory-mapped file and the size are provided as arguments to the function.

# task executed in a child process
def task(filename, n):
    # load the memory mapped file
    data = memmap(filename, dtype='float32', mode='r+', shape=(n,))
    # report details of data
    print(f'>child has data: {data.shape}')

It also requires updating the main() function to define the memory-mapped NumPy array file and initialize the array.

The filename and size of the array are provided to the child process as arguments.

# create a child process and share a numpy array
def main():
    # define the size of the data
    n = 100000000
    # define the filename
    filename = 'data.np'
    # create the memory mapped file
    data = memmap(filename, dtype='float32', mode='w+', shape=(n,))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(filename,n))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

You can learn more about sharing a NumPy array between processes using a memory mapped file in the tutorial:

Share NumPy Array Using Memory-Mapped File

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array using memory mapped file
from multiprocessing import Process
from numpy import memmap
from time import perf_counter

# task executed in a child process
def task(filename, n):
    # load the memory mapped file
    data = memmap(filename, dtype='float32', mode='r+', shape=(n,))
    # report details of data
    print(f'>child has data: {data.shape}')

# create a child process and share a numpy array
def main():
    # define the size of the data
    n = 100000000
    # define the filename
    filename = 'data.np'
    # create the memory-mapped file
    data = memmap(filename, dtype='float32', mode='w+', shape=(n,))
    # populate the array
    data.fill(1.0)
    # create a child process
    child = Process(target=task, args=(filename,n))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()

# protect the entry point
if __name__ == '__main__':
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a memory-mapped NumPy array.

The array is created and shared 3 times.

In this case, this method takes about 0.334 seconds (334 milliseconds) on average.

>child has data: (100000000,)
>0 took 0.342 seconds
>child has data: (100000000,)
>1 took 0.334 seconds
>child has data: (100000000,)
>2 took 0.325 seconds
Average Time: 0.334 seconds

Benchmark Share NumPy Array using Manager

We can explore the benchmark of how to share a NumPy array between processes using a Manager.

This requires first defining a custom manager that allows custom types to be defined and hosted.

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

Next, the proxy object for the hosted array needs to be shared with the child process.

In this case, we will share it using a function argument as this approach will work on all platforms.

The "shape" attribute is not directly accessible via the proxy object, so we will call a method on the array instead, in this case the sum() which we expect to equal 100 million.

# task executed in a child process
def task(data_proxy):
    # report details of data
    print(f'>child has data: {data_proxy.sum()}')

Next, the main() method needs to be updated to create the CustomManager via the context manager interface, then create the hosted array, providing a proxy object that can be passed to the child process.

# create a child process and share a numpy array
def main():
    # create and start the custom manager
    with CustomManager() as manager:
        # define the data
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        # create a child process
        child = Process(target=task, args=(data_proxy,))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()

Finally, the first line of the program can register our custom function for creating a NumPy array of all ones in the manager process.

...
# register a function for creating numpy arrays on the manager
CustomManager.register('shared_array', ones)

You can learn more about sharing a NumPy array between processes using a manager in the tutorial:

Share a Numpy Array Between Processes Using a Manager

Tying this together, the complete example is listed below.

# SuperFastPython.com
# benchmark of sharing a numpy array using manager
from multiprocessing import Process
from multiprocessing.managers import BaseManager
from numpy import ones
from time import perf_counter

# custom manager to support custom classes
class CustomManager(BaseManager):
    # nothing
    pass

# task executed in a child process
def task(data_proxy):
    # report details of data
    print(f'>child has data: {data_proxy.sum()}')

# create a child process and share a numpy array
def main():
    # create and start the custom manager
    with CustomManager() as manager:
        # define the data
        n = 100000000
        # create a shared numpy array
        data_proxy = manager.shared_array((n,))
        # create a child process
        child = Process(target=task, args=(data_proxy,))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()

# protect the entry point
if __name__ == '__main__':
    # register a function for creating numpy arrays on the manager
    CustomManager.register('shared_array', ones)
    # repeat the benchmark
    results = list()
    for i in range(3):
        # record start time
        time_start = perf_counter()
        # perform task
        main()
        # record duration
        time_duration = perf_counter() - time_start
        # report progress
        print(f'>{i} took {time_duration:.3f} seconds')
        # store result
        results.append(time_duration)
    # calculate average time
    time_avg = sum(results) / 3.0
    # report average time
    print(f'Average Time: {time_avg:.3f} seconds')

Running the example runs the benchmarking of sharing the NumPy array between processes using a manager.

This does not actually share the array, but rather the proxy objects. Nevertheless, there is setup time required to create the manager and the hosted array.

The array is created and shared 3 times.

In this case, this method takes about 0.574 seconds (574 milliseconds) on average.

>child has data: 100000000.0
>0 took 0.586 seconds
>child has data: 100000000.0
>1 took 0.570 seconds
>child has data: 100000000.0
>2 took 0.566 seconds
Average Time: 0.574 seconds

Comparison of Results

Now that we have benchmarked all of the popular methods for sharing a NumPy array between processes, we can compare the results.

The table below summarizes the performance of each method using the same benchmark methodology.

It also comments on whether the method copies the array or not. Specifically whether the child process works with a copy of the array.

Method            | Copy? | Time (sec)
---------------------------------------
Inherited         | Yes   |   0.250 (*)
Function Argument | Yes   |   1.413
Queue             | Yes   |  11.285
Pipe              | Yes   | 129.228
ctype Array       | No    |  15.013
ctype RawArray    | No    |   1.642
SharedMemory      | No    |   0.372
MemoryMapped File | No    |   0.334 (*)
Manager           | No    |   0.574

We can see from the methods that create a copy of the array that inheriting the array is the fastest. A limitation of this approach is that it requires the fork start method for child processes that is not available on all systems (e.g. windows)

Passing the array as a function argument is also fast, and a good alternative that will work on all platforms.

From the methods that do not copy the array, we can see that the memory mapped file is the fastest method, followed very closely by the SharedMemory method. Either approach provides an effective method for sharing an array among processes that allows read and write access on the data. A muex may also be needed to avoid possible race conditions.

Takeaways

You now know the speed of different methods to share a NumPy array between processes in Python.

If you enjoyed this tutorial, you will love my book: Concurrent NumPy in Python. It covers everything you need to master the topic with hands-on examples and clear explanations.