How to Share Numpy Array Using SharedMemory

June 13, 2023 Concurrent NumPy

You can share a numpy array between processes by using multiprocessing SharedMemory.

In this tutorial, you will discover how to share a numpy array between processes using multiprocessing SharedMemory.

Let's get started.

Need to Share Numpy Array Between Processes

Python offers process-based concurrency via the multiprocessing module.

Process-based concurrency is appropriate for those tasks that are CPU-bound, as opposed to thread-based concurrency in Python which is generally suited to IO-bound tasks given the presence of the Global Interpreter Lock (GIL).

Consider the situation where we need to share numpy arrays between processes.

This may be required for many reasons.

Sharing Python objects and data between processes is slow.

This is because any data shared between processes, like numpy arrays, must be transmitted using inter-process communication (IPC), requiring the data first be pickled by the sender and then unpickled by the receiver.
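To get a sense of this cost, we can check how large a pickled array is. This is a quick illustration, not part of the tutorial's examples; the array size here is arbitrary.

```python
import pickle

import numpy

# an array of 10,000 doubles holds 80,000 bytes of raw data
arr = numpy.ones(10000, dtype=numpy.double)
# pickling copies all of that data into a new bytes object
payload = pickle.dumps(arr)
# the serialized payload is at least as large as the raw array data
print(len(payload), arr.nbytes)
```

Every transmission between processes pays this serialization cost, plus the cost of copying the bytes across the IPC channel.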

This means that sharing numpy arrays between processes only makes sense if we receive some benefit, such as a speedup, that outweighs the slow speed of data transmission.

For example, it may be the case that the arrays are relatively small and fast to transmit, whereas the computation performed on each array is slow and can benefit from being performed in separate processes.

Alternatively, preparing the array may be computationally expensive and benefit from being performed in a separate process, and once prepared, the arrays are small and fast to transmit to another process that requires them.

Given these situations, how can we share data between Processes in Python?

How to Share Numpy Array Between Processes Using SharedMemory

One approach to sharing numpy arrays between processes is to use shared memory.

Python processes do not share memory by default. Instead, shared memory must be provided explicitly.

Since Python 3.8, a new approach to shared memory has been available in the multiprocessing.shared_memory module.

The benefit of shared memory is speed and efficiency.

No inter-process communication is required. Instead, processes are able to read and write the same shared memory block directly, although within constraints.

Python provides three classes to support shared memory; they are:

- SharedMemory: create a new block of shared memory, or attach to an existing block by name.
- ShareableList: a fixed-size list-like structure backed by shared memory.
- SharedMemoryManager: a manager (from multiprocessing.managers) that creates shared memory and cleans it up automatically.

The SharedMemory class is the core class for providing shared memory.

It allows a shared memory block to be created, named, and attached to. It provides a "buf" attribute to read and write data in an array-like structure and can be closed and destroyed.

A SharedMemory can be created in a process by calling the constructor, specifying a "size" in bytes and setting the "create" argument to True.

For example:

...
# create a shared memory
shared_mem = SharedMemory(size=1024, create=True)

A shared memory object can be assigned a meaningful name via the "name" argument to the constructor.

For example:

...
# create a shared memory with a name
shared_mem = SharedMemory(name='MyMemory', size=1024, create=True)

Another process can access a shared memory via its name. This is called attaching to a shared memory.

This can be achieved by specifying the name of the shared memory that has already been created and setting the "create" argument to False (the default).

For example:

...
# attach to a shared memory
shared_mem = SharedMemory(name='MyMemory', create=False)

Once created, data can be stored in the shared memory via the "buf" attribute, which acts like an array of bytes.

For example:

...
# write data to shared memory
shared_mem.buf[0] = 1

We can create a shared memory and use it as the basis for a numpy array.

This means that multiple processes can interact with the same numpy array directly via the shared memory, rather than passing copies of the array around. This is significantly faster and mimics shared memory available between threads.

A new numpy.ndarray instance can be created and we can configure it to use the shared memory as the buffer via the "buffer" argument.

For example:

...
# create a new numpy array that uses the shared memory
data = ndarray((n,), dtype=numpy.double, buffer=shared_mem.buf)
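To see that arrays created this way really alias the same memory, we can create two arrays over one block within a single process. This is a quick sketch, not part of the worked example below; the block name is generated automatically here.

```python
from multiprocessing.shared_memory import SharedMemory

import numpy
from numpy import ndarray

# create a block big enough for 5 doubles (8 bytes each)
sm = SharedMemory(create=True, size=5 * 8)
# two separate arrays backed by the same buffer
a = ndarray((5,), dtype=numpy.double, buffer=sm.buf)
b = ndarray((5,), dtype=numpy.double, buffer=sm.buf)
# a write through one array is visible through the other
a[0] = 42.0
value = float(b[0])
print(value)  # prints 42.0
# drop the array views, then close and release the block
del a, b
sm.close()
sm.unlink()
```

The same effect holds across processes: any process that attaches to the block by name sees the same bytes.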

We must ensure that the SharedMemory has a "size" that is large enough for our numpy array.

For example, if we create a numpy array to hold double values, each double is 8 bytes. Therefore, we can set the size of the SharedMemory to be 8 multiplied by the size of the array we need.

For example:

...
# define the size of the numpy array
n = 10000
# bytes required for the array (8 bytes per value for doubles)
n_bytes = n * 8
# create the shared memory
shared_mem = SharedMemory(name='MyMemory', create=True, size=n_bytes)

Now that we know how to share a numpy array using multiprocessing SharedMemory, let's look at a worked example.

Example of Sharing a Numpy Array Using SharedMemory

We can explore an example of sharing a numpy array between processes using shared memory.

In this example, we will create a shared memory large enough to hold our array, then create an array backed by the shared memory. We will then start a child process and have it connect to the same shared memory and create an array backed by the shared memory and access the same data.

This example will highlight how multiple processes can interact with the same numpy array in memory directly.

Firstly, we will define a task to execute in the child process.

The task will not take any arguments. It will first connect to the shared memory, then create a new numpy array of double values backed by the shared memory.

...
# define the size of the numpy array
n = 10000
# attach to the shared memory block
sm = SharedMemory('MyMemory')
# create a new numpy array that uses the shared memory
data = ndarray((n,), dtype=numpy.double, buffer=sm.buf)

Next, the task will report the first 10 values of the array, increment all data in the array and then confirm that the data in the array was changed.

...
# check the contents
print(f'Child {data[:10]}')
# increment the data
data[:] += 1
# confirm change
print(f'Child {data[:10]}')

Finally, the task will close its access to the shared memory.

...
# close the shared memory
sm.close()

The task() function below implements this.

# task executed in a child process
def task():
    # define the size of the numpy array
    n = 10000
    # attach to the shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=numpy.double, buffer=sm.buf)
    # check the contents
    print(f'Child {data[:10]}')
    # increment the data
    data[:] += 1
    # confirm change
    print(f'Child {data[:10]}')
    # close the shared memory
    sm.close()

Next, the main process will create the shared memory with the required size.

Given that we know we want a numpy array of a given size (10,000 elements) containing double values, we can calculate the size in memory required directly as 8 bytes (per double value) multiplied by the size of the array.

...
# define the size of the numpy array
n = 10000
# bytes required for the array (8 bytes per value for doubles)
n_bytes = n * 8
# create the shared memory
sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)

Double values may not always be 8 bytes; the size is platform dependent.
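Rather than hard-coding 8 bytes, we can query the exact size of the data type via numpy's dtype itemsize. A small sketch:

```python
import numpy

# look up the exact byte size of the dtype on this platform
dtype = numpy.dtype(numpy.double)
# define the size of the numpy array
n = 10000
# compute the buffer size from the dtype, not a hard-coded constant
n_bytes = n * dtype.itemsize
print(dtype.itemsize, n_bytes)
```

This keeps the size calculation correct even if the byte size of the chosen data type differs from what you expect.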

Note, we can create a shared memory block with more space than is required by our array, but not less. Ensure you allocate enough memory for your array, and do some basic checks and calculations first if you're not sure.

If the buffer is too small to hold the array, you will get an error such as:

TypeError: buffer is too small for requested array
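One way to guard against this error is to compare the array's byte requirement against the size of the block before building the array on top of the buffer. A sketch, with the variable names chosen here for illustration:

```python
from multiprocessing.shared_memory import SharedMemory

import numpy

# bytes needed by the planned array
n = 10000
dtype = numpy.dtype(numpy.double)
required = n * dtype.itemsize
# create the block; the OS may round the allocation up, never down
sm = SharedMemory(create=True, size=required)
# basic check before building the array on top of the buffer
assert sm.size >= required
data = numpy.ndarray((n,), dtype=dtype, buffer=sm.buf)
length = len(data)
print(length)
# clean up: drop the view, then close and release the block
del data
sm.close()
sm.unlink()
```

If the check fails, allocate a larger block rather than letting the ndarray constructor raise.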

Next, the main process will create a new numpy array with the given size, containing double values, and backed by our shared memory with a specific size.

...
# create a new numpy array that uses the shared memory
data = ndarray((n,), dtype=numpy.double, buffer=sm.buf)

We will then fill the array with the value 1.0, then confirm the data in the array was changed and that the array has the expected dimensions, e.g. 10k elements.

...
# populate the array
data.fill(1.0)
# confirm contents of the new array
print(data[:10], len(data))

Next, we will create, configure and start a child process to execute our task() function and block until it is terminated.

...
# create a child process
child = Process(target=task)
# start the child process
child.start()
# wait for the child process to complete
child.join()

Finally, we will confirm the contents of the shared array were changed by the child process, then close access to the shared memory and release the shared memory.

...
# check some data in the shared array
print(data[:10])
# close the shared memory
sm.close()
# release the shared memory
sm.unlink()

Tying this together, the complete example is listed below.

# share numpy array via a shared memory
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
from numpy import ndarray
import numpy

# task executed in a child process
def task():
    # define the size of the numpy array
    n = 10000
    # attach to the shared memory block
    sm = SharedMemory('MyMemory')
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=numpy.double, buffer=sm.buf)
    # check the contents
    print(f'Child {data[:10]}')
    # increment the data
    data[:] += 1
    # confirm change
    print(f'Child {data[:10]}')
    # close the shared memory
    sm.close()

# protect the entry point
if __name__ == '__main__':
    # define the size of the numpy array
    n = 10000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    # create the shared memory
    sm = SharedMemory(name='MyMemory', create=True, size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = ndarray((n,), dtype=numpy.double, buffer=sm.buf)
    # populate the array
    data.fill(1.0)
    # confirm contents of the new array
    print(data[:10], len(data))
    # create a child process
    child = Process(target=task)
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # check some data in the shared array
    print(data[:10])
    # close the shared memory
    sm.close()
    # release the shared memory
    sm.unlink()

Running the example first creates the shared memory with the name "MyMemory" and 80,000 bytes in size (e.g. 10k * 8 bytes per double).

A new numpy array is then created with the given size and backed by the shared memory.

The array is filled with the value 1.0, and the contents and size of the array are confirmed.

A child process is then created and configured to execute our task() function. The process is started and the main process blocks until the child process terminates.

The child process runs and connects to the shared memory by name, e.g. "MyMemory". It then creates a new numpy array with the same size and data type as the parent process and backs it with the shared memory.

The child process first confirms that the array contains the value 1.0 in all elements, as set by the parent process. This confirms that the child process is operating on the same memory (same array) as the parent process.

It increments all values in the array and confirms that the data was changed.

The child process closes its access to the shared memory and terminates.

The main process resumes and confirms that the contents of the array were changed by the child process. This confirms that changes made to the array by the child process are reflected in the parent process.

Finally, the main process closes access to the shared memory and then releases the shared memory.

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 10000
Child [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Child [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
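One caveat worth noting: the numpy array is only usable while the shared memory is open. If the data is needed after the block is closed and released, copy it out first. A small sketch, separate from the example above:

```python
from multiprocessing.shared_memory import SharedMemory

import numpy
from numpy import ndarray

# create a block and an array backed by it
n = 100
sm = SharedMemory(create=True, size=n * 8)
data = ndarray((n,), dtype=numpy.double, buffer=sm.buf)
data.fill(2.0)
# take an independent copy while the shared memory is still open
result = data.copy()
# drop the view, then close and release the block
del data
sm.close()
sm.unlink()
# the copy remains valid after the shared memory is gone
print(result[:3])  # prints [2. 2. 2.]
```

Accessing the original shared-memory-backed array after the block is closed is unsafe, so the copy is the only data that should be used afterward.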

Simplified Example of Sharing a Numpy Array Using SharedMemory

We can simplify the example a little while keeping it functionally the same.

For example, we can define a function that creates a numpy array backed by a shared memory.

The function can take the shared memory object, the shape of the array, and the data type, then return a numpy array backed by the shared memory.

For example:

# define a numpy array backed by shared memory
def sm_array(shared_mem, shape, dtype=numpy.double):
    # create a new numpy array that uses the shared memory
    return ndarray(shape, dtype=dtype, buffer=shared_mem.buf)

This function would ensure that the same numpy array is created each time, regardless of the process using the array.

We can also pass the size of the array and the name of the shared memory to the child process as arguments, to ensure the parent and child processes are consistent.

For example:

# task executed in a child process
def task(n, name):
    # attach to the shared memory block
    sm = SharedMemory(name)
    # create a new numpy array that uses the shared memory
    data = sm_array(sm, (n,))
    # check the contents
    print(f'Child {data[:10]}')
    # increment the data
    data[:] += 1
    # confirm change
    print(f'Child {data[:10]}')
    # close the shared memory
    sm.close()

Finally, we can create the SharedMemory in the main process using the SharedMemoryManager and the context manager interface. This ensures that the memory is always closed and unlinked by the program, even if there are errors.

For example:

...
# define the size of the numpy array
n = 10000000
# bytes required for the array (8 bytes per value for doubles)
n_bytes = n * 8
with SharedMemoryManager() as smm:
    # create the shared memory
    sm = smm.SharedMemory(size=n_bytes)
    # create a new numpy array that uses the shared memory
    data = sm_array(sm, (n,))
    # populate the array
    data.fill(1.0)
    # confirm contents of the new array
    print(data[:10], len(data))
    # create a child process
    child = Process(target=task, args=(n, sm.name))
    # start the child process
    child.start()
    # wait for the child process to complete
    child.join()
    # check some data in the shared array
    print(data[:10])

Tying this together, the complete example with these changes is listed below.

# share numpy array via a shared memory
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
from multiprocessing.managers import SharedMemoryManager
from numpy import ndarray
import numpy

# define a numpy array backed by shared memory
def sm_array(shared_mem, shape, dtype=numpy.double):
    # create a new numpy array that uses the shared memory
    return ndarray(shape, dtype=dtype, buffer=shared_mem.buf)

# task executed in a child process
def task(n, name):
    # attach to the shared memory block
    sm = SharedMemory(name)
    # create a new numpy array that uses the shared memory
    data = sm_array(sm, (n,))
    # check the contents
    print(f'Child {data[:10]}')
    # increment the data
    data[:] += 1
    # confirm change
    print(f'Child {data[:10]}')
    # close the shared memory
    sm.close()

# protect the entry point
if __name__ == '__main__':
    # define the size of the numpy array
    n = 10000000
    # bytes required for the array (8 bytes per value for doubles)
    n_bytes = n * 8
    with SharedMemoryManager() as smm:
        # create the shared memory
        sm = smm.SharedMemory(size=n_bytes)
        # create a new numpy array that uses the shared memory
        data = sm_array(sm, (n,))
        # populate the array
        data.fill(1.0)
        # confirm contents of the new array
        print(data[:10], len(data))
        # create a child process
        child = Process(target=task, args=(n, sm.name))
        # start the child process
        child.start()
        # wait for the child process to complete
        child.join()
        # check some data in the shared array
        print(data[:10])

Running the example is functionally the same as the first version and produces the same result.

It offers a little more safety by ensuring the creation of the array is consistent across both processes.

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 10000000
Child [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Child [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]

Takeaways

You now know how to share a numpy array between processes using multiprocessing SharedMemory.



If you enjoyed this tutorial, you will love my book: Concurrent NumPy in Python. It covers everything you need to master the topic with hands-on examples and clear explanations.