Last Updated on September 12, 2022
You can share a multiprocessing pool with child workers indirectly using a multiprocessing.Manager and proxy objects.
Using a Manager provides a workaround for the fact that a process pool cannot be shared directly: the pool itself cannot be pickled.
This allows you to issue tasks to the pool from tasks executed by the pool, e.g. issue tasks from tasks.
In this tutorial you will discover how to share a process pool with child worker processes in Python.
Let’s get started.
Need To Share Process Pool With Tasks
The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.
A process pool can be configured when it is created, which will prepare the child workers.
A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
— multiprocessing — Process-based parallelism
We can issue one-off tasks to the process pool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map().
Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the functions such as apply_async() and map_async().
When using the process pool we may want to pass the pool to a task executed by the pool.
This may be for many reasons, such as:
- The task may want to log or report the status of the pool.
- The task may want to log or report the status of a task in the pool.
- The task may want to issue follow-up tasks to the pool.
How can we pass the process pool to a worker child process?
Sharing the Multiprocessing Pool Results in an Error
The multiprocessing.pool.Pool cannot be shared with child processes directly.
This is because the process pool cannot be serialized.
Specifically, it cannot be pickled, Python’s native serialization method, and will result in a NotImplementedError:
```
pool objects cannot be passed between processes or pickled
```
We can see this in the source code for the multiprocessing.pool.Pool class in the __reduce__() method, which is called when an object is pickled:

```python
def __reduce__(self):
    raise NotImplementedError(
        'pool objects cannot be passed between processes or pickled'
        )
```
When we share an object with another process, like a child process, the object must be pickled in the current process, transmitted to the other process, then unpickled.
Because the process pool cannot be pickled, it cannot be shared with another process.
Sharing a process pool may mean a few things.
For example:
- Putting a process pool on a multiprocessing queue or a pipe.
- Passing a process pool as an argument in a function executed by another process.
- Returning a process pool from one process to another.
This has consequences in your development of programs that use process-based concurrency.
Such as:
- You cannot share the process pool with a child process.
- A child process cannot share its process pool with a parent process.
- You cannot share a process pool with child workers in the process pool itself.
Now that we know that we cannot share a process pool between processes, let’s look at a workaround.
How to Share a Multiprocessing Pool via a Manager
We can share a process pool indirectly.
This can be achieved by creating a process pool using a manager. This will return a proxy object for the process pool that can be shared among processes directly.
A multiprocessing.Manager provides a way to create a centralized version of a Python object hosted on a server process.
Once created, it returns proxy objects that allow other processes to interact with the centralized objects automatically behind the scenes.
Managers provide a way to create data which can be shared between different processes, including sharing over a network between processes running on different machines. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.
— multiprocessing — Process-based parallelism
The multiprocessing.Manager provides the full multiprocessing API, allowing concurrency primitives to be shared among processes, including the process pool.
As such, a multiprocessing.Manager is a useful way to centralize an object like a multiprocessing.pool.Pool so that it can be shared among multiple processes, such as the worker processes in the pool itself.
We can first create a multiprocessing.Manager using the context manager interface.
For example:
```python
...
# create the manager
with Manager() as manager:
    # ...
```
We can then create a shared multiprocessing.pool.Pool object using the manager.
This will return a proxy object for the multiprocessing.pool.Pool object in the manager process that we can share among child worker processes directly or indirectly.
For example:
```python
...
# create a shared object via the manager
pool = manager.Pool()
```
The proxy for the multiprocessing.pool.Pool can then be passed to a child worker initialization function or to a task function as an argument to be executed by worker processes.
Now that we know how to share a process pool indirectly with other processes, let’s look at some worked examples.
Example Error Sharing the Multiprocessing Pool With Workers
Before we explore an example of sharing a process pool with child worker processes, let’s explore an example where sharing the process pool directly results in an error.
In this example, we will define a custom task function that takes a process pool as an argument and reports its details. We will then issue the task from the main process asynchronously and pass the process pool as an argument. This will fail with an error. We will add an error handler callback to ensure the details of the error are reported.
Firstly, we can define the error callback function.
This function takes an error raised in a task in the process pool and reports its details directly.
The handler() function below implements this.
```python
# error callback function
def handler(error):
    print(error, flush=True)
```
Next, we can define the custom task function.
The function takes the process pool as an argument and reports its details.
The task() function below implements this.
```python
# task executed in a worker process
def task(pool):
    # report a message
    print(f'Pool Details: {pool}', flush=True)
```
Next, in the main process we can create the process pool.
We will create the process pool with the default configuration and use the context manager interface.
```python
...
# create and configure the process pool
with Pool() as pool:
    # ...
```
We can then issue a call to our custom task() function asynchronously to the process pool using the apply_async() function.
We will pass the pool instance itself as an argument and specify an error handler callback function as our custom handler() function.
```python
...
# issue a task to the process pool
pool.apply_async(task, args=(pool,), error_callback=handler)
```
Finally, the main process will close the process pool and block, waiting for the issued task to complete.
```python
...
# close the pool
pool.close()
# wait for all issued tasks to complete
pool.join()
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of attempting to share the pool with a worker process
from multiprocessing.pool import Pool

# error callback function
def handler(error):
    print(error, flush=True)

# task executed in a worker process
def task(pool):
    # report a message
    print(f'Pool Details: {pool}', flush=True)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issue a task to the process pool
        pool.apply_async(task, args=(pool,), error_callback=handler)
        # close the pool
        pool.close()
        # wait for all issued tasks to complete
        pool.join()
```
Running the example first creates the process pool.
It then issues the task asynchronously. The main process then closes the process pool and waits for all issued tasks to complete.
The task begins executing and fails with an error.
The error is passed to the error callback function and is reported.
This highlights that we cannot pass a process pool instance directly to child worker processes in the process pool.
```
pool objects cannot be passed between processes or pickled
```
Next, let’s look at an example where we can use a manager to share a process pool with child workers.
Example of Sharing the Multiprocessing Pool With Workers
We can explore how we can share a process pool with child processes using a manager.
In this example, we will update the previous example so that the process pool is created using a multiprocessing.Manager. This will create a centralized version of the process pool running in a server process and return proxy objects for the process pool that we can share among child processes.
Firstly, we must create the manager instance.
This can be achieved using the context manager interface that will ensure the manager is closed once we are finished using it.
```python
...
# create a manager
with Manager() as manager:
    # ...
```
Next, we can create the process pool via the manager.
This will create the process pool in the server process and return a proxy object that works just like the actual process pool object.
```python
...
# create and configure the process pool
with manager.Pool() as pool:
    # ...
```
And that’s it.
The proxy for the process pool object can then be passed to the child worker processes directly and provide access to the centralized process pool.
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of sharing a process pool among processes
from multiprocessing import Manager

# error callback function
def handler(error):
    print(error, flush=True)

# task executed in a worker process
def task(pool):
    # report a message
    print(f'Pool Details: {pool}')

# protect the entry point
if __name__ == '__main__':
    # create a manager
    with Manager() as manager:
        # create and configure the process pool
        with manager.Pool() as pool:
            # issue a task to the process pool
            pool.apply_async(task, args=(pool,), error_callback=handler)
            # close the pool
            pool.close()
            # wait for all issued tasks to complete
            pool.join()
```
Running the example first creates the manager, then creates the process pool via the manager.
It then issues the task asynchronously. The main process then closes the process pool and waits for all issued tasks to complete.
The task executes normally and reports the details of the process pool.
In this case, the pool is accessed via the proxy object without problem and its details are reported: the pool is closed and was created with 8 child worker processes.
Note, the specific default configuration of the process pool on your system may differ.
The task completes and the main process carries on, first terminating the process pool, then terminating the manager.
This highlights how we can share a process pool with child worker processes indirectly using a manager.
```
Pool Details: <multiprocessing.pool.Pool state=CLOSE pool_size=8>
```
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter: 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know how to share a process pool with child worker processes in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.