Last Updated on September 12, 2022
Python provides two pools of process-based workers via the multiprocessing.pool.Pool class and the concurrent.futures.ProcessPoolExecutor class.
In this tutorial you will discover the similarities and differences between the multiprocessing.pool.Pool and ProcessPoolExecutor. This will help you decide which to use in your Python projects for process-based concurrency.
Let’s get started.
What is multiprocessing.Pool
The multiprocessing.pool.Pool class provides a process pool in Python.
Note, you can access the process pool class via the helpful alias multiprocessing.Pool.
It allows tasks to be submitted as functions to the process pool to be executed concurrently.
A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
— MULTIPROCESSING — PROCESS-BASED PARALLELISM
A process pool is a programming pattern for automatically managing a pool of worker processes.
The pool can provide a generic interface for executing ad hoc tasks with a variable number of arguments, much like the target property on the Process object, but does not require that we choose a process to run the task, start the process, or wait for the task to complete.
To use the process pool, we must first create and configure an instance of the class.
For example:
    ...
    # create a process pool
    pool = multiprocessing.pool.Pool(...)
By default, the process pool will have one worker process for each logical CPU core in your system.
We can specify the number of workers to create via an argument to the class constructor.
For example:
    ...
    # create a process pool with 4 workers
    pool = multiprocessing.pool.Pool(4)
Tasks are issued to the process pool by specifying a function to execute that may or may not have arguments and may or may not return a value.
We can issue one-off tasks to the process pool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map().
For example:
    ...
    # issue tasks for execution
    for result in pool.map(task, items):
        # ...
Tasks with multiple arguments are issued synchronously to the process pool using the starmap() function.
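As a minimal sketch of starmap(), the example below applies the built-in pow() function to a list of argument tuples. The function name run_starmap and the pool size of 2 are choices made for this illustration only.

```python
from multiprocessing import Pool


def run_starmap():
    # each tuple holds the positional arguments for one task
    pairs = [(2, 3), (3, 2), (10, 2)]
    with Pool(2) as pool:
        # starmap() unpacks each tuple: pow(2, 3), pow(3, 2), pow(10, 2)
        # it blocks until all tasks are done and returns a list in input order
        return pool.starmap(pow, pairs)


if __name__ == "__main__":
    print(run_starmap())  # [8, 9, 100]
```

The built-in pow() is used as the task so that it can be pickled by reference; a custom task function would need to be defined at the top level of a module for the same reason.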
We can also issue tasks asynchronously to the process pool and receive a multiprocessing.pool.AsyncResult in return, which provides a handle on the issued task or tasks.
Tasks can be issued asynchronously using the apply_async(), map_async(), and starmap_async() functions.
For example:
    ...
    # issue tasks for execution asynchronously
    result = pool.map_async(task, items)
    # ...
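To make the asynchronous case concrete, here is a small sketch that issues tasks with map_async() and then uses the returned AsyncResult. The name run_map_async and the 30 second timeouts are assumptions made for the example.

```python
import math
from multiprocessing import Pool


def run_map_async():
    with Pool(2) as pool:
        # returns immediately with an AsyncResult handle
        async_result = pool.map_async(math.factorial, [3, 4, 5])
        # the caller is free to do other work here
        # wait for the tasks to finish, giving up after 30 seconds
        async_result.wait(timeout=30)
        finished = async_result.ready()
        # get() returns the results in input order, or re-raises a task's exception
        values = async_result.get(timeout=30)
    return finished, values


if __name__ == "__main__":
    print(run_map_async())  # (True, [6, 24, 120])
```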
Once we have finished with the process pool, it can be closed and resources used by the pool can be released.
For example:
    ...
    # close the process pool
    pool.close()
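Putting the steps together, the following is a minimal sketch of the full life cycle: create the pool, issue tasks, close the pool, and wait for the workers to exit. The function name run_pool_lifecycle and the choice of 2 workers are illustrative assumptions.

```python
import math
from multiprocessing import Pool


def run_pool_lifecycle():
    # create a pool with 2 workers (an arbitrary choice for this sketch)
    pool = Pool(2)
    try:
        # map() blocks until all tasks finish and returns a list in input order
        roots = pool.map(math.sqrt, [1, 4, 9, 16])
    finally:
        # stop accepting new tasks, then wait for the worker processes to exit
        pool.close()
        pool.join()
    return roots


if __name__ == "__main__":
    print(run_pool_lifecycle())  # [1.0, 2.0, 3.0, 4.0]
```

The pool is created inside a function that is only called under the `if __name__ == "__main__":` guard, which keeps the example safe when child processes are started with the spawn start method.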
You can learn more about the process pool in the guide "Python Multiprocessing Pool: The Complete Guide", listed under Further Reading.
Now that we are familiar with multiprocessing.Pool, let’s take a look at ProcessPoolExecutor.
What is ProcessPoolExecutor
The concurrent.futures.ProcessPoolExecutor class provides a process pool in Python.
A process is an instance of a computer program. A process has a main thread of execution and may have additional threads. A process may also spawn or fork child processes. In Python, like many modern programming languages, processes are created and managed by the underlying operating system.
You can create a process pool by instantiating the class and specifying the number of processes via the max_workers argument; for example:
    ...
    # create a process pool
    executor = ProcessPoolExecutor(max_workers=10)
You can then submit tasks to be executed by the process pool using the map() and the submit() functions.
The map() function matches the built-in map() function and takes a function and one or more iterables of items. The target function will then be called for each item as a separate task in the process pool, and an iterator over the return values is returned.
The call to map() does not block, but each result yielded in the returned iterator will block until the associated task is completed.
For example:
    ...
    # call a function on each item in a list and process results
    for result in executor.map(task, items):
        # process result...
You can also issue tasks to the pool via the submit() function that takes the target function name and any arguments and returns a Future object.
The Future object can be used to query the status of the task (e.g. done(), running(), or cancelled()) and can be used to get the result or exception raised by the task once completed. The calls to result() and exception() will block until the task associated with the Future is done.
For example:
    ...
    # submit a task to the pool and get a future immediately
    future = executor.submit(task, item)
    # get the result once the task is done
    result = future.result()
Once you are finished with the process pool, it can be shut down by calling the shutdown() function in order to release all of the worker processes and their resources.
For example:
    ...
    # shutdown the process pool
    executor.shutdown()
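The submit, result, and shutdown steps can be combined into one runnable sketch. The function name run_submit and the use of the built-in pow() as the task are choices made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def run_submit():
    executor = ProcessPoolExecutor(max_workers=2)
    try:
        # submit() returns a Future immediately
        future = executor.submit(pow, 2, 10)
        # result() blocks until the task has finished
        value = future.result()
        # the Future can then be queried for its status
        return value, future.done(), future.cancelled()
    finally:
        # release the worker processes; by default this waits for running tasks
        executor.shutdown()


if __name__ == "__main__":
    print(run_submit())  # (1024, True, False)
```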
The process of creating and shutting down the process pool can be simplified by using the context manager that will automatically call the shutdown() function.
For example:
    ...
    # create a process pool
    with ProcessPoolExecutor(max_workers=10) as executor:
        # call a function on each item in a list and process results
        for result in executor.map(task, items):
            # process result...
            # ...
    # shutdown is called automatically
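For a version that can be run as is, the sketch below uses the context manager together with map(). Because the executor's map() mirrors the built-in map(), it can be given more than one iterable; the names run_executor_map, bases, and exponents are assumptions for this example.

```python
from concurrent.futures import ProcessPoolExecutor


def run_executor_map():
    bases = [2, 3, 10]
    exponents = [3, 2, 2]
    with ProcessPoolExecutor(max_workers=2) as executor:
        # like the built-in map(), the iterables are consumed in step,
        # so the tasks are pow(2, 3), pow(3, 2) and pow(10, 2)
        results = list(executor.map(pow, bases, exponents))
    # shutdown() was called automatically when the with-block ended
    return results


if __name__ == "__main__":
    print(run_executor_map())  # [8, 9, 100]
```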
For more on the ProcessPoolExecutor, see the guide "Python ProcessPoolExecutor: The Complete Guide", listed under Further Reading.
Now that we are familiar with the multiprocessing.Pool and ProcessPoolExecutor, let’s compare and contrast each.
Comparison of Pool vs ProcessPoolExecutor
This section reviews the similarities and differences between the two classes.
Similarities
The multiprocessing.Pool and ProcessPoolExecutor classes are very similar. They are both process pools of child worker processes.
The most important similarities are as follows:
- Both Use Processes
- Both Can Run Ad Hoc Tasks
- Both Support Asynchronous Tasks
- Both Can Wait For All Tasks
- Both Have Thread-Based Equivalents
Let’s take a closer look at each in turn.
1. Both Use Processes
Both the multiprocessing.Pool and ProcessPoolExecutor create and use child worker processes.
These are real native or system-level child processes that may be forked or spawned. This means they are created and managed by the underlying operating system.
As such, the worker child processes used in each class offer true parallelism via process-based concurrency.
This means tasks issued to each process pool will execute concurrently and make best use of available CPU cores.
It also means tasks issued to each process pool will be subject to inter-process communication, requiring that data sent to child processes and received from child processes be pickled, adding computational overhead.
2. Both Can Run Ad Hoc Tasks
Both the multiprocessing.Pool and ProcessPoolExecutor may be used to execute ad hoc tasks defined by custom functions.
The multiprocessing.Pool can issue one-off tasks using the apply() and apply_async() functions, and may issue multiple tasks that use the same function with different arguments with the map(), imap(), imap_unordered(), and starmap() functions and their asynchronous equivalents map_async() and starmap_async().
The ProcessPoolExecutor can issue one-off tasks via the submit() function, and may issue multiple tasks that use the same function with different arguments via the map() function.
3. Both Support Asynchronous Tasks
Both the multiprocessing.Pool and ProcessPoolExecutor can be used to issue tasks asynchronously.
Recall that issuing tasks asynchronously means that the main process can issue a task without blocking. The function call will return immediately with some handle on the issued task and allow the main process to continue on with the program.
The multiprocessing.Pool supports issuing tasks asynchronously via the apply_async(), map_async() and starmap_async() functions that return an AsyncResult object that provides a handle on the issued tasks.
The ProcessPoolExecutor provides the submit() function for issuing tasks asynchronously that returns a Future object that provides a handle on the issued task.
Additionally, both process pools provide helpful mechanisms for working with asynchronous tasks, such as checking their status, getting their results and adding callback functions.
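As a rough sketch of the callback mechanisms, the example below registers a callback with each kind of pool. The function names and the collected list are hypothetical. Note that a Pool callback receives the task's return value, whereas a Future callback receives the Future itself.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def pool_callback_demo():
    collected = []
    with Pool(2) as pool:
        # the callback runs in the parent process and receives the return value
        pool.apply_async(math.sqrt, (49,), callback=collected.append)
        # close and join so the task and its callback have finished
        pool.close()
        pool.join()
    return collected


def executor_callback_demo():
    collected = []
    with ProcessPoolExecutor(max_workers=2) as executor:
        future = executor.submit(math.sqrt, 49)
        # the callback receives the Future, so the value is read via result()
        future.add_done_callback(lambda f: collected.append(f.result()))
    # leaving the with-block waits for the task, so the callback has run
    return collected


if __name__ == "__main__":
    print(pool_callback_demo())      # [7.0]
    print(executor_callback_demo())  # [7.0]
```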
4. Both Can Wait For All Tasks
Both the multiprocessing.Pool and ProcessPoolExecutor provide the ability to wait for tasks that were issued asynchronously.
The multiprocessing.Pool provides a wait() function on the AsyncResult object returned as a handle on asynchronous tasks. It also allows the pool to be closed and joined, which will not return until all issued tasks have completed.
The ProcessPoolExecutor provides the wait() module function that can take a collection of Future objects on which to wait. It also allows the process pool to be shut down, which can be configured to block until all tasks in the pool have completed.
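The two ways of waiting can be sketched side by side as follows. The function names wait_with_pool and wait_with_executor are assumptions for this illustration.

```python
import math
from concurrent.futures import ProcessPoolExecutor, wait
from multiprocessing import Pool


def wait_with_pool():
    pool = Pool(2)
    handles = [pool.apply_async(math.factorial, (n,)) for n in (3, 4, 5)]
    # no more tasks may be issued after close()
    pool.close()
    # join() returns once the issued tasks are done and the workers have exited
    pool.join()
    return [handle.get() for handle in handles]


def wait_with_executor():
    with ProcessPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(math.factorial, n) for n in (3, 4, 5)]
        # by default wait() returns when all futures have completed
        done, not_done = wait(futures)
        return [future.result() for future in futures], len(not_done)


if __name__ == "__main__":
    print(wait_with_pool())      # [6, 24, 120]
    print(wait_with_executor())  # ([6, 24, 120], 0)
```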
5. Both Have Thread-Based Equivalents
Both the multiprocessing.Pool and ProcessPoolExecutor process pools have thread-based equivalents.
The multiprocessing.Pool has the multiprocessing.pool.ThreadPool which provides the same API, except that it uses thread-based concurrency instead of process-based concurrency.
Similarly, the ProcessPoolExecutor has the concurrent.futures.ThreadPoolExecutor that provides the same API as the ProcessPoolExecutor (e.g. extends the same Executor base class) except that it is implemented using thread-based concurrency.
This is helpful because a program that uses either process pool can be switched to thread-based concurrency with very little change to the code.
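To illustrate how small the change can be, the sketch below passes the pool classes in as arguments so the same code runs with processes or with threads. The helper name run_with is hypothetical.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing.pool import Pool, ThreadPool


def run_with(pool_class, executor_class):
    # the calling code is identical for the process and thread variants
    with pool_class(2) as pool:
        from_pool = pool.map(math.sqrt, [4, 9])
    with executor_class(max_workers=2) as executor:
        from_executor = list(executor.map(math.sqrt, [4, 9]))
    return from_pool, from_executor


if __name__ == "__main__":
    print(run_with(Pool, ProcessPoolExecutor))      # ([2.0, 3.0], [2.0, 3.0])
    print(run_with(ThreadPool, ThreadPoolExecutor)) # ([2.0, 3.0], [2.0, 3.0])
```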
Differences
The multiprocessing.Pool and ProcessPoolExecutor are also subtly different.
The differences between these two process pools are focused on the APIs of the classes themselves.
The main differences are as follows:
- Ability to Cancel Tasks
- Operations on Groups of Tasks
- Ability to Terminate All Tasks
- Asynchronous Map Functions
- Ability to Access Exception
Let’s take a closer look at each in turn.
1. Ability to Cancel Tasks
Tasks issued to the ProcessPoolExecutor can be canceled, whereas tasks issued to the multiprocessing.Pool cannot.
The ProcessPoolExecutor provides the ability to cancel tasks that have been issued to the process pool but have not yet started executing.
This is provided via the cancel() function on the Future object returned from issuing a task via submit().
The multiprocessing.Pool does not provide this capability.
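A minimal sketch of cancellation follows. It assumes a single worker and six short sleep tasks so that the last task is still waiting when cancel() is called; the name run_cancel_demo and the timings are illustrative only.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def run_cancel_demo():
    with ProcessPoolExecutor(max_workers=1) as executor:
        # one worker and six tasks, so the later tasks must wait their turn
        futures = [executor.submit(time.sleep, 0.2) for _ in range(6)]
        last = futures[-1]
        # cancel() returns True only if the task had not started running
        was_cancelled = last.cancel()
    # leaving the with-block waits for the remaining tasks to finish
    completed = sum(f.done() and not f.cancelled() for f in futures)
    return was_cancelled, last.cancelled(), completed


if __name__ == "__main__":
    print(run_cancel_demo())  # (True, True, 5)
```

Calling cancel() on a task that is already running or finished returns False and has no effect on the task.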
2. Operations on Groups of Tasks
The ProcessPoolExecutor provides tools to work with groups of asynchronous tasks, whereas the multiprocessing.Pool does not.
The concurrent.futures module provides the wait() and as_completed() module functions. These functions are designed to work with collections of Future objects returned when issuing tasks asynchronously to the process pool via the submit() function.
They allow the caller to wait for an event on a collection of heterogeneous tasks in the process pool, such as for all tasks to complete, for the first task to complete, or for the first task to fail.
They also allow the caller to process the results from a collection of heterogeneous tasks in the order that the tasks are completed, rather than the order the tasks were issued.
The multiprocessing.Pool does not provide this capability.
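As an illustration of processing results in completion order, the sketch below submits a slow task before a fast one and iterates with as_completed(). The name run_as_completed and the delays are assumptions; with two workers the shorter delay will typically be reported first, although the exact order depends on scheduling.

```python
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_as_completed():
    delays = [0.5, 0.1]  # the slower task is submitted first
    with ProcessPoolExecutor(max_workers=2) as executor:
        # map each Future back to the delay it was given
        future_to_delay = {executor.submit(time.sleep, d): d for d in delays}
        # as_completed() yields each Future as soon as its task finishes
        return [future_to_delay[f] for f in as_completed(future_to_delay)]


if __name__ == "__main__":
    print(run_as_completed())
```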
3. Ability to Terminate All Tasks
The multiprocessing.Pool provides the ability to forcefully terminate all tasks, whereas the ProcessPoolExecutor does not.
The multiprocessing.Pool class provides the terminate() function, which stops the child worker processes immediately without completing outstanding work. In CPython on POSIX systems this is done by sending the SIGTERM signal to each worker process. By contrast, the close() function only prevents new tasks from being issued and lets the workers exit once the issued tasks are done.
Terminating the workers stops them even if they are in the middle of executing tasks, which could leave the program in an inconsistent state.
The ProcessPoolExecutor does not provide this capability.
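A rough sketch of forceful termination with multiprocessing.Pool is shown below. It issues a task that would sleep for a minute and then terminates the pool after half a second; the name run_terminate and the timings are assumptions for the example.

```python
import time
from multiprocessing import Pool


def run_terminate():
    pool = Pool(2)
    # a task that would take a minute if left alone
    async_result = pool.apply_async(time.sleep, (60,))
    # give the task a moment to start
    time.sleep(0.5)
    started = time.monotonic()
    # stop the worker processes without waiting for the task
    pool.terminate()
    pool.join()
    elapsed = time.monotonic() - started
    # report whether the task finished and how long the shutdown took
    return async_result.ready(), elapsed


if __name__ == "__main__":
    finished, elapsed = run_terminate()
    print(f"task finished: {finished}, shutdown took {elapsed:.2f} seconds")
```

The shutdown is expected to take far less than the 60 seconds the task asked for, because the worker is stopped mid-task.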
4. Asynchronous Map Functions
The multiprocessing.Pool provides a focus on map() based concurrency, whereas the ProcessPoolExecutor does not.
The ProcessPoolExecutor does provide a parallel version of the built-in map() function which will apply the same function to an iterable of arguments. Each function call is issued as a separate task to the process pool.
The multiprocessing.Pool provides three versions of the built-in map() function for applying the same function to an iterable of arguments in parallel as tasks in the process pool.
They are: map(), a lazier version of map() called imap(), and a version of map() called starmap() that takes multiple arguments for each function call.
It also provides a version of imap() called imap_unordered(), where the iterable of results has return values in the order that tasks complete rather than the order that tasks are issued.
Finally, it has asynchronous versions of the map() function called map_async() and of the starmap() function called starmap_async().
In all, the multiprocessing.Pool provides 6 parallel versions of the built-in map() function.
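The six variants can be compared in one short sketch. The function name run_map_variants and the dictionary keys are illustrative; the results of imap_unordered() are sorted here because their order depends on which task completes first.

```python
import math
from multiprocessing import Pool


def run_map_variants():
    items = [1, 4, 9]
    pairs = [(2, 3), (3, 2)]
    with Pool(2) as pool:
        return {
            # blocks and returns a list in input order
            "map": pool.map(math.sqrt, items),
            # returns an iterator that yields results in input order
            "imap": list(pool.imap(math.sqrt, items)),
            # yields results in completion order, sorted for a stable comparison
            "imap_unordered": sorted(pool.imap_unordered(math.sqrt, items)),
            # unpacks each tuple into positional arguments
            "starmap": pool.starmap(pow, pairs),
            # asynchronous versions return an AsyncResult; get() blocks
            "map_async": pool.map_async(math.sqrt, items).get(),
            "starmap_async": pool.starmap_async(pow, pairs).get(),
        }


if __name__ == "__main__":
    for name, values in run_map_variants().items():
        print(name, values)
```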
5. Ability to Access Exception
The ProcessPoolExecutor provides a way to access an exception raised in an asynchronous task directly, whereas the multiprocessing.Pool does not.
Both process pools provide the ability to check if a task was successful or not, and will re-raise an exception when getting the task result, if an exception was raised and not handled in the task.
Nevertheless, only the ProcessPoolExecutor provides the ability to directly get an exception raised in a task.
A task issued into the ProcessPoolExecutor asynchronously via the submit() function will return a Future object. The exception() function on the Future object allows the caller to check if an exception was raised in the task, and if so, to access it directly.
The multiprocessing.Pool does not provide an equivalent function on the AsyncResult object; the exception must be caught when it is re-raised by get().
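The difference can be sketched with a task that is known to fail: math.sqrt() raises a ValueError for a negative input. The function names below are hypothetical.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def exception_from_executor():
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(math.sqrt, -1)
        # exception() blocks until the task is done, then returns the
        # exception raised by the task, or None if it succeeded
        return future.exception()


def exception_from_pool():
    with Pool(1) as pool:
        async_result = pool.apply_async(math.sqrt, (-1,))
        try:
            # get() re-raises the exception raised in the task
            async_result.get()
        except ValueError as error:
            return error
    return None


if __name__ == "__main__":
    print(type(exception_from_executor()).__name__)  # ValueError
    print(type(exception_from_pool()).__name__)      # ValueError
```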
Summary of Differences
It may help to summarize the differences between multiprocessing.Pool and ProcessPoolExecutor.
multiprocessing.Pool
- Does not provide the ability to cancel tasks, whereas the ProcessPoolExecutor does.
- Does not provide the ability to work with collections of heterogeneous tasks, whereas the ProcessPoolExecutor does.
- Provides the ability to forcefully terminate all tasks, whereas the ProcessPoolExecutor does not.
- Provides a focus on parallel versions of the map() function, whereas the ProcessPoolExecutor does not.
- Does not provide the ability to access an exception raised in a task, whereas the ProcessPoolExecutor does.
ProcessPoolExecutor
- Provides the ability to cancel tasks, whereas the multiprocessing.Pool does not.
- Provides the ability to work with collections of heterogeneous tasks, whereas the multiprocessing.Pool does not.
- Does not provide the ability to forcefully terminate all tasks, whereas the multiprocessing.Pool does.
- Does not provide multiple parallel versions of the map() function, whereas the multiprocessing.Pool does.
- Provides the ability to access an exception raised in a task, whereas the multiprocessing.Pool does not.
The figure below provides a helpful side-by-side comparison of the key differences between multiprocessing.Pool and ProcessPoolExecutor.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know the difference between multiprocessing.Pool and ProcessPoolExecutor and when to use each.
Do you have any questions about the difference between multiprocessing.Pool and ProcessPoolExecutor in Python?
Ask your questions in the comments below and I will do my best to answer.