Last Updated on September 12, 2022
You can read the PEP for the multiprocessing module and Python release changelogs in order to learn the history of the multiprocessing pool.
In this tutorial you will discover the history of the multiprocessing pool in Python.
Let’s get started.
Multiprocessing Pool Authors
The multiprocessing pool was developed by Jesse Noller and Richard Oudkerk.
Specifically, Jesse and Richard proposed and developed the “multiprocessing” module, which was added to the Python standard library via PEP 371.
The multiprocessing pool was added as part of this package, right from the beginning.
Run your loops using all CPUs. Download your FREE book to learn how.
PEP 371 on Multiprocessing
PEP is an acronym that stands for Python Enhancement Proposal.
It refers to the process used to propose new features for Python, which can then be reviewed by the community. In fact, there is a PEP document for the PEP process itself:
PEP stands for Python Enhancement Proposal. A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment. The PEP should provide a concise technical specification of the feature and a rationale for the feature.
— PEP 1 – PEP Purpose and Guidelines
The multiprocessing package was proposed in PEP 371.
It can be interesting to read this PEP in order to understand why the multiprocessing module was developed.
For example, the “multiprocessing” module was originally called “pyProcessing“.
This PEP proposes the inclusion of the pyProcessing [1] package into the Python standard library, renamed to “multiprocessing”.
— PEP 371 – Addition of the multiprocessing package to the standard library
Thank goodness it was changed.
The pyProcessing name came from Processing.py, Jesse’s Python port of Java’s Processing project for visual arts.
Jesse and Richard proposed to copy the “threading” module, which already existed at the time, but to use process-based concurrency in an effort to circumvent the limitations of the Global Interpreter Lock (GIL).
The processing package mimics the standard library threading module functionality to provide a process-based approach to threaded programming allowing end-users to dispatch multiple tasks that effectively side-step the global interpreter lock.
— PEP 371 – Addition of the multiprocessing package to the standard library
The multiprocessing.Pool class was not mentioned in the PEP specifically, which is reasonable: PEP documents are not intended to be too technical or too specific when it comes to APIs.
Nevertheless, it was mentioned in an announcement on the standard library developers mailing list.
A Pool class makes it easy to submit tasks to a pool of worker processes.
— [stdlib-sig] Processing module inclusion into the stdlib proposal
Version of Python that Introduced the Pool
The proposal and addition of multiprocessing occurred in early to mid 2008.
The multiprocessing module was added to the Python standard library as part of Python 2.6, released in late 2008.
The new multiprocessing package lets Python programs create new processes that will perform a computation and return a result to the parent. The parent and child processes can communicate using queues and pipes, synchronize their operations using locks and semaphores, and can share simple arrays of data.
— What’s New in Python 2.6
The multiprocessing.Pool class was provided as part of the API from the beginning of the multiprocessing module.
The Pool class represents a pool of worker processes. It has methods which allows tasks to be offloaded to the worker processes in a few different ways.
— multiprocessing — Process-based “threading” interface
Python 3.0 (Python 3000) included features from Python 2.6, including the multiprocessing module.
As such, multiprocessing has been a part of Python 3 since the beginning.
Free Python Multiprocessing Pool Course
Download your FREE Process Pool PDF cheat sheet and get BONUS access to my free 7-day crash course on the Process Pool API.
Discover how to use the Multiprocessing Pool including how to configure the number of workers and how to execute tasks asynchronously.
History of the Multiprocessing Pool API
The initial multiprocessing.Pool API was nearly identical to the API today.
It included functions such as apply() and map(), as well as their asynchronous counterparts, apply_async() and map_async().
Pool will create a fixed number of worker processes, and requests can then be distributed to the workers by calling apply() or apply_async() to add a single request, and map() or map_async() to add a number of requests.
— What’s New in Python 2.6
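Those four original methods are still in the API today. Here is a minimal sketch of how they behave, written using the modern context-manager form (added later in Python 3.3); the square() task function is my own illustration, not from the original release notes:

```python
from multiprocessing import Pool

def square(x):
    # a trivial task function to dispatch to the pool
    return x * x

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        # apply() blocks until the single task completes
        print(pool.apply(square, (3,)))          # 9
        # map() blocks until all tasks complete, preserving order
        print(pool.map(square, [1, 2, 3]))       # [1, 4, 9]
        # apply_async() and map_async() return an AsyncResult immediately
        result = pool.map_async(square, [4, 5])
        print(result.get())                      # [16, 25]
```

The blocking and asynchronous pairs share the same semantics; the async versions simply hand back a handle whose get() method blocks on demand.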
In Python 2.7 (and Python 3.2), the “maxtasksperchild” parameter was added to the multiprocessing.Pool class, contributed by Charles Cazabon.
This allowed the number of tasks executed by each worker to be limited to a fixed amount before the child worker process was replaced.
The Pool class, which controls a pool of worker processes, now has an optional maxtasksperchild parameter. Worker processes will perform the specified number of tasks and then exit, causing the Pool to start a new worker. This is useful if tasks may leak memory or other resources, or if some tasks will cause the worker to become very large. (Contributed by Charles Cazabon; bpo-6963.)
— What’s New in Python 2.7
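The worker-recycling behavior can be observed by tracking worker process ids. This is a sketch under my own assumptions (the work() function and chunksize choice are illustrative); with one worker limited to two tasks each, several distinct worker pids should appear across six tasks:

```python
from multiprocessing import Pool
import os

def work(x):
    # return the handling worker's pid along with the result
    return (os.getpid(), x * x)

if __name__ == '__main__':
    # each worker process exits after 2 tasks and is replaced by a fresh one
    with Pool(processes=1, maxtasksperchild=2) as pool:
        # chunksize=1 ensures each item counts as one task
        results = pool.map(work, range(6), chunksize=1)
    pids = {pid for pid, _ in results}
    # more than one distinct pid shows that workers were recycled
    print(len(pids) > 1)
```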
The concept of a multiprocessing context was added in Python 3.4, along with support for providing an alternate context to the multiprocessing.Pool class.
This was added by one of the multiprocessing module’s authors, Richard Oudkerk.
multiprocessing also now has the concept of a context, which determines how child processes are created. New function get_context() returns a context that uses a specified start method. It has the same API as the multiprocessing module itself, so you can use it to create Pools and other objects that will operate within that context. This allows a framework and an application or different parts of the same application to use multiprocessing without interfering with each other. (Contributed by Richard Oudkerk in bpo-18999.)
— What’s New In Python 3.4
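A context is obtained via get_context() and then used in place of the top-level module. A minimal sketch (the cube() task function is my own example; 'spawn' is one of the supported start methods and is available on all platforms):

```python
import multiprocessing

def cube(x):
    # a trivial task function for demonstration
    return x ** 3

if __name__ == '__main__':
    # request a specific start method via a context object
    ctx = multiprocessing.get_context('spawn')
    # the context exposes the same API as the multiprocessing module itself
    with ctx.Pool(processes=2) as pool:
        print(pool.map(cube, [1, 2, 3]))  # [1, 8, 27]
```

Because the context carries its own start method, two libraries in the same program can each create pools with different start methods without interfering with one another.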
Two key features were missing from the initial version of the API:
- Context manager interface for the Pool class.
- The starmap() and starmap_async() functions.
These were both added in Python 3.3.
The starmap() and starmap_async() functions were contributed by Hynek Schlawack in bpo-12708.
New methods multiprocessing.pool.Pool.starmap() and starmap_async() provide itertools.starmap() equivalents to the existing multiprocessing.pool.Pool.map() and map_async() functions. (Contributed by Hynek Schlawack in bpo-12708.)
— What’s New In Python 3.3
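Both Python 3.3 additions can be shown together. In this sketch (the power() task function is my own illustration), the with-statement manages the pool's lifetime, and starmap() unpacks each tuple into multiple arguments, which map() cannot do:

```python
from multiprocessing import Pool

def power(base, exp):
    # a task function taking multiple arguments
    return base ** exp

if __name__ == '__main__':
    # the context manager (Python 3.3+) terminates the pool on exit
    with Pool(processes=2) as pool:
        # starmap() unpacks each tuple as the task's arguments
        print(pool.starmap(power, [(2, 3), (3, 2)]))  # [8, 9]
```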
The changelogs for most major Python releases list bug fixes to the Pool, along with less notable API changes.
It can be helpful to review these to understand specific behavior in the API.
The Python changelog page provides a handy search feature for these more minor changes back to Python 3.5.
Overwhelmed by the Python concurrency APIs?
Find relief: download my FREE Python Concurrency Mind Maps.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know the history of the multiprocessing pool class in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Sacha Verheij on Unsplash