Effective use of the ProcessPoolExecutor in Python requires some knowledge of how it works internally.
In this tutorial you will discover how the ProcessPoolExecutor works so that you can use it better in your projects.
Let’s get started.
How Does ProcessPoolExecutor Work Internally
It is important to pause for a moment and look at how the ProcessPoolExecutor works internally.
The internal workings of the class impact how we use the process pool and the behavior we can expect, specifically around cancelling tasks.
Without this knowledge, some of the behavior of the process pool may appear confusing or even buggy from the outside.
You can see the source code for the ProcessPoolExecutor and its Executor base class in the CPython source tree, in Lib/concurrent/futures/process.py and Lib/concurrent/futures/_base.py.
There is a lot we could learn about how the process pool works internally, but we will limit ourselves to the most critical aspects.
Tasks Are Added To Internal Queues
The source code for the ProcessPoolExecutor provides a good summary for how the pool works internally.
There is an ASCII diagram in the source that we can reproduce here:
|======================= In-process =====================|== Out-of-process ==|
+----------+     +----------+       +--------+     +-----------+    +---------+
|          |  => | Work Ids |       |        |     | Call Q    |    | Process |
|          |     +----------+       |        |     +-----------+    |  Pool   |
|          |     | ...      |       |        |     | ...       |    +---------+
|          |     | 6        |    => |        |  => | 5, call() | => |         |
|          |     | 7        |       |        |     | ...       |    |         |
| Process  |     | ...      |       | Local  |     +-----------+    | Process |
|  Pool    |     +----------+       | Worker |                      |  #1..n  |
| Executor |                        | Thread |                      |         |
|          |     +------------+     |        |     +-----------+    |         |
|          | <=> | Work Items | <=> |        | <=  | Result Q  | <= |         |
|          |     +------------+     |        |     +-----------+    |         |
|          |     | 6: call()  |     |        |     | ...       |    |         |
|          |     |    future  |     |        |     | 4, result |    |         |
|          |     | ...        |     |        |     | 3, except |    |         |
+----------+     +------------+     +--------+     +-----------+    +---------+
The data flow for submitting new tasks to the pool is as follows.
First, we call submit() on the pool to add a new task. Note that map() calls submit() internally for each item in the provided iterable.
This will add a work item to an internal dictionary, specifically an instance of an internal class named _WorkItem. Each work item also has a unique identifier that is tracked in an internal Queue.
The process pool has an internal worker thread responsible for preparing work for the worker processes. This thread wakes up when new work is added to the queue of work ids, retrieves the corresponding work item, and submits it to a second queue, the call queue.
The call queue holds the tasks that the worker processes consume and execute. Results of the target task functions are then placed in a result queue, where they are read by the worker thread and made available via the associated Future objects.
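As a rough sketch of this flow (the task function and values are illustrative):

```python
# A minimal sketch of the data flow: submit() creates a work item
# internally and returns a Future; map() calls submit() for each item.
from concurrent.futures import ProcessPoolExecutor

def task(value):
    # Executed in a worker process; the return value travels back
    # through the result queue to the associated Future.
    return value * value

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        future = executor.submit(task, 3)
        result = future.result()  # blocks until the result queue delivers it
        mapped = list(executor.map(task, range(4)))
        print(result, mapped)
```

Note that future.result() blocks until the worker thread has read the outcome from the result queue and stored it on the Future.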
This decoupling between the queue that the user interacts with and the queue of tasks that the processes interact with is intentional and provides some measure of control and safety.
The important aspect about these internals from a user perspective is in cancelling tasks via their Future objects.
Only tasks that are still in the internal collection of work items can be cancelled. Tasks that have been moved to the internal call queue cannot be cancelled, even though they may not yet be running, and tasks that are already running cannot be cancelled either.
This means we may query the status of a Future object, see that it is neither running nor done, attempt to cancel it, fail to cancel it, and then see the task begin running and complete.
This happens because a number of tasks sit in the call queue waiting to be consumed by the worker processes. Specifically, the call queue can hold one more task than the number of worker processes.
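A short sketch demonstrates this. Exactly which futures can be cancelled is timing-dependent, since it depends on how many tasks the worker thread has already moved to the call queue:

```python
# Sketch: only tasks still waiting as work items can be cancelled.
# With one worker, the first few tasks are moved to the call queue
# quickly and cannot be cancelled; later tasks usually can.
# The exact split varies from run to run.
import time
from concurrent.futures import ProcessPoolExecutor

def slow_task(value):
    time.sleep(0.5)
    return value

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        futures = [executor.submit(slow_task, i) for i in range(6)]
        # cancel() returns True only for tasks not yet queued or running
        outcomes = [f.cancel() for f in futures]
        print(outcomes)
```

Cancelled futures report cancelled() as True; the rest run to completion when the pool shuts down.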
Worker Processes Are Created As Needed
Worker processes are not created when the process pool is created.
Instead, worker processes are created on demand or just-in-time.
Each time a task is added to the internal queues, the process pool will check if the number of active processes is less than the upper limit of processes supported by the pool. If so, an additional process is created to handle the new work.
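We can observe this lazy creation from the outside. This sketch uses multiprocessing.active_children(), which lists the live child processes of the current process; note that in Python versions before 3.9 the pool started all workers on the first submit rather than one at a time:

```python
# Sketch: no worker processes exist until work is submitted.
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def task(value):
    return value + 1

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=4)
    # Typically zero: creating the pool spawns no worker processes.
    workers_before = len(multiprocessing.active_children())
    executor.submit(task, 1).result()
    # At least one worker now exists, created on demand for the task.
    workers_after = len(multiprocessing.active_children())
    print(workers_before, workers_after)
    executor.shutdown()
```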
Once a process has completed a task, it will wait on the call queue for new work to arrive. As new work arrives, all processes waiting on the queue will be notified and one will attempt to consume the unit of work and start executing it.
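Reuse is easy to confirm by reporting the worker process ids, a sketch with an illustrative helper function:

```python
# Sketch: workers are reused across tasks. Twenty tasks on a
# two-worker pool report at most two distinct process ids.
import os
from concurrent.futures import ProcessPoolExecutor

def worker_pid(_):
    # Each call returns the id of the worker process that ran it.
    return os.getpid()

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        pids = set(executor.map(worker_pid, range(20)))
        print(len(pids))
```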
These two points show that the pool creates new processes only up to the configured limit, and that processes are reused, waiting idle for new tasks without consuming significant computational resources.
It also shows that, by default, the process pool does not retire worker processes after a fixed number of units of work, although since Python 3.11 the max_tasks_per_child argument provides exactly this behavior.
You now know how the ProcessPoolExecutor in Python works internally.
Do you have any questions?
Ask your question in the comments below and I will do my best to answer.