Configure Max Workers For The ProcessPoolExecutor
You can configure the number of workers in the ProcessPoolExecutor in Python by setting the "max_workers" argument.
In this tutorial you will discover how to configure the number of worker processes in Python process pools.
Let's get started.
Need to Configure The Number of Worker Processes
The ProcessPoolExecutor in Python provides a pool of reusable processes for executing ad hoc tasks.
You can submit tasks to the process pool by calling the submit() function and passing in the function you wish to execute in another process, along with its arguments. You can also submit tasks by calling the map() function, specifying the function to execute and an iterable of items to which the function will be applied.
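As a minimal sketch of both approaches, the program below submits a single call with submit() and a batch of calls with map(). The task function square() and the choice of 2 workers are arbitrary illustrations. The task function is defined at module level so that worker processes can import it.

```python
# minimal sketch: submitting tasks with submit() and map()
from concurrent.futures import ProcessPoolExecutor


# hypothetical task function, defined at module level so workers can import it
def square(x):
    return x * x


def main():
    with ProcessPoolExecutor(max_workers=2) as executor:
        # submit() schedules one call and immediately returns a Future
        future = executor.submit(square, 3)
        print(future.result())
        # map() applies the function to every item and yields results in input order
        results = list(executor.map(square, [1, 2, 3, 4]))
        print(results)
    return future.result(), results


if __name__ == '__main__':
    main()
```

Note that the function object itself (square), not its name as a string, is passed to both submit() and map().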
The process pool has a fixed number of worker processes.
It is important to limit the number of worker processes in the process pool, typically to the number of logical or physical CPU cores in your system, depending on the types of tasks you will be executing.
How do you configure the number of worker processes in the ProcessPoolExecutor?
How to Configure The Number of Workers
You can configure the number of worker processes in the ProcessPoolExecutor by setting the "max_workers" argument in the constructor.
For example:
...
# create a process pool and set the number of worker processes
executor = ProcessPoolExecutor(max_workers=4)
# ...
# shutdown the process pool
executor.shutdown()
The "max_workers" argument is the first argument in the constructor and does not need to be specified by name to be set, for example:
...
# create a process pool and set the number of worker processes
executor = ProcessPoolExecutor(4)
If you are using the context manager to create the process pool so that it is automatically shut down, you can configure the number of processes in the same manner.
For example:
...
# create a process pool using the context manager and set the number of workers
with ProcessPoolExecutor(4) as executor:
    # ...
    pass
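The argument takes a positive integer. If it is not set (or is set to None), it defaults to the number of logical CPU cores in your system, as reported by os.cpu_count() (os.process_cpu_count() from Python 3.13).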
- Total Number of Worker Processes = (CPUs in Your System)
For example, if you had 2 physical CPU cores in your system and each core has hyperthreading (common in modern CPUs), then you would have 2 physical and 4 logical CPU cores. Python would see 4 CPUs, and the default number of worker processes on your system would then be 4.
If Windows is your operating system, the number of workers must be less than or equal to 61, otherwise a ValueError is raised. On Windows, the default number of workers is also capped at 61.
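The constructor validates max_workers up front. The sketch below uses a hypothetical helper, try_create(), to show that zero is rejected with a ValueError and, on Windows only, that values above 61 are rejected too. Creating a pool without submitting any tasks should not need to start worker processes, so this check is cheap.

```python
# sketch: which max_workers values does the constructor accept?
import sys
from concurrent.futures import ProcessPoolExecutor


def try_create(max_workers):
    """Hypothetical helper: return True if the pool can be created, else False."""
    try:
        executor = ProcessPoolExecutor(max_workers=max_workers)
    except ValueError as e:
        # raised for values <= 0, and for values > 61 on Windows
        print(f'max_workers={max_workers}: {e}')
        return False
    executor.shutdown()
    return True


if __name__ == '__main__':
    print(try_create(4))  # a positive value is accepted
    print(try_create(0))  # zero is rejected
    if sys.platform == 'win32':
        print(try_create(62))  # above the Windows limit of 61
```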
It can make sense to have more processes than CPUs (physical or logical) in your system if the target task function performs blocking IO operations, because a process that is blocked waiting on IO does not keep a CPU core busy.
That said, processes are best suited to CPU-bound tasks. IO-bound tasks, those that wait for relatively slow resources to respond, like hard drives, printers, and network connections, are usually better served by threads, because each process is comparatively expensive to create and has its own memory.
If you require hundreds of processes for IO-bound tasks, you might want to consider using threads and the ThreadPoolExecutor instead. If you require thousands of concurrent IO-bound tasks, you might want to consider using the asyncio module.
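For comparison, here is a minimal ThreadPoolExecutor sketch for IO-bound work. It assumes the blocking IO can be simulated with time.sleep(); fake_io_task() is a hypothetical stand-in for something like a network request. Because the threads spend nearly all their time waiting, 100 workers should finish 100 tasks in roughly the time of a single sleep on most systems, rather than 100 sleeps.

```python
# sketch: many threads for IO-bound tasks (IO simulated with sleep)
import time
from concurrent.futures import ThreadPoolExecutor


# hypothetical stand-in for a blocking IO call
def fake_io_task(task_id):
    time.sleep(0.1)
    return task_id


def main():
    start = time.perf_counter()
    # threads are cheap, so a large worker count is reasonable for waiting on IO
    with ThreadPoolExecutor(max_workers=100) as executor:
        results = list(executor.map(fake_io_task, range(100)))
    elapsed = time.perf_counter() - start
    print(f'{len(results)} tasks in {elapsed:.2f} seconds')
    return results


if __name__ == '__main__':
    main()
```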
Now that we know how to configure the number of worker processes in the ProcessPoolExecutor, let's look at a worked example.
Example of Configuring The Number of Workers
Let's explore how to configure the number of worker processes with a worked example.
Check The Default Number of Worker Processes
First, let's check how many processes are created for process pools on your system.
Looking at the source code for the ProcessPoolExecutor, we can see that the number of worker processes chosen by default is stored in the _max_workers attribute, which we can access and report after a process pool is created.
Note, "_max_workers" is a private attribute that is not part of the public API and may change in the future.
The example below reports the default number of worker processes in a process pool on your system.
# SuperFastPython.com
# report the default number of worker processes on your system
from concurrent.futures import ProcessPoolExecutor

# entry point
def main():
    # create a process pool with the default number of worker processes
    pool = ProcessPoolExecutor()
    # report the number of worker processes chosen by default
    print(pool._max_workers)
    # shut down the process pool
    pool.shutdown()

if __name__ == '__main__':
    main()
Running the example reports the number of worker processes used by default on your system.
I have four physical CPU cores and eight logical CPU cores, therefore the default is 8 processes.
8
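To see where this default comes from, the sketch below compares _max_workers with a value computed from os.cpu_count() (or os.process_cpu_count() on Python 3.13 and later), including the cap of 61 on Windows. The helper expected_default_workers() is hypothetical, and the sketch relies on the same private _max_workers attribute as the example above, so treat it as a diagnostic rather than production code.

```python
# sketch: compare the pool's default worker count with the CPU count
import os
import sys
from concurrent.futures import ProcessPoolExecutor


def expected_default_workers():
    """Hypothetical helper mirroring how the default is chosen."""
    # Python 3.13+ uses os.process_cpu_count(); earlier versions use os.cpu_count()
    count_cpus = getattr(os, 'process_cpu_count', os.cpu_count)
    # either function may return None, in which case 1 is used
    workers = count_cpus() or 1
    # on Windows the default is capped at 61
    if sys.platform == 'win32':
        workers = min(workers, 61)
    return workers


def main():
    pool = ProcessPoolExecutor()
    # _max_workers is private and may change in future Python versions
    actual = pool._max_workers
    pool.shutdown()
    expected = expected_default_workers()
    print(f'default workers: {actual}, expected: {expected}')
    return actual, expected


if __name__ == '__main__':
    main()
```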
How many worker processes are allocated by default on your system?
Let me know in the comments below.
Set The Number of Worker Processes
We can specify the number of worker processes directly and this is a good idea in most applications.
The example below demonstrates how to configure 60 worker processes.
# SuperFastPython.com
# configure and report the number of worker processes
from concurrent.futures import ProcessPoolExecutor

# entry point
def main():
    # create a process pool with a large number of worker processes
    pool = ProcessPoolExecutor(60)
    # report the configured number of worker processes
    print(pool._max_workers)
    # shut down the process pool
    pool.shutdown()

if __name__ == '__main__':
    main()
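Running the example configures the process pool to use up to 60 worker processes and reports the configured value. Note that the pool does not necessarily start all 60 processes immediately; depending on the Python version, worker processes may be started as tasks are submitted.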
60
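Because _max_workers is private, another way to confirm the limit is to look at which processes actually run the tasks. The sketch below uses a hypothetical task, report_pid(), that returns the process ID of the worker running it. The number of distinct IDs should never exceed max_workers, although it may be lower if the pool starts workers on demand or a few workers pick up most of the tasks.

```python
# sketch: observe how many distinct worker processes execute tasks
import os
import time
from concurrent.futures import ProcessPoolExecutor


# hypothetical task that reports which process ran it
def report_pid(_):
    # short sleep so that several workers get a chance to pick up tasks
    time.sleep(0.05)
    return os.getpid()


def main(max_workers=4, n_tasks=20):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        pids = set(executor.map(report_pid, range(n_tasks)))
    print(f'{len(pids)} distinct worker processes (limit {max_workers})')
    return pids


if __name__ == '__main__':
    main()
```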
Common Questions
This section lists common questions related to the number of worker processes in the ProcessPoolExecutor.
Do you have a question about setting the number of processes?
Let me know in the comments and I will do my best to answer it and add it to this section.
What is the Default Number of Processes in the ProcessPoolExecutor?
The default number of processes in the ProcessPoolExecutor is equal to the number of logical CPU cores in your system.
For example:
- Total Number of Worker Processes = CPUs in Your System
Where the number of CPUs in your system is determined by Python and will take hyperthreading into account.
For example, if you have two CPU cores each with hyperthreading (which is common), then Python will detect four CPUs in your system.
How Many CPU Cores Do I Have?
You can check the number of CPU cores that are visible to Python via the os.cpu_count() function.
For example, the following program reports the number of logical CPU cores in your system, as seen by your Python interpreter:
# report the number of CPUs in your system visible to Python
import os
print(os.cpu_count())
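Note that os.cpu_count() reports the CPUs in the whole system, which may be more than the current process is allowed to use (for example, when it is pinned to a subset of cores). The sketch below, built around a hypothetical helper usable_cpu_count(), uses os.sched_getaffinity() where it exists (for example on Linux) and falls back to os.cpu_count() elsewhere. Python 3.13 also adds os.process_cpu_count() for this purpose.

```python
# sketch: CPUs in the system versus CPUs usable by this process
import os


def usable_cpu_count():
    """Hypothetical helper: number of CPUs this process may run on."""
    # os.sched_getaffinity() is only available on some platforms, e.g. Linux
    if hasattr(os, 'sched_getaffinity'):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None if the count cannot be determined
    return os.cpu_count() or 1


print(f'CPUs in the system: {os.cpu_count()}')
print(f'CPUs usable by this process: {usable_cpu_count()}')
```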
Does The Number of Processes in the ProcessPoolExecutor Match the Number of CPUs or Cores?
The number of worker processes in the ProcessPoolExecutor should probably match the number of CPU cores in your system if your tasks are CPU-bound.
This is a good default.
If your tasks are IO-bound, you may set the number of processes equal to the number of tasks you wish to complete, or to some fraction of it. However, there may be limits on how many worker processes you can create, e.g. the ProcessPoolExecutor is limited to 61 workers on Windows.
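A minimal sketch of that rule of thumb is shown below. The helper workers_for_io_tasks() is hypothetical: it allocates one worker per task, uses at least one worker, and respects the 61-worker limit on Windows.

```python
# sketch: one worker per IO-bound task, capped by the Windows limit
import sys

# documented ProcessPoolExecutor limit on Windows
WINDOWS_MAX_WORKERS = 61


def workers_for_io_tasks(n_tasks):
    """Hypothetical helper: choose max_workers for a batch of IO-bound tasks."""
    # always use at least one worker
    workers = max(1, n_tasks)
    if sys.platform == 'win32':
        workers = min(workers, WINDOWS_MAX_WORKERS)
    return workers


print(workers_for_io_tasks(10))  # 10
```

The result can then be passed as the max_workers argument when creating the ProcessPoolExecutor.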
If you require hundreds or thousands of concurrent tasks executed and they are IO-bound, consider using the ThreadPoolExecutor instead.
How Many Processes Should I Use?
You should probably set the number of processes to be equal to the number of logical CPU cores in your system, e.g. the default.
- By default: Set to the number of logical CPU cores.
If you are expecting to perform computational work in the main process in addition to the process pool, consider setting the number of processes in the pool to be equal to the number of logical CPUs in your system minus one, to allow the main process to execute.
- If the main process is computationally intensive: Set to the number of logical CPU cores minus one.
If you have particularly CPU intensive tasks, consider configuring the number of processes to be equal to the number of physical CPUs instead of the number of logical CPUs.
- If tasks are computationally intensive: Set to the number of physical CPU cores.
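These three rules can be combined into one small function, sketched below under some assumptions. The helper choose_max_workers() and its strategy labels are hypothetical. The standard library does not report physical cores, so the 'physical' strategy assumes the third-party psutil package, whose cpu_count(logical=False) returns the physical core count, and falls back to the logical count if psutil is not installed or cannot tell.

```python
# sketch: apply the rules of thumb for choosing max_workers
import os


def choose_max_workers(strategy='default'):
    """Hypothetical helper implementing the three rules of thumb."""
    # logical CPU cores; os.cpu_count() may return None, so fall back to 1
    logical = os.cpu_count() or 1
    if strategy == 'default':
        return logical
    if strategy == 'main_busy':
        # leave one logical core free for the main process
        return max(1, logical - 1)
    if strategy == 'physical':
        try:
            # psutil is a third-party package and may not be installed
            import psutil
            physical = psutil.cpu_count(logical=False)
        except ImportError:
            physical = None
        # fall back to the logical count if the physical count is unknown
        return physical or logical
    raise ValueError(f'unknown strategy: {strategy!r}')


for strategy in ('default', 'main_busy', 'physical'):
    print(strategy, choose_max_workers(strategy))
```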
What is the Maximum Number of Worker Processes in the ProcessPoolExecutor?
The maximum number of worker processes may be limited by your operating system.
For example, on Windows, you will not be able to create a ProcessPoolExecutor with more than 61 worker processes.
Other operating systems, like macOS and Linux, may impose an upper limit on the number of processes that a user may spawn or fork.
Additionally, your system will have an upper limit on the number of processes you can create, based on how much main memory (RAM) is available.
Nevertheless, before you exceed main memory, you will reach a point of diminishing returns in terms of adding new processes and executing more tasks. This is because your operating system must switch between the processes, called context switching. With too many processes active at once, your program may spend more time context switching than actually executing tasks.
A sensible upper limit for most applications is to set the number of processes to be equal to the number of logical CPU cores or the number of physical CPU cores in your system.
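To look for the point of diminishing returns on your own machine, a rough timing sketch like the one below may help. It assumes a purely CPU-bound hypothetical task, cpu_task(), that sums squares, and the task counts and sizes are arbitrary. Timings depend entirely on your hardware, so no particular result is implied.

```python
# sketch: time the same CPU-bound workload with different worker counts
import time
from concurrent.futures import ProcessPoolExecutor


# hypothetical CPU-bound task: sum of squares below n
def cpu_task(n):
    return sum(i * i for i in range(n))


def time_pool(max_workers, n_tasks=32, size=200_000):
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(cpu_task, [size] * n_tasks))
    return time.perf_counter() - start, results


def main():
    for workers in (1, 2, 4, 8, 16, 32):
        elapsed, _ = time_pool(workers)
        print(f'{workers:>2} workers: {elapsed:.2f} seconds')


if __name__ == '__main__':
    main()
```

If adding workers beyond your CPU core count stops reducing the elapsed time, or starts increasing it, you have likely reached a sensible upper limit for this kind of task.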
Takeaways
You now know how to configure the number of processes for the ProcessPoolExecutor in Python.