Last Updated on November 21, 2022
You can configure the number of workers in the multiprocessing.pool.Pool via the “processes” argument.
In this tutorial you will discover how to configure the number of worker processes in the process pool in Python.
Let’s get started.
Need to Configure the Number of Worker Processes
The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.
A process pool can be configured when it is created, which will prepare the child workers.
We can issue one-off tasks to the process pool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map(). Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the functions such as apply_async() and map_async().
The process pool has a fixed number of worker processes.
It is important to limit the number of worker processes in the process pool to perhaps the number of logical CPU cores or the number of physical CPU cores in your system, depending on the types of tasks you will be executing.
How do you configure the number of worker processes in the multiprocessing pool?
How to Configure The Number of Workers
We can configure the number of worker processes in the multiprocessing.pool.Pool by setting the “processes” argument in the constructor.
By default this equals the number of logical CPUs in your system.
processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.
— multiprocessing — Process-based parallelism
For example, if we had 4 physical CPU cores with hyperthreading, this would mean we would have 8 logical CPU cores and this would be the default number of workers in the process pool.
We can set the “processes” argument to specify the number of child processes to create and use as workers in the process pool.
For example:
    ...
    # create a process pool with 4 workers
    pool = multiprocessing.pool.Pool(processes=4)
The “processes” argument is the first argument in the constructor and does not need to be specified by name to be set, for example:
    ...
    # create a process pool with 4 workers
    pool = multiprocessing.pool.Pool(4)
If we use the context manager to create the process pool so that it is shut down automatically, we can configure the number of processes in the same manner.
For example:
    ...
    # create a process pool with 4 workers
    with multiprocessing.pool.Pool(4):
        # ...
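As a minimal sketch of how a configured pool might be used end to end, the example below creates a pool with 4 workers via the context manager and applies a task function to a range of values with map(). The task function square() is a hypothetical name used only for illustration.

```python
from multiprocessing.pool import Pool


# hypothetical task function used only for illustration
def square(value):
    return value * value


# protect the entry point
if __name__ == '__main__':
    # create a process pool with 4 workers, shut down automatically on exit
    with Pool(processes=4) as pool:
        # apply the task function to each item and collect results in order
        results = pool.map(square, range(8))
    # report the results
    print(results)
```

Because map() blocks until all tasks are complete, the results are available before the context manager shuts the pool down.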
The number of workers must be less than or equal to 61 if Windows is your operating system.
It is common to have more processes than CPUs (physical or logical) in your system, if the target task function is performing blocking IO operations.
The reason is that, in this case, the worker processes are IO-bound, not CPU-bound. This means that they spend most of their time waiting for relatively slow resources to respond, like hard drives, printers, and network connections, which leaves the CPU cores free to run other processes.
If you require hundreds of processes for IO-bound tasks, you might want to consider using threads instead via the ThreadPoolExecutor. If you require thousands of processes for IO-bound tasks, you might want to consider using the asyncio module.
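As a rough sketch of the thread-based alternative mentioned above, the example below runs many simulated IO-bound tasks with a ThreadPoolExecutor. The names fake_io_task() and run_io_tasks() are hypothetical, and the call to sleep() is an assumption standing in for real blocking IO such as a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# hypothetical IO-bound task, simulated here with a short sleep
def fake_io_task(identifier):
    time.sleep(0.05)
    return identifier


# hypothetical helper that runs n_tasks tasks using n_workers threads
def run_io_tasks(n_tasks, n_workers):
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # map() returns results in the order the tasks were issued
        return list(executor.map(fake_io_task, range(n_tasks)))


if __name__ == '__main__':
    # 200 tasks shared among 100 worker threads
    results = run_io_tasks(200, 100)
    print(len(results))
```

Threads are much cheaper to create than processes, which is why a pool of a hundred or more workers is more practical with threads.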
Now that we know how to configure the number of worker processes in the multiprocessing.pool.Pool, let’s look at some worked examples.
Check the Default Number of Workers
First, let’s check how many processes are created by default for process pools on your system.
One approach is to report the status of the process pool directly by printing the object.
This will create a string representation of the process pool that includes its current status and the number of child worker processes that are running.
For example:
    ...
    # report the status of the process pool
    print(pool)
Looking at the source code for the multiprocessing.pool.Pool class, we can see that the number of worker processes chosen by default is stored in the _processes attribute, which we can access and report after a process pool is created.
For example:
    ...
    # report the number of processes in the pool
    print(pool._processes)
Note, _processes is a protected member and may change in the future.
A final approach we can use is to get a list of all active child processes and report the length of this list.
This can be achieved using the multiprocessing.active_children() function and is only effective if the process pool is the only source of child processes for the current process.
For example:
    ...
    # report the number of active child processes
    children = active_children()
    print(len(children))
Tying this together, the example below creates a process pool with the default number of child worker processes and reports the number of worker processes that were created.
    # SuperFastPython.com
    # example of setting the default number of workers in the process pool
    from multiprocessing.pool import Pool
    from multiprocessing import active_children

    # protect the entry point
    if __name__ == '__main__':
        # create a process pool with the default number of workers
        pool = Pool()
        # report the status of the process pool
        print(pool)
        # report the number of processes in the pool
        print(pool._processes)
        # report the number of active child processes
        children = active_children()
        print(len(children))
Running the example first creates the process pool, configured with the default number of worker processes.
The status of the process pool is then reported, showing that it is running and configured with a pool size.
The process pool attribute for the number of workers is then reported, then the number of active child processes is reported.
All approaches agree and in this case we can see that the pool was configured with 8 workers on my system.
Note, results will differ depending on the number of CPU cores in your system.
    <multiprocessing.pool.Pool state=RUN pool_size=8>
    8
    8
How many worker processes are allocated by default on your system?
Let me know in the comments below.
Example of Configuring The Number of Workers
We can specify the number of worker processes directly and this is a good idea in most applications.
The example below demonstrates how to configure 60 worker processes using the context manager interface for the multiprocessing.pool.Pool class.
    # SuperFastPython.com
    # example of configuring the number of worker processes
    from multiprocessing.pool import Pool

    # protect the entry point
    if __name__ == '__main__':
        # create a process pool with many workers
        with Pool(60) as pool:
            # report the status of the pool
            print(pool)
Running the example configures the process pool to use 60 processes and confirms that it will create 60 processes.
    <multiprocessing.pool.Pool state=RUN pool_size=60>
Common Questions
This section lists common questions related to the number of worker processes in the multiprocessing Pool.
Do you have a question about setting the number of processes?
Let me know in the comments and I will do my best to answer it and add it to this section.
What is a CPU and What is a CPU Core?
A central processing unit or simply “processor” or “CPU” is a chip in the computer that executes instructions.
Traditionally, we had one CPU in the computer, perhaps with a math coprocessor.
- CPU: Central processing unit, a chip within the computer for executing instructions.
A core is another name for a physical CPU for executing instructions.
A computer with multiple CPUs is referred to as having multiple cores.
Similarly, a computer chip that has multiple CPUs within it, is referred to as a multi-core processor.
- Multi-core processor: A physical chip with multiple CPUs or cores.
As such, as developers, we often use the terms “CPUs” and “cores” interchangeably. We might even refer to them as “CPU cores”.
Almost all modern computers have multiple cores.
What are Physical CPUs vs Logical CPUs?
Modern CPUs typically make use of a technology called hyperthreading.
Hyperthreading does not refer to a program using threads. Instead, it refers to a technology within the CPU cores themselves that allows each physical core or CPU to act as if it were two logical cores or two CPUs.
- Physical Cores: The number of CPU cores provided in the hardware, e.g. the chips.
- Logical Cores: The number of CPU cores after hyperthreading is taken into account.
Hyperthreading provides automatic in-core parallelism that can offer up to a 30% speed-up over CPU cores that do not offer the technology.
As such, when we count CPU cores in a system, we typically count the number of logical CPU cores, not the number of physical CPU cores.
If you know your system uses hyperthreading (it probably does), then you can get the number of physical CPUs in your system by dividing the number of logical CPUs by two.
- Count Physical Cores = Count Logical Cores / 2
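The calculation above can be sketched in code as follows. This is only an estimate: it assumes that hyperthreading is enabled with exactly two logical cores per physical core, which Python does not check. The function name estimate_physical_cores() is hypothetical.

```python
import os


# hypothetical helper: estimate physical cores from logical cores
# assumes hyperthreading with two logical cores per physical core
def estimate_physical_cores(logical_cores):
    # never report fewer than one core
    return max(1, logical_cores // 2)


if __name__ == '__main__':
    # os.cpu_count() may return None if the count cannot be determined
    logical = os.cpu_count() or 1
    print(f'logical cores: {logical}')
    print(f'estimated physical cores: {estimate_physical_cores(logical)}')
```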
What is the Default Number of Processes in the Pool?
The default number of processes in the multiprocessing.pool.Pool is equal to the number of logical CPU cores in your system.
For example:
- Total Number Worker Processes = CPUs in Your System
Where the number of CPUs in your system is determined by Python and will take hyperthreading into account.
For example, if you have two CPU cores, each with hyperthreading (which is common), then Python will detect four CPUs in your system.
How Many CPU Cores Do I Have?
There are a number of ways to determine the number of CPU cores in your system.
Some functions include:
- multiprocessing.cpu_count() function
- os.cpu_count() function.
For example:
    ...
    # get the number of logical cpu cores
    n_cores = multiprocessing.cpu_count()
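For comparison, a small sketch that calls both functions listed above is given below. The helper name logical_cpu_counts() is hypothetical. Note that os.cpu_count() may return None if the count cannot be determined, whereas multiprocessing.cpu_count() raises a NotImplementedError in that case.

```python
import multiprocessing
import os


# hypothetical helper that reports the count from both functions
def logical_cpu_counts():
    # os.cpu_count() returns an int, or None if the count is unknown
    # multiprocessing.cpu_count() returns an int, or raises NotImplementedError
    return os.cpu_count(), multiprocessing.cpu_count()


if __name__ == '__main__':
    os_count, mp_count = logical_cpu_counts()
    print(f'os.cpu_count(): {os_count}')
    print(f'multiprocessing.cpu_count(): {mp_count}')
```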
Should The Number of Processes in the Pool Match the Number of CPUs or Cores?
The number of worker processes in the multiprocessing.pool.Pool should probably match the number of CPU cores in your system if your tasks are CPU-bound.
This is a good default.
If your tasks are IO-bound, you may set the number of processes to be equal to or a factor of the number of tasks you wish to complete. However, your operating system may limit the number of processes you’re able to create, e.g. 61 on Windows.
If you require hundreds or thousands of concurrent tasks executed and they are IO-bound, consider using the ThreadPoolExecutor instead.
How Many Processes Should I Use?
You should probably set the number of processes to be equal to the number of logical CPU cores in your system, i.e. the default.
- By default: Set to the number of logical CPU cores.
If you are expecting to perform computational work in the main process in addition to the process pool, consider setting the number of processes in the pool to be equal to the number of logical CPUs in your system minus one, to allow the main process to execute.
- If the main process is computationally intensive: Set to the number of logical CPU cores minus one.
If you have particularly CPU intensive tasks, consider configuring the number of processes to be equal to the number of physical CPUs instead of the number of logical CPUs.
- If tasks are computationally intensive: Set to the number of physical CPU cores.
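The three rules of thumb above could be collected into a small helper, sketched below. The function suggest_pool_size() and its parameters are hypothetical names, and the caller is assumed to supply the number of physical cores, since Python's standard library only reports logical cores.

```python
# hypothetical helper that applies the rules of thumb listed above
def suggest_pool_size(logical_cores, physical_cores,
                      main_is_busy=False, heavy_cpu_tasks=False):
    if heavy_cpu_tasks:
        # computationally intensive tasks: one worker per physical core
        size = physical_cores
    elif main_is_busy:
        # leave one logical core free for the main process
        size = logical_cores - 1
    else:
        # the default: one worker per logical core
        size = logical_cores
    # a pool needs at least one worker
    return max(1, size)


if __name__ == '__main__':
    # example for a system with 8 logical and 4 physical cores
    print(suggest_pool_size(8, 4))
    print(suggest_pool_size(8, 4, main_is_busy=True))
    print(suggest_pool_size(8, 4, heavy_cpu_tasks=True))
```

These are starting points only. It is a good idea to benchmark your own tasks with different pool sizes and keep the size that performs best.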
What is the Maximum Number of Worker Processes in the Pool?
The maximum number of worker processes may be limited by your operating system.
For example, on Windows, you will not be able to create more than 61 child processes in your Python program.
Other operating systems like macOS and Linux may impose an upper limit on the number of processes that may be spawned or forked.
Additionally, your system will have an upper limit of the number of processes you can create based on how much main memory (RAM) you have available.
Nevertheless, before you exceed main memory, you will reach a point of diminishing returns in terms of adding new processes and executing more tasks. This is because your operating system must switch between the processes, called context switching. With too many processes active at once, your program may spend more time context-switching than actually executing tasks.
A sensible upper limit for most applications is to set the number of processes to be equal to the number of logical CPU cores or the number of physical CPU cores in your system.
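One way to guard against the Windows limit described above is to cap the requested number of workers before creating the pool. The sketch below is illustrative only: cap_workers() and WINDOWS_MAX_WORKERS are hypothetical names, and the value 61 is simply the figure given in this tutorial.

```python
import sys

# the Windows limit stated in this tutorial (an assumption of this sketch)
WINDOWS_MAX_WORKERS = 61


# hypothetical helper that caps the requested number of workers on Windows
def cap_workers(requested, platform=sys.platform):
    if requested < 1:
        raise ValueError('the number of workers must be at least 1')
    # sys.platform is 'win32' on Windows
    if platform.startswith('win'):
        return min(requested, WINDOWS_MAX_WORKERS)
    return requested


if __name__ == '__main__':
    # report the capped number of workers for the current platform
    print(cap_workers(100))
```

The returned value can then be passed as the “processes” argument when creating the pool.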
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter: 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know how to configure the number of worker processes in the process pool.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.