Last Updated on September 12, 2022
You can configure the process pool via arguments to the multiprocessing.pool.Pool class constructor.
In this tutorial you will discover how to configure the process pool in Python.
Let’s get started.
Need to Configure the Process Pool
The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.
A process pool is configured when it is created, and that configuration determines how the child worker processes are started and prepared.
A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
— multiprocessing — Process-based parallelism
We can issue one-off tasks to the process pool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map(). Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the functions such as apply_async() and map_async().
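The four ways of issuing tasks mentioned above can be sketched as follows. This is a minimal illustration rather than a recommended pattern; square() is a hypothetical task function and the 30 second timeout is an arbitrary choice.

```python
# Sketch: issuing tasks synchronously and asynchronously.
# square() is a hypothetical task function used only for illustration.
from multiprocessing.pool import Pool


def square(x):
    return x * x


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # one-off task, blocks until the result is ready
        single = pool.apply(square, args=(3,))
        # same function applied to every item, blocks until all are done
        mapped = pool.map(square, range(5))
        # asynchronous versions return an AsyncResult immediately
        async_single = pool.apply_async(square, args=(4,))
        async_mapped = pool.map_async(square, range(5))
        # get() waits for the result, here with a timeout in seconds
        later_single = async_single.get(timeout=30)
        later_mapped = async_mapped.get(timeout=30)
    print(single, mapped, later_single, later_mapped)
```

The main guard is included because start methods that launch a fresh interpreter re-import the main module in each worker.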
The process pool can be customized for the application.
What can be configured in the process pool and how can we configure it?
How to Configure the Process Pool
The process pool can be configured by specifying arguments to the multiprocessing.pool.Pool class constructor.
Process Pool Arguments
The arguments to the constructor are as follows:
- processes: Maximum number of worker processes to use in the pool.
- initializer: Function executed after each worker process is created.
- initargs: Arguments to the worker process initialization function.
- maxtasksperchild: Limit the maximum number of tasks executed by each worker process.
- context: Configure the multiprocessing context such as the process start method.
Next, let’s look at the default configuration for the process pool.
Default Configuration
The multiprocessing.pool.Pool class constructor does not require any arguments, because every argument has a default value.
For example:
```python
import multiprocessing.pool
...
# create a default process pool
pool = multiprocessing.pool.Pool()
```
This will create a process pool that will use a number of worker processes that matches the number of logical CPU cores in your system.
It will not call a function that initializes the worker processes when they are created.
Each worker process will be able to execute an unlimited number of tasks within the pool.
Finally, the default multiprocessing context will be used, along with the currently configured or default start method for the system.
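One way to check these defaults for ourselves is to read them from the constructor signature. The sketch below assumes a standard CPython install, where the Pool constructor is a plain Python function that the inspect module can examine.

```python
import inspect
from multiprocessing.pool import Pool

# read the constructor signature and collect the default of each argument
signature = inspect.signature(Pool.__init__)
defaults = {
    name: param.default
    for name, param in signature.parameters.items()
    if name != "self"
}
for name, value in defaults.items():
    print(f"{name} = {value!r}")
```

Each argument other than initargs should report None, which is how the pool knows to fall back to the behavior described above.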
Now that we know what configuration the process pool takes, let’s look at how we might configure each aspect of the process pool.
How to Configure the Number of Worker Processes
We can configure the number of worker processes in the multiprocessing.pool.Pool by setting the “processes” argument in the constructor.
processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.
— multiprocessing — Process-based parallelism
We can set the “processes” argument to specify the number of child processes to create and use as workers in the process pool.
For example:
```python
import multiprocessing.pool
...
# create a process pool with 4 workers
pool = multiprocessing.pool.Pool(processes=4)
```
The “processes” argument is the first argument in the constructor and does not need to be specified by name to be set, for example:
```python
import multiprocessing.pool
...
# create a process pool with 4 workers
pool = multiprocessing.pool.Pool(4)
```
If we use the context manager interface to create the process pool, so that the pool is terminated automatically when the block exits, we can configure the number of processes in the same manner.
For example:
```python
import multiprocessing.pool
...
# create a process pool with 4 workers
with multiprocessing.pool.Pool(4) as pool:
    # ...
    pass
```
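To see the effect of the "processes" argument, we can have each task report the process id of the worker that ran it. This is a rough sketch; report_pid() is a hypothetical task function, and the short sleep is only there to make it more likely that tasks spread across all workers.

```python
import os
import time
from multiprocessing.pool import Pool


def report_pid(_):
    # short pause so tasks are more likely to spread across workers
    time.sleep(0.1)
    return os.getpid()


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        pids = pool.map(report_pid, range(20))
    worker_count = len(set(pids))
    print(f"tasks ran in {worker_count} distinct worker processes")
```

The count can never exceed four here, although it may be lower if some workers happen to pick up no tasks.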
Next, let’s look at how we might configure the worker process initialization function.
How to Configure the Initialization Function
We can configure worker processes in the process pool to execute an initialization function prior to executing tasks.
This can be achieved by setting the “initializer” argument when configuring the process pool via the class constructor.
The “initializer” argument can be set to the name of a function that will be called to initialize the worker processes.
If initializer is not None then each worker process will call initializer(*initargs) when it starts.
— multiprocessing — Process-based parallelism
For example:
```python
import multiprocessing.pool

# worker process initialization function
def worker_init():
    # ...
    pass

...
# create a process pool and initialize workers
pool = multiprocessing.pool.Pool(initializer=worker_init)
```
If our worker process initialization function takes arguments, they can be specified to the process pool constructor via the “initargs” argument, which takes an ordered list or tuple of arguments for the custom initialization function.
For example:
```python
import multiprocessing.pool

# worker process initialization function
def worker_init(arg1, arg2, arg3):
    # ...
    pass

...
# create a process pool and initialize workers
pool = multiprocessing.pool.Pool(initializer=worker_init, initargs=(arg1, arg2, arg3))
```
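A common use of the initializer is to prepare per-process state that tasks read later. The sketch below assumes this pattern; worker_config, the task() function, and the argument values are hypothetical names chosen for illustration.

```python
from multiprocessing.pool import Pool

# per-process state, set once in each worker by the initializer
worker_config = None


def worker_init(prefix, scale):
    # runs once in every worker process when it starts
    global worker_config
    worker_config = {"prefix": prefix, "scale": scale}


def task(value):
    # reads the state prepared by worker_init() in this worker
    return f"{worker_config['prefix']}{value * worker_config['scale']}"


if __name__ == "__main__":
    with Pool(processes=2, initializer=worker_init, initargs=("item-", 10)) as pool:
        results = pool.map(task, range(4))
    print(results)
```

Note that worker_config stays None in the parent process, because the initializer only runs inside the workers.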
Next, let’s look at how we might configure the maximum tasks per child worker process.
How to Configure the Max Tasks Per Child
We can limit the maximum number of tasks completed by each child process in the process pool by setting the “maxtasksperchild” argument in the multiprocessing.pool.Pool class constructor when configuring a new process pool.
For example:
```python
import multiprocessing.pool
...
# create a process pool and limit the number of tasks in each worker
pool = multiprocessing.pool.Pool(maxtasksperchild=5)
```
The maxtasksperchild argument takes a positive integer: the number of tasks a child worker process may complete, after which the process is terminated and a new child worker process is created to replace it.
maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed.
— multiprocessing — Process-based parallelism
By default the maxtasksperchild argument is set to None, which means each child worker process will run for the lifetime of the process pool.
The default maxtasksperchild is None, which means worker processes will live as long as the pool.
— multiprocessing — Process-based parallelism
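The difference between the two settings can be observed by comparing worker process ids. This is a sketch under a few assumptions: report_pid() is a hypothetical task function, a single worker is used to keep the comparison simple, and chunksize=1 is set so that each item counts as one task.

```python
import os
from multiprocessing.pool import Pool


def report_pid(_):
    return os.getpid()


if __name__ == "__main__":
    # one worker that is replaced after every task
    with Pool(processes=1, maxtasksperchild=1) as pool:
        recycled_pids = pool.map(report_pid, range(3), chunksize=1)
    # one worker that lives as long as the pool (the default)
    with Pool(processes=1) as pool:
        reused_pids = pool.map(report_pid, range(3), chunksize=1)
    print("maxtasksperchild=1:", len(set(recycled_pids)), "distinct PIDs")
    print("maxtasksperchild=None:", len(set(reused_pids)), "distinct PIDs")
```

With the default setting all three tasks should report the same process id, whereas the limited pool should report a new one as each worker is replaced.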
Next, let’s look at how we might configure the multiprocessing context for the pool.
How to Configure the Context
We can set the context for the process pool via the “context” argument to the multiprocessing.pool.Pool class constructor.
context can be used to specify the context used for starting the worker processes.
— multiprocessing — Process-based parallelism
The “context” is an instance of a multiprocessing context configured with a start method, created via the multiprocessing.get_context() function.
By default, “context” is None, which uses the current default context and start method configured for the application.
A start method is the technique used to start child processes in Python.
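There are three start methods:
- spawn: start a fresh Python interpreter process (available on POSIX and Windows).
- fork: copy the existing Python process (available on POSIX systems only).
- forkserver: start a server process from which future child processes are forked (available on some POSIX platforms, such as Linux).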
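Because not every start method is available on every platform, it can help to check what the current system supports before choosing one. A small sketch using two functions from the multiprocessing module:

```python
import multiprocessing

# start methods supported on this platform; the first entry is the default
available = multiprocessing.get_all_start_methods()
print("available:", available)
# start method currently in effect for the default context
print("current:", multiprocessing.get_start_method())
```

The output differs between operating systems and Python versions, so no particular result is shown here.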
Multiprocessing contexts provide a more flexible way to manage process start methods directly within a program, and may be a preferred approach to changing start methods in general, especially within a Python library.
A new context can be created with a given start method and passed to the process pool.
For example:
```python
import multiprocessing.pool
...
# create a process context
ctx = multiprocessing.get_context('fork')
# create a process pool with a given context
pool = multiprocessing.pool.Pool(context=ctx)
```
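A fuller sketch of the same idea is given below. It assumes we prefer the 'fork' start method when the platform offers it and fall back to 'spawn' otherwise; that fallback rule and the square() task function are illustrative choices, not something the API requires.

```python
import multiprocessing
from multiprocessing.pool import Pool


def square(x):
    return x * x


if __name__ == "__main__":
    # prefer 'fork' where supported, otherwise use the portable 'spawn'
    supported = multiprocessing.get_all_start_methods()
    method = "fork" if "fork" in supported else "spawn"
    ctx = multiprocessing.get_context(method)
    # pass the context to the pool constructor
    with Pool(processes=2, context=ctx) as pool:
        squares = pool.map(square, range(5))
    # the context object can also create the pool itself
    with ctx.Pool(processes=2) as pool:
        squares_again = pool.map(square, range(5))
    print(method, squares, squares_again)
```

Using a context in this way leaves the application-wide start method untouched, which is why it may suit library code.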
Takeaways
You now know how to configure the process pool in Python.