How to Configure the Multiprocessing Pool in Python

July 2, 2022 Python Multiprocessing Pool

You can configure the process pool via arguments to the multiprocessing.pool.Pool class constructor.

In this tutorial you will discover how to configure the process pool in Python.

Let's get started.

Need to Configure the Process Pool

The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.

A process pool can be configured when it is created, which will prepare the child workers.

A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

-- multiprocessing — Process-based parallelism

We can issue one-off tasks to the process pool using methods such as apply(), or apply the same function to an iterable of items using methods such as map(). These calls block until the results are ready. Alternatively, the asynchronous versions, apply_async() and map_async(), return immediately and let us retrieve the results later.
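
As a minimal sketch of the four methods mentioned above, the example below issues the same work in blocking and asynchronous form. The square() task and the run_demo() helper are hypothetical names used only for illustration.

```python
from multiprocessing.pool import Pool


def square(x):
    # hypothetical task function, used only for illustration
    return x * x


def run_demo():
    with Pool(2) as pool:
        # apply() issues one task and blocks until its result is ready
        single = pool.apply(square, args=(3,))
        # map() applies the function to each item and blocks for all results
        many = pool.map(square, range(5))
        # apply_async() returns an AsyncResult object immediately
        async_single = pool.apply_async(square, args=(4,))
        # map_async() does the same for an iterable of items
        async_many = pool.map_async(square, range(5))
        # get() blocks until the asynchronous result is available
        return single, many, async_single.get(), async_many.get()


if __name__ == "__main__":
    print(run_demo())
```

Note that get() is called inside the with block, because leaving the block shuts the pool down.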

The process pool can be customized for the application.

What can be configured in the process pool and how can we configure it?

How to Configure the Process Pool

The process pool can be configured by specifying arguments to the multiprocessing.pool.Pool class constructor.

Process Pool Arguments

The constructor takes five arguments, each covered in the sections below: processes, initializer, initargs, maxtasksperchild, and context.

Next, let's look at the default configuration for the process pool.

Default Configuration

By default, the multiprocessing.pool.Pool class constructor can be called without any arguments, because every argument has a default value.

For example:

...
# create a default process pool
pool = multiprocessing.pool.Pool()

This will create a process pool that will use a number of worker processes that matches the number of logical CPU cores in your system.

It will not call a function that initializes the worker processes when they are created.

Each worker process will be able to execute an unlimited number of tasks within the pool.

Finally, the default multiprocessing context will be used, along with the currently configured or default start method for the system.
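
To make the defaults described above concrete, the sketch below creates a pool with every argument spelled out at its default value, which should behave the same as calling the constructor with no arguments. The double() task and run_demo() helper are hypothetical names chosen for this example.

```python
from multiprocessing.pool import Pool


def double(x):
    # hypothetical task function, used only for illustration
    return x * 2


def run_demo():
    # every constructor argument written out with its default value
    with Pool(
        processes=None,         # worker count chosen from the CPU count
        initializer=None,       # no worker initialization function
        initargs=(),            # no arguments for the initializer
        maxtasksperchild=None,  # workers live as long as the pool
        context=None,           # default context and start method
    ) as pool:
        return pool.map(double, [1, 2, 3])


if __name__ == "__main__":
    print(run_demo())
```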

Now that we know what configuration the process pool takes, let's look at how we might configure each aspect of the process pool.

How to Configure the Number of Worker Processes

We can configure the number of worker processes in the multiprocessing.pool.Pool by setting the "processes" argument in the constructor.

processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used.

-- multiprocessing — Process-based parallelism

We can set the "processes" argument to specify the number of child processes to create and use as workers in the process pool.

For example:

...
# create a process pool with 4 workers
pool = multiprocessing.pool.Pool(processes=4)

The "processes" argument is the first argument in the constructor and does not need to be specified by name to be set, for example:

...
# create a process pool with 4 workers
pool = multiprocessing.pool.Pool(4)

If we use the context manager to create the process pool so that it is shut down automatically, we can configure the number of processes in the same manner.

For example:

...
# create a process pool with 4 workers
with multiprocessing.pool.Pool(4) as pool:
	# ...
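
One way to check the configured worker count is to ask each task for the ID of the process that ran it. The sketch below assumes a pool of 4 workers and 100 small tasks. The worker_pid() and run_demo() names are hypothetical.

```python
import os
from multiprocessing.pool import Pool


def worker_pid(_):
    # report the ID of the worker process that executed this task
    return os.getpid()


def run_demo(n_workers=4, n_tasks=100):
    # the context manager shuts the pool down when the block exits
    with Pool(n_workers) as pool:
        pids = pool.map(worker_pid, range(n_tasks))
    # distinct process IDs seen across all tasks
    return set(pids)


if __name__ == "__main__":
    print(f"tasks ran in {len(run_demo())} distinct worker processes")
```

The number of distinct IDs can be lower than the pool size, since a fast worker may finish all of the tasks before the others pick any up, but it should not exceed it.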

Next, let's look at how we might configure the worker process initialization function.

How to Configure the Initialization Function

We can configure worker processes in the process pool to execute an initialization function prior to executing tasks.

This can be achieved by setting the "initializer" argument when configuring the process pool via the class constructor.

The "initializer" argument can be set to the name of a function that will be called to initialize the worker processes.

If initializer is not None then each worker process will call initializer(*initargs) when it starts.

-- multiprocessing — Process-based parallelism

For example:

# worker process initialization function
def worker_init():
	# ...

...
# create a process pool and initialize workers
pool = multiprocessing.pool.Pool(initializer=worker_init)

If our worker process initialization function takes arguments, they can be specified to the process pool constructor via the "initargs" argument, which takes an ordered list or tuple of arguments for the custom initialization function.

For example:

# worker process initialization function
def worker_init(arg1, arg2, arg3):
	# ...

...
# create a process pool and initialize workers
pool = multiprocessing.pool.Pool(initializer=worker_init, initargs=(arg1, arg2, arg3))
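
A common use of the initializer is to store per-worker state in a global variable that tasks read later. The following is a minimal sketch of that pattern. The _config global, the task() function, and the argument values are assumptions made for illustration.

```python
from multiprocessing.pool import Pool

# per-process global, set separately inside each worker by the initializer
_config = None


def worker_init(prefix, suffix):
    # called once in each worker process when it starts
    global _config
    _config = (prefix, suffix)


def task(name):
    # read the state prepared by the initializer
    prefix, suffix = _config
    return f"{prefix}{name}{suffix}"


def run_demo():
    with Pool(2, initializer=worker_init, initargs=("<", ">")) as pool:
        return pool.map(task, ["a", "b", "c"])


if __name__ == "__main__":
    print(run_demo())
```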

Next, let's look at how we might configure the maximum tasks per child worker process.

How to Configure the Max Tasks Per Child

We can limit the maximum number of tasks completed by each child process in the process pool by setting the "maxtasksperchild" argument in the multiprocessing.pool.Pool class constructor when configuring a new process pool.

For example:

...
# create a process pool and limit the number of tasks in each worker
pool = multiprocessing.pool.Pool(maxtasksperchild=5)

The "maxtasksperchild" argument takes a positive integer number of tasks that may be completed by a child worker process, after which the process will be terminated and a new child worker process will be created to replace it.

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed.

-- multiprocessing — Process-based parallelism

By default the maxtasksperchild argument is set to None, which means each child worker process will run for the lifetime of the process pool.

The default maxtasksperchild is None, which means worker processes will live as long as the pool.

-- multiprocessing — Process-based parallelism
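
The replacement of workers can be observed by recording the process ID that runs each task. The sketch below assumes a pool with a single worker that is limited to two tasks. The worker_pid() and run_demo() names are hypothetical.

```python
import os
from multiprocessing.pool import Pool


def worker_pid():
    # report the ID of the worker process that executed this task
    return os.getpid()


def run_demo():
    # one worker, replaced after every 2 completed tasks
    with Pool(processes=1, maxtasksperchild=2) as pool:
        # issue 4 tasks one at a time, waiting for each result
        return [pool.apply(worker_pid) for _ in range(4)]


if __name__ == "__main__":
    print(run_demo())
```

The first two tasks should report one process ID and the last two another, since the original worker exits after its second task and a fresh worker takes over.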

Next, let's look at how we might configure the multiprocessing context for the pool.

How to Configure the Context

We can set the context for the process pool via the "context" argument to the multiprocessing.pool.Pool class constructor.

context can be used to specify the context used for starting the worker processes.

-- multiprocessing — Process-based parallelism

The "context" is an instance of a multiprocessing context configured with a start method, created via the multiprocessing.get_context() function.

By default, "context" is None, which uses the current default context and start method configured for the application.

A start method is the technique used to start child processes in Python.

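There are three start methods: spawn, fork, and forkserver. Not every method is available on every platform. For example, fork is not available on Windows.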
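
The start methods supported on the current platform, and the one currently in effect, can be queried from the multiprocessing module. The describe_start_methods() helper below is a hypothetical name used only for illustration.

```python
import multiprocessing


def describe_start_methods():
    # all start methods supported on this platform
    available = multiprocessing.get_all_start_methods()
    # the start method currently used by the default context
    current = multiprocessing.get_start_method()
    return available, current


if __name__ == "__main__":
    available, current = describe_start_methods()
    print(f"available: {available}")
    print(f"current: {current}")
```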

Multiprocessing contexts provide a more flexible way to manage process start methods directly within a program, and may be a preferred approach to changing start methods in general, especially within a Python library.

A new context can be created with a given start method and passed to the process pool.

For example:

...
# create a process context
ctx = multiprocessing.get_context('fork')
# create a process pool with a given context
pool = multiprocessing.pool.Pool(context=ctx)
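
Because the fork start method is not available on every platform, a more portable sketch can check what is supported before creating the context. The fallback to spawn, the cube() task, and the run_demo() helper are assumptions made for this example.

```python
import multiprocessing
from multiprocessing.pool import Pool


def cube(x):
    # hypothetical task function, used only for illustration
    return x ** 3


def run_demo(method=None):
    if method is None:
        # prefer 'fork' where it exists, otherwise fall back to 'spawn'
        available = multiprocessing.get_all_start_methods()
        method = "fork" if "fork" in available else "spawn"
    # create a context bound to the chosen start method
    ctx = multiprocessing.get_context(method)
    # the pool starts its workers using that context
    with Pool(2, context=ctx) as pool:
        return method, pool.map(cube, [1, 2, 3])


if __name__ == "__main__":
    print(run_demo())
```

The guard on __name__ matters here: with the spawn start method, each worker imports the main module again, so code that creates the pool should not run at import time.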

Takeaways

You now know how to configure the process pool in Python.


