Multiprocessing Pool Restarts Workers if Killed

August 23, 2022 Python Multiprocessing Pool

Child worker processes in the multiprocessing pool will be restarted automatically if killed.

In this tutorial you will discover what happens if a child worker process is killed in the multiprocessing pool in Python.

Let's get started.

What If One Worker is Killed

The multiprocessing pool provides a pool of reusable workers for executing ad hoc tasks with process-based concurrency.

An instance of the multiprocessing.Pool class can be created by specifying the number of workers to create; otherwise, a default number of workers will be created to match the number of logical CPUs in the system.
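
As a minimal sketch of the two ways of sizing the pool, the snippet below creates one pool with a fixed size and one with the default size, and counts the running child processes in each case. The helper expected_workers() is a hypothetical name introduced here for illustration, and it assumes the default matches the CPU count reported by os.cpu_count(), which may differ slightly between Python versions.

```python
# sketch: fixed-size pool versus default-size pool
from multiprocessing import Pool, active_children
from os import cpu_count


def expected_workers(requested=None):
    """Return how many workers Pool(requested) is expected to start."""
    # assumption: the default mirrors the logical CPU count reported by the OS
    return requested if requested is not None else (cpu_count() or 1)


# protect the entry point
if __name__ == '__main__':
    # create a pool with a fixed number of workers
    with Pool(4) as pool:
        print(f'expected {expected_workers(4)}, running {len(active_children())}')
    # create a pool with the default number of workers
    with Pool() as pool:
        print(f'expected {expected_workers()}, running {len(active_children())}')
```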

Once created, ad hoc tasks can be issued to the pool for execution via the Pool.apply() method. Multiple tasks may be executed in the pool by calling the same function with different arguments via the Pool.map() method.

Tasks may be executed asynchronously in the pool via Pool.apply_async() and Pool.map_async(), allowing the caller to carry on with other tasks.
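
The four methods mentioned above can be compared side by side in a short sketch. The task function square() is a hypothetical example chosen for illustration; any picklable function defined at the top level of the module could be used instead.

```python
# sketch: issuing tasks synchronously and asynchronously
from multiprocessing import Pool


def square(x):
    """A trivial task used only for illustration."""
    return x * x


# protect the entry point
if __name__ == '__main__':
    with Pool(2) as pool:
        # issue one task and block until it is done
        print(pool.apply(square, (3,)))    # 9
        # issue many tasks and block until all are done
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
        # issue one task without blocking, then wait for the result
        async_result = pool.apply_async(square, (4,))
        print(async_result.get())          # 16
        # issue many tasks without blocking, then wait for the results
        map_result = pool.map_async(square, range(3))
        print(map_result.get())            # [0, 1, 4]
```

The asynchronous variants return a result object immediately, so the caller is free to do other work before calling get().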

One concern with the multiprocessing pool is what happens if one child worker process in the pool is killed.

A child worker process may be killed in any number of ways, for example by a SIGKILL signal sent from the parent process, as in the example later in this tutorial.

There are two main outcomes that may occur if one child worker process is killed.

What happens if a child worker process in the pool is killed?

Child Workers Are Restarted if Killed

Child worker processes in the multiprocessing pool are restarted if they are killed.

This is a capability and feature of the multiprocessing.Pool class, although it is not documented.

This capability is likely related to the class's ability to limit the number of tasks executed by each child worker before the worker is replaced via the "maxtasksperchild" argument to the class constructor.
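
The worker replacement triggered by "maxtasksperchild" can be observed with a small sketch. The task function report_pid() is a hypothetical helper introduced here; it returns the process id of the worker that ran it. With each worker limited to one task, the number of distinct process ids is expected to be larger than the size of the pool, although the exact ids depend on the operating system.

```python
# sketch: workers are replaced after a fixed number of tasks
from multiprocessing import Pool
from os import getpid


def report_pid(_):
    """Return the process id of the worker executing this task."""
    return getpid()


# protect the entry point
if __name__ == '__main__':
    # each worker exits after one task and is replaced by a new worker
    with Pool(2, maxtasksperchild=1) as pool:
        pids = pool.map(report_pid, range(6), chunksize=1)
    print(f'pool size: 2, distinct worker pids: {len(set(pids))}')
```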

For example, if a parent Python process gets a handle on a child worker process, such as via its multiprocessing.Process instance, and kills it, such as via the kill() method, then the multiprocessing.Pool will identify that this has occurred and will restart the child worker process.

Now that we know what happens if a child worker process is killed, let's look at a worked example.

Example of Killing a Child Worker Process

We can explore what happens if we kill one child worker process in the multiprocessing pool.

In this example, we will create a multiprocessing pool with a fixed number of workers. We will then list all child processes of the current process to confirm that the workers were created and are running. A single child process is then killed directly. The child processes of the current process are then listed again to confirm that a new worker was started in its place.

Firstly, we can create a multiprocessing pool with four worker processes.

Don't worry if you have more or fewer CPU cores, as we will not be running any tasks.

...
# create a pool
pool = Pool(4)

Next, we can wait a moment for the pool to initialize completely. This is probably not required, but will likely avoid any possible race conditions in checking if the workers are running.

...
# wait a moment
sleep(0.5)

We can then get a list of all child processes for the current process. This will include all workers in the process pool.

This can be achieved via the multiprocessing.active_children() function.

...
# report all active child processes
children = active_children()

We can then report the details of each process in this list.

...
for child in children:
    print(child)

We can then get one child worker process instance from the list and explicitly kill it.

On POSIX systems, this will send a SIGKILL signal and terminate the child process immediately.

...
# kill one child
child = children[0]
child.kill()

We will then wait a moment for the child process to be terminated.

...
# wait a moment
sleep(0.5)

The details of the terminated process can be reported, confirming it was killed.

...
# report the details of the killed child
print(child)

We can then get a list of all child processes for the current process and report their details.

This may include any new child workers created to replace the killed process.

...
# report all active child processes
for child in active_children():
    print(child)

Finally, we can close the multiprocessing pool now that we are finished with it.

...
# close the pool
pool.close()

Tying this together, the complete example is listed below.

# SuperFastPython.com
# what happens if we kill a child worker process
from time import sleep
from multiprocessing import Pool
from multiprocessing import active_children

# protect the entry point
if __name__ == '__main__':
    # create a pool
    pool = Pool(4)
    # wait a moment
    sleep(0.5)
    # report all active child processes
    children = active_children()
    for child in children:
        print(child)
    # kill one child
    child = children[0]
    child.kill()
    # wait a moment
    sleep(0.5)
    # report the details of the killed child
    print(child)
    # report all active child processes
    for child in active_children():
        print(child)
    # close the pool
    pool.close()

Running the example first creates a multiprocessing pool with four child worker processes.

The main process then blocks for a moment to wait for the pool to initialize completely.

Next, a list of all child processes is retrieved and their details are reported.

In this case, we can see the details of the four child worker processes in the pool.

One of the workers is then killed and the main process blocks for a moment.

It then reports the details of the killed process and indeed we can see that its status is "stopped" and that its exit code indicates it received a SIGKILL signal.
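
The exit code can also be inspected directly via the exitcode attribute of the process. The sketch below uses a standalone child process rather than a pool worker, and the helper describe_exit() is a hypothetical name introduced for illustration. It relies on the convention that a negative exit code -N means the child was terminated by signal N; the signal name reported will depend on the platform.

```python
# sketch: interpreting the exit code of a killed child process
from multiprocessing import Process
from signal import Signals
from time import sleep


def describe_exit(exitcode):
    """Translate a Process.exitcode value into a short description."""
    if exitcode is None:
        return 'still running'
    if exitcode < 0:
        # a negative exit code -N means the process was terminated by signal N
        try:
            return f'killed by {Signals(-exitcode).name}'
        except ValueError:
            return f'killed by signal {-exitcode}'
    return f'exited with code {exitcode}'


# protect the entry point
if __name__ == '__main__':
    # a stand-in child process that would otherwise sleep for ten seconds
    child = Process(target=sleep, args=(10,))
    child.start()
    # kill the child and wait for it to stop
    child.kill()
    child.join()
    print(child)
    print(describe_exit(child.exitcode))
```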

Finally, all child processes are retrieved again and their details are reported. We can see four processes, and a new process (id 10095) was created to replace the process that was killed (id 10092).

This confirms that the multiprocessing pool will create new child worker processes to replace those workers that are killed.
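
Rather than comparing process ids by eye, the two listings can be compared programmatically. The sketch below uses a hypothetical helper, compare_workers(), applied to the process ids from the sample output in this tutorial. In a live program, each snapshot could instead be gathered with something like [p.pid for p in active_children()].

```python
# sketch: find which workers were removed and which were added
def compare_workers(before, after):
    """Return (removed, added) process ids between two snapshots."""
    before, after = set(before), set(after)
    return sorted(before - after), sorted(after - before)


# process ids taken from the sample output in this tutorial
before = [10092, 10091, 10093, 10094]
after = [10093, 10094, 10091, 10095]
removed, added = compare_workers(before, after)
print(f'removed: {removed}, added: {added}')
# removed: [10092], added: [10095]
```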

Note, process ids will differ each time the program is run as the ids are allocated by the underlying operating system.

<SpawnProcess name='SpawnPoolWorker-2' pid=10092 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-1' pid=10091 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-3' pid=10093 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-4' pid=10094 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-2' pid=10092 parent=10089 stopped exitcode=-SIGKILL daemon>
<SpawnProcess name='SpawnPoolWorker-3' pid=10093 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-4' pid=10094 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-1' pid=10091 parent=10089 started daemon>
<SpawnProcess name='SpawnPoolWorker-5' pid=10095 parent=10089 started daemon>

Takeaways

You now know what happens if a child worker process is killed in the multiprocessing pool.


