Fix a Broken Mutex Lock When Terminating Child Processes

You can fix and recover a mutex lock used in a terminated child process by first releasing it before using it again, and handling the case where the lock had not been acquired.

In this tutorial, you will discover how to recover a mutex lock used in a terminated child process.

Let’s get started.

Broken Mutex Lock When Terminating Child Processes

We can run a task in a child process.

This can be achieved by creating an instance of the multiprocessing.Process class, specifying the target function to execute via the “target” argument and any arguments to the target function via the “args” argument.

For example:

...
# create and configure a child process
process = multiprocessing.Process(target=task)

You can learn more about running a function in a child process in the tutorial:

Run a Function in a Child Process

We may create a mutex lock via the multiprocessing.Lock class and share it with the child process.

For example:

...
# create a mutex lock
lock = multiprocessing.Lock()

You can learn more about multiprocessing locks in the tutorial:

Multiprocessing Lock in Python

It is possible to terminate a child process while it has acquired the lock.

This can be achieved by calling the terminate() method on the Process instance.

For example:

...
# terminate the child process
process.terminate()

You can learn more about terminating a child process in the tutorial:

Kill a Process in Python

Terminating the child process while it holds a mutex lock may leave the lock in an unknown and inconsistent state.

This case is specifically warned against in the multiprocessing module documentation.

For example:

Using the Process.terminate method to stop a process is liable to cause any shared resources (such as locks, semaphores, pipes and queues) currently being used by the process to become broken or unavailable to other processes.

Therefore it is probably best to only consider using Process.terminate on processes which never use any shared resources.
—multiprocessing — Process-based parallelism

How can we fix a mutex lock left in an inconsistent state when terminating a child process?

How to Fix a Broken Mutex Lock

If a mutex lock is used in a child process that is terminated, the lock is not broken.

Instead, the lock is in an unknown state.

If the child process held the lock when it was terminated, the lock was never released.

Therefore, we must explicitly release the lock before using it again.

For example:

...
# recover the broken lock
lock.release()
# acquire the lock
with lock:
    # ...

If the lock had not been acquired, or had already been released, and we attempt to release it again, a ValueError exception will be raised.

Unlike the threading.Lock class, the multiprocessing.Lock class has historically not offered a locked() method to check if a mutex lock is currently locked (one was added in Python 3.14).
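Where no locked() method is available, a non-blocking acquire can approximate such a check. The sketch below is one possible approach, not part of the multiprocessing API: is_locked() is a hypothetical helper name, and it relies only on the documented acquire(block=False) call. The result is a snapshot, so another process could change the state of the lock immediately afterward.

```python
from multiprocessing import Lock

# hypothetical helper: best-effort check of whether a lock is held
def is_locked(lock):
    # try to take the lock without blocking
    if lock.acquire(block=False):
        # we got it, so it was free; release it to leave it as we found it
        lock.release()
        return False
    # the acquire failed, so some process currently holds the lock
    return True
```

A check like this only reports the state. It does not recover a lock left held by a terminated process.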

Therefore, if we don’t know the state of the mutex lock, we can try to release it before acquiring it and wrap the release() call in a try-except block.

For example:

...
# try and recover the broken lock
try:
    lock.release()
except ValueError:
    pass
# acquire the lock
with lock:
    # ...
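The try-except pattern can be wrapped in a small helper so that the caller also learns whether a release actually happened. This is a minimal sketch; release_if_held() is a hypothetical name, not a function from the multiprocessing module.

```python
from multiprocessing import Lock

# hypothetical helper: release the lock if it is held
def release_if_held(lock):
    try:
        # a multiprocessing.Lock can be released from any process
        lock.release()
        return True
    except ValueError:
        # the lock was not held, so there is nothing to recover
        return False
```

Because release() may be called from any process, a helper like this should only be used once we are sure that no other live process could legitimately be holding the lock, such as after the child has been terminated and joined.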

Now that we know how to recover a lock used in a terminated child process, let’s look at some worked examples.

Example of a Broken Mutex Lock When Terminating Child Process

Before we look at recovering a mutex lock from a child process, let’s look at a failure case.

In this example, we will share a lock with a child process, then terminate the child process while it is holding the lock. The parent process will then attempt to acquire the lock, which will fail with a deadlock as the lock will never be released by the terminated child process.

Firstly, we can define a task function that takes the lock as an argument, acquires it, and blocks for a few seconds to simulate computational effort.

# task executed in a child process
def task(lock):
    # acquire the lock
    with lock:
        # block for a while to simulate work
        print('Task has lock...', flush=True)
        sleep(2)

Next, in the main process, we can create the shared lock, then create and configure the child process to execute our task() function and take the shared lock as an argument.

...
# create the shared lock
lock = Lock()
# create the child process
child = Process(target=task, args=(lock,))
# start the child process
child.start()

The main process will then wait a moment, terminate the child, and wait for it to close.

...
# wait a moment
sleep(1)
# terminate the child process
child.terminate()
# wait for child to terminate
child.join()

Finally, the main process will attempt to acquire the shared lock.

...
print(f'Child is terminated, still alive: {child.is_alive()}')
# try to acquire the lock
print('Main acquiring lock...')
with lock:
    print('Acquired lock')
# all done normally
print('Main done')

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of terminating a process and breaking a lock
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task executed in a child process
def task(lock):
    # acquire the lock
    with lock:
        # block for a while to simulate work
        print('Task has lock...', flush=True)
        sleep(2)

# protect the entry point
if __name__ == '__main__':
    # create the shared lock
    lock = Lock()
    # create the child process
    child = Process(target=task, args=(lock,))
    # start the child process
    child.start()
    # wait a moment
    sleep(1)
    # terminate the child process
    child.terminate()
    # wait for child to terminate
    child.join()
    print(f'Child is terminated, still alive: {child.is_alive()}')
    # try to acquire the lock
    print('Main acquiring lock...')
    with lock:
        print('Acquired lock')
    # all done normally
    print('Main done')

Running the example first creates the shared lock.

The child process is then created and configured to execute our task() function and is passed the shared lock as an argument.

The child process is started and the main process then blocks for one second.

The child process runs, acquires the lock, reports a message then sleeps.

The main process resumes and terminates the child process and waits for it to close. It then reports that the child process is no longer alive.

At this point, the lock is still held, even though the child process has stopped.

The main process reports a message and attempts to acquire the lock.

This is a deadlock. The main process is unable to acquire the lock because it was acquired by the child process and never released.

It is important to note that even though the child process used the context manager interface, it did not release the lock because the exit of the context manager was never executed when the child process was terminated.

Task has lock...

Child is terminated, still alive: False

Main acquiring lock...
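To observe this failure without hanging the program, the lock can be acquired with a timeout instead of via the context manager. The sketch below assumes a hypothetical helper named acquire_or_give_up() and an arbitrary default timeout of two seconds. It uses the documented timeout argument of acquire(), which returns False if the lock could not be acquired in time.

```python
from multiprocessing import Lock

# hypothetical helper: try to acquire the lock, giving up after a timeout
def acquire_or_give_up(lock, timeout=2.0):
    # wait up to timeout seconds instead of blocking forever
    acquired = lock.acquire(timeout=timeout)
    if not acquired:
        print(f'Could not acquire lock within {timeout} seconds', flush=True)
    return acquired
```

In the example above, the main process could call acquire_or_give_up(lock) in place of the with statement. It would then get False back after the timeout rather than deadlocking. If the call returns True, the caller is responsible for calling lock.release() when finished.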

Next, let’s look at how we might recover the lock.

Example of Fixing a Broken Mutex Lock

We can explore the case of fixing a broken lock before acquiring it.

In this example, we will update the previous example to first release the lock before acquiring it.

We know that the lock will be held by the child process at the time it is terminated, and therefore all we need to do is release the lock before acquiring it again.

For example:

...
# try and recover the broken lock
lock.release()
print(f'Child is terminated, still alive: {child.is_alive()}')
# try to acquire the lock
print('Main acquiring lock...')
with lock:
    print('Acquired lock')
# all done normally
print('Main done')

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of attempting to recover a broken lock
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task executed in a child process
def task(lock):
    # acquire the lock
    with lock:
        # block for a while to simulate work
        print('Task has lock...', flush=True)
        sleep(2)

# protect the entry point
if __name__ == '__main__':
    # create the shared lock
    lock = Lock()
    # create the child process
    child = Process(target=task, args=(lock,))
    # start the child process
    child.start()
    # wait a moment
    sleep(1)
    # terminate the child process
    child.terminate()
    # wait for child to terminate
    child.join()
    # try and recover the broken lock
    lock.release()
    print(f'Child is terminated, still alive: {child.is_alive()}')
    # try to acquire the lock
    print('Main acquiring lock...')
    with lock:
        print('Acquired lock')
    # all done normally
    print('Main done')

Running the example first creates the shared lock.

The child process is then created and configured to execute our task() function and is passed the shared lock as an argument.

The child process is started and the main process then blocks for one second.

The child process runs, acquires the lock, reports a message then sleeps.

The main process resumes, terminates the child process, and waits for it to close.

At this point, the lock is still held, even though the child process has stopped.

The main process then releases the lock, reports that the child process is no longer alive, and attempts to acquire the lock.

The lock is acquired normally, as expected.

Task has lock...

Child is terminated, still alive: False

Main acquiring lock...

Acquired lock

Main done

This is a contrived situation because we know the state of the lock in the terminated child process.

Next, let’s explore the case where we don’t know the state of the lock in the terminated child process.

Example of More Safely Fixing a Broken Mutex Lock

We can explore the case where we don’t know the status of the lock in the terminated child process.

If we assume the lock was held by the child process when it was terminated, but it was not, then attempting to release it in the main process before acquiring it again will raise a ValueError exception.

We can demonstrate this with a worked example.

The task() function can be updated to block before acquiring the lock.

The main process will then terminate the child process before the child process is able to acquire the lock. It will then release the lock and attempt to acquire it.

The complete example is listed below.

# SuperFastPython.com
# example of attempting to recover a broken lock and failing
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task executed in a child process
def task(lock):
    # work before we get the lock
    sleep(2)
    # acquire the lock
    with lock:
        # block for a while to simulate work
        print('Task has lock...', flush=True)
        sleep(2)

# protect the entry point
if __name__ == '__main__':
    # create the shared lock
    lock = Lock()
    # create the child process
    child = Process(target=task, args=(lock,))
    # start the child process
    child.start()
    # wait a moment
    sleep(1)
    # terminate the child process
    child.terminate()
    # wait for child to terminate
    child.join()
    # try and recover the broken lock
    lock.release()
    print(f'Child is terminated, still alive: {child.is_alive()}')
    # try to acquire the lock
    print('Main acquiring lock...')
    with lock:
        print('Acquired lock')
    # all done normally
    print('Main done')

Running the example first creates the shared lock.

The child process is then created and configured to execute our task() function and is passed the shared lock as an argument.

The child process is started and the main process then blocks for one second.

The child process runs and sleeps.

The main process resumes and terminates the child process and waits for it to close.

It then attempts to release the lock before acquiring it again.

This fails with a ValueError exception.

Traceback (most recent call last):

...

ValueError: semaphore or lock released too many times

This highlights that we cannot assume the state of the lock in the terminated child process, so we cannot unconditionally release it before acquiring it again.

Instead, we need a more robust solution.

We can update the example to still attempt to always release the lock before using it, but protect against a possible ValueError exception, in case the lock was not locked.

...
# try and recover the broken lock
try:
    lock.release()
except ValueError:
    pass

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of attempting to recover a broken lock more safely
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task executed in a child process
def task(lock):
    # work before we get the lock
    sleep(2)
    # acquire the lock
    with lock:
        # block for a while to simulate work
        print('Task has lock...', flush=True)
        sleep(2)

# protect the entry point
if __name__ == '__main__':
    # create the shared lock
    lock = Lock()
    # create the child process
    child = Process(target=task, args=(lock,))
    # start the child process
    child.start()
    # wait a moment
    sleep(1)
    # terminate the child process
    child.terminate()
    # wait for child to terminate
    child.join()
    # try and recover the broken lock
    try:
        lock.release()
    except ValueError:
        pass
    print(f'Child is terminated, still alive: {child.is_alive()}')
    # try to acquire the lock
    print('Main acquiring lock...')
    with lock:
        print('Acquired lock')
    # all done normally
    print('Main done')

Running the example first creates the shared lock.

The child process is then created and configured to execute our task() function and is passed the shared lock as an argument.

The child process is started and the main process then blocks for one second.

The child process runs and sleeps.

The main process resumes and terminates the child process and waits for it to close.

It then attempts to release the lock before acquiring it again.

This fails with a ValueError exception, which is caught and ignored. The main process then reports that the child process is no longer alive, acquires the lock, and reports that it is done.

This highlights how we can safely recover a lock from a terminated child process with an unknown state.

Child is terminated, still alive: False

Main acquiring lock...

Acquired lock

Main done