You can identify multiprocessing deadlocks by seeing examples and developing an intuition for their common causes.
In most cases, deadlocks can be avoided by following best practices in concurrency programming, such as lock ordering, using timeouts on waits, and using context managers when acquiring locks.
In this tutorial, you will discover how to identify deadlocks with process-based concurrency in Python.
Let’s get started.
What is a Deadlock
A deadlock is a concurrency failure mode where a process or processes wait for a condition that never occurs.
The result is that the deadlocked processes are unable to progress and the program is stuck or frozen and must be terminated forcefully.
There are many ways in which you may encounter a deadlock in your concurrent program.
Deadlocks are not developed intentionally; instead, they are an unexpected side effect or bug in a concurrent program.
Common examples of the cause of deadlocks include:
- Processes that wait on themselves (e.g. attempts to acquire the same mutex lock twice).
- Processes that wait on each other (e.g. A waits on B, B waits on A).
- Processes that fail to release a resource (e.g. mutex lock, semaphore, barrier, condition, event, etc.).
- Processes that acquire mutex locks in different orders (e.g. fail to perform lock ordering).
Deadlocks may be easy to describe, but hard to detect in an application just from reading code.
It is important to develop an intuition for the causes of different deadlocks. This will help you identify deadlocks in your own code and track down the causes of any deadlocks that you encounter.
Note: deadlocks may also happen with threads. You can learn more about thread deadlocks in the tutorial:
Deadlocks may also occur with coroutines in asyncio programs. You can learn more about coroutine deadlocks in the tutorial:
Now that we are familiar with what a deadlock is, let’s look at some worked examples.
Deadlock 1: Process Waits on Itself
A common cause of a deadlock is a process that waits on itself.
We do not intend for this deadlock to occur, e.g. we don’t intentionally write code that causes a process to wait on itself. Instead, this occurs accidentally due to a series of function calls and variables being passed around.
A process may wait on itself for many reasons, such as:
- Waiting to acquire a mutex lock that it has already acquired.
- Waiting to be notified on a condition by itself.
- Waiting for an event to be set by itself.
- Waiting for a semaphore to be released by itself.
And so on.
We can demonstrate a deadlock caused by a process waiting on itself.
In this case, we will develop a task() function that directly attempts to acquire the same mutex lock twice. That is, the task will acquire the lock, then attempt to acquire the lock again.
This will cause a deadlock as the process already holds the lock and will wait forever for itself to release the lock so that it can acquire it again.
The task() function that attempts to acquire the same lock twice and trigger a deadlock is listed below.
```python
# task to be executed in a new process
def task(lock):
    print('Process acquiring lock...', flush=True)
    with lock:
        print('Process acquiring lock again...', flush=True)
        with lock:
            # will never get here
            pass
```
If you are new to using mutex locks with processes, see the tutorial:
In the main process, we can then create the lock.
```python
...
# create the mutex lock
lock = Lock()
```
We will then create and configure a new process to execute our task() function in a child process, then start the process and wait for it to terminate, which it never will.
```python
...
# create and configure the new process
process = Process(target=task, args=(lock,))
# start the new process
process.start()
# wait for process to exit...
process.join()
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of a process that waits on itself resulting in a deadlock
from multiprocessing import Process
from multiprocessing import Lock

# task to be executed in a new process
def task(lock):
    print('Process acquiring lock...', flush=True)
    with lock:
        print('Process acquiring lock again...', flush=True)
        with lock:
            # will never get here
            pass

# protect the entry point
if __name__ == '__main__':
    # create the mutex lock
    lock = Lock()
    # create and configure the new process
    process = Process(target=task, args=(lock,))
    # start the new process
    process.start()
    # wait for process to exit...
    process.join()
```
Running the example first creates the lock.
The child process is then configured and started and the main process blocks until the child process terminates, which it never does.
The child process runs and first acquires the lock. It then attempts to acquire the same mutex lock again and blocks.
It will block forever waiting for the lock to be released. The lock cannot be released because the child process already holds the lock. Therefore the process has deadlocked.
The program must be terminated forcefully, e.g. killed via Control-C.
```
Process acquiring lock...
Process acquiring lock again...
```
Perhaps the above example is too contrived.
Attempting to acquire the same lock twice is common when you protect a critical section with a lock and, within that critical section, call another function that attempts to acquire the same lock.
For example, we can update the previous example to split the task() function into two functions task1() and task2(). The task1() function acquires the lock, does some work, then calls task2() that does some work, and attempts to acquire the lock again.
For example:
```python
# SuperFastPython.com
# example of a process that waits on itself resulting in a deadlock
from multiprocessing import Process
from multiprocessing import Lock

# task2 to be executed in a new process
def task2(lock):
    print('Process acquiring lock again...', flush=True)
    with lock:
        # will never get here
        pass

# task1 to be executed in a new process
def task1(lock):
    print('Process acquiring lock...', flush=True)
    with lock:
        task2(lock)

# protect the entry point
if __name__ == '__main__':
    # create the mutex lock
    lock = Lock()
    # create and configure the new process
    process = Process(target=task1, args=(lock,))
    # start the new process
    process.start()
    # wait for process to exit...
    process.join()
```
Running the example creates the lock and then creates and starts the child process.
The process acquires the lock in task1(), simulates some work then calls task2().
The task2() function attempts to acquire the same lock and the process is stuck in a deadlock waiting for the lock to be released by itself so it can acquire it again.
```
Process acquiring lock...
Process acquiring lock again...
```
This specific deadlock with a mutex lock can be avoided by using a reentrant mutex lock. This allows a process to acquire the same lock more than once.
A reentrant lock is recommended any time you may have code that acquires a lock that may call other code that may acquire the same lock.
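As a sketch of that fix, the nested-lock example above can be rewritten with multiprocessing.RLock in place of Lock. The second acquire by the same process then succeeds and no deadlock occurs (the final print is my addition, to show the inner block is actually reached):

```python
# sketch: the waits-on-itself example fixed with a reentrant lock
from multiprocessing import Process
from multiprocessing import RLock

# task that acquires the same lock twice, safely
def task(lock):
    print('Process acquiring lock...', flush=True)
    with lock:
        print('Process acquiring lock again...', flush=True)
        with lock:
            # reached, because RLock allows re-acquisition by the holder
            print('Process acquired lock twice', flush=True)

# protect the entry point
if __name__ == '__main__':
    # create the reentrant mutex lock
    lock = RLock()
    # create, start, and wait on the child process
    process = Process(target=task, args=(lock,))
    process.start()
    process.join()
```

Note that an RLock must be released as many times as it was acquired, and only by the process that holds it.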
Next, let’s look at a deadlock caused by processes waiting on each other.
Deadlock 2: Processes Waiting on Each Other
Another common deadlock is to have two (or more) processes waiting on each other.
For example, Process A is waiting on Process B, and Process B is waiting on Process A.
- Process A: Waiting on Process B.
- Process B: Waiting on Process A.
Or with three processes, you could have a cycle of processes waiting on each other, for example:
- Process A: Waiting on Process B.
- Process B: Waiting on Process C.
- Process C: Waiting on Process A.
This deadlock is common if you set up processes to wait on the result from other processes, such as in a pipeline or workflow where some dependencies for subtasks are out of order.
A simple way to demonstrate this deadlock is for the main process to create a child process and then wait for it to complete. The child process can get access to the parent process and wait for it to complete.
For example:
- New Process: Waiting on Main Process.
- Main Process: Waiting on New Process.
If you are new to parent vs child processes, see the tutorial:
We can define a task function that gets the parent process as well as the current process, reports the name of each and then waits on the parent process to complete.
This is somewhat contrived in this case.
```python
# task to be executed in a new process
def task():
    # get the current process
    current = current_process()
    # get the parent process
    parent = parent_process()
    # report a message
    print(f'[{current.name}] waiting on [{parent.name}]...', flush=True)
    # wait on the parent process
    parent.join()
```
We can then perform a similar action in the main process.
The main process can create the child process, report the name of the child as well as the name of the main process, then wait for the child process to complete.
```python
...
# get the current process
current = current_process()
# create the second process
child = Process(target=task)
# start the new process
child.start()
# wait on the child process
print(f'[{current.name}] waiting on [{child.name}]...', flush=True)
child.join()
```
Tying this together, the complete example of two processes waiting on each other is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by processes waiting on each other
from multiprocessing import parent_process
from multiprocessing import current_process
from multiprocessing import Process

# task to be executed in a new process
def task():
    # get the current process
    current = current_process()
    # get the parent process
    parent = parent_process()
    # report a message
    print(f'[{current.name}] waiting on [{parent.name}]...', flush=True)
    # wait on the parent process
    parent.join()

# protect the entry point
if __name__ == '__main__':
    # get the current process
    current = current_process()
    # create the second process
    child = Process(target=task)
    # start the new process
    child.start()
    # wait on the child process
    print(f'[{current.name}] waiting on [{child.name}]...', flush=True)
    child.join()
```
Running the example first gets the multiprocessing.Process instance for the main process, then creates the child process configured to run the task() function and starts it. The main process then reports the name of each process and waits for the child process to complete.
The child process gets the current and parent processes, reports a message, and waits for the parent process to terminate.
We have a deadlock.
Each process is waiting on the other to terminate before itself can terminate, resulting in a deadlock.
```
[MainProcess] waiting on [Process-1]...
[Process-1] waiting on [MainProcess]...
```
This type of deadlock can happen if processes are waiting on another process to terminate in a way that results in a cycle.
It may also happen for any type of wait operation, where process dependencies create a cycle, such as waiting on a mutex lock, waiting on a semaphore, waiting to be notified on a condition, and so on.
The waiting may also be less obvious given a level of indirection. For example, a process may be waiting on a queue that itself is populated by another process. The other process may in turn be waiting on the first process directly.
Next, let’s look at an example of a deadlock caused by processes acquiring locks in different orders.
Deadlock 3: Acquiring Locks in the Wrong Order
A common cause of a deadlock is when two processes acquire locks in different orders at the same time.
For example, we may have a critical section protected by a lock and within that critical section, we may have code or a function call that is protected by a second lock.
We may have a situation where one process acquires lock1 and then attempts to acquire lock2, while a second process calls functionality that acquires lock2 and then attempts to acquire lock1. If this occurs concurrently, so that process1 holds lock1 while process2 holds lock2, then there will be a deadlock.
- Process1: Holds Lock1, Waiting for Lock2.
- Process2: Holds Lock2, Waiting for Lock1.
We can demonstrate this with a direct example.
We can create a task() function that takes both locks as arguments and then attempts to acquire the first lock and then the second lock.
Two processes can then be created to call this function with the locks as arguments, but perhaps we make a typo and have the first process take lock1 then lock2 as arguments, and the second process takes lock2 then lock1 as arguments.
The result will be a deadlock if each process can first acquire a lock and then wait on the second lock.
First, let’s define the task() function that takes the two locks as arguments and acquires them one after the other.
```python
# task to be executed in a new process
def task(number, lock1, lock2):
    # acquire the first lock
    print(f'Process {number} acquiring lock 1...', flush=True)
    with lock1:
        # wait a moment
        sleep(1)
        # acquire the next lock
        print(f'Process {number} acquiring lock 2...', flush=True)
        with lock2:
            # never gets here..
            pass
```
Notice that we have added a sleep for one second after acquiring the first lock.
This makes the deadlock occur reliably, giving each process enough time to acquire its first lock before attempting to acquire the second lock.
Back in the main process, we can then create two separate mutex locks.
```python
...
# create the mutex locks
lock1 = Lock()
lock2 = Lock()
```
Next, we can create and configure two child processes to call the task() function and transpose the lock arguments for one of them.
```python
...
# create and configure the new processes
process1 = Process(target=task, args=(1, lock1, lock2))
process2 = Process(target=task, args=(2, lock2, lock1))
```
Finally, we can start the processes and wait in the main process for both processes to terminate.
```python
...
# start the new processes
process1.start()
process2.start()
# wait for processes to exit...
process1.join()
process2.join()
```
Tying this together, the complete example of a deadlock caused by acquiring locks in the wrong order is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by acquiring locks in a different order
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task to be executed in a new process
def task(number, lock1, lock2):
    # acquire the first lock
    print(f'Process {number} acquiring lock 1...', flush=True)
    with lock1:
        # wait a moment
        sleep(1)
        # acquire the next lock
        print(f'Process {number} acquiring lock 2...', flush=True)
        with lock2:
            # never gets here..
            pass

# protect the entry point
if __name__ == '__main__':
    # create the mutex locks
    lock1 = Lock()
    lock2 = Lock()
    # create and configure the new processes
    process1 = Process(target=task, args=(1, lock1, lock2))
    process2 = Process(target=task, args=(2, lock2, lock1))
    # start the new processes
    process1.start()
    process2.start()
    # wait for processes to exit...
    process1.join()
    process2.join()
```
Running the example first creates both locks. Then both processes are created and the main process waits for the child processes to terminate.
The first process receives lock1 and lock2 as arguments. It acquires lock1 and sleeps.
The second process receives lock2 and lock1 as arguments. It acquires lock2 and sleeps.
The first process wakes and tries to acquire lock2, but it must wait as it is already acquired by the second process. The second process wakes and tries to acquire lock1, but it must wait as it is already acquired by the first process.
The result is a deadlock.
```
Process 2 acquiring lock 1...
Process 1 acquiring lock 1...
Process 2 acquiring lock 2...
Process 1 acquiring lock 2...
```
The solution is to ensure locks are always acquired in the same order throughout the program.
This is called lock ordering.
This could be achieved by data structures that ensure that locks are always acquired in the same order, e.g. a dict of locks, each assigned a unique number.
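As a sketch of that idea, each lock in the program can be assigned a unique number in a dict, with all call sites acquiring locks in ascending rank. The helper names acquire_all() and release_all() are illustrative, not from the example above:

```python
# sketch: a dict of locks, each assigned a unique number, so every
# part of the program acquires them in the same ascending order
from multiprocessing import Lock

# program-wide registry of ranked locks
locks = {1: Lock(), 2: Lock()}

def acquire_all(needed):
    # acquire in ascending rank, regardless of the order requested
    for rank in sorted(needed):
        locks[rank].acquire()

def release_all(needed):
    # release in the reverse of the acquisition order
    for rank in sorted(needed, reverse=True):
        locks[rank].release()

# this call site asks for lock 2 then lock 1, but still
# acquires lock 1 first, so no cycle between processes is possible
acquire_all([2, 1])
# critical section...
release_all([2, 1])
```

Because every caller acquires in the same global order, two processes can never each hold a lock the other is waiting for.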
Next, let’s consider a deadlock caused by a process failing to release a lock.
Deadlock 4: Failing to Release a Lock
Another common cause of a deadlock is by a process failing to release a resource.
This is typically caused by a process raising an error or exception within a critical section in a way that prevents the process from releasing the resource.
Some examples include:
- Failing to release a lock.
- Failing to release a semaphore.
- Failing to arrive at a barrier.
- Failing to notify processes on a condition.
- Failing to set an event.
And so on.
We can demonstrate this deadlock with an example.
A process may acquire a lock in order to execute a critical section, then raise an exception that prevents the process from releasing the lock. Another process may then try to acquire the same lock and must wait forever as the lock will never be released, resulting in a deadlock.
First, we can define a task() function that takes the lock as an argument, acquires it manually via the acquire() function, then raises an exception preventing the lock from being released.
```python
# task to be executed in a new process
def task(lock):
    # acquire the lock
    print('Process acquiring lock...', flush=True)
    lock.acquire()
    # fail
    raise Exception('Something bad happened')
    # release the lock (never gets here)
    print('Process releasing lock...', flush=True)
    lock.release()
```
Next, in the main process, we can create the lock that will be shared between processes.
```python
...
# create the mutex lock
lock = Lock()
```
We will then create and start a child process that will call the task() function with the lock as an argument.
```python
...
# create and configure the new process
process = Process(target=task, args=(lock,))
# start the new process
process.start()
```
The main process will then block for a moment to allow the child process to fail.
```python
...
# wait a while
sleep(1)
```
Finally, the main process will attempt to acquire the lock.
```python
...
# acquire the lock
print('Main acquiring lock...')
lock.acquire()
# do something...
# release lock (never gets here)
lock.release()
```
Tying this together, the complete example of a deadlock caused by a process failing to release a lock is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a process failing to release a lock
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task to be executed in a new process
def task(lock):
    # acquire the lock
    print('Process acquiring lock...', flush=True)
    lock.acquire()
    # fail
    raise Exception('Something bad happened')
    # release the lock (never gets here)
    print('Process releasing lock...', flush=True)
    lock.release()

if __name__ == '__main__':
    # create the mutex lock
    lock = Lock()
    # create and configure the new process
    process = Process(target=task, args=(lock,))
    # start the new process
    process.start()
    # wait a while
    sleep(1)
    # acquire the lock
    print('Main acquiring lock...')
    lock.acquire()
    # do something...
    # release lock (never gets here)
    lock.release()
```
Running the example first creates the lock, then creates and starts the child process.
The main process then blocks.
The child process runs. It first acquires the lock, then fails by raising an exception. The process unwinds and never makes it to the line of code to release the lock. The child process terminates.
Finally, the main process wakes up, then attempts to acquire the lock. The main process blocks forever as the lock will not be released, resulting in a deadlock.
```
Process acquiring lock...
Process Process-1:
Traceback (most recent call last):
...
Exception: Something bad happened
Main acquiring lock...
```
There are two aspects to avoiding this form of deadlock.
The first is to follow best practices when acquiring and releasing locks.
If the task() function acquired and released the lock using a context manager, then the release() function would have been called automatically when the exception was raised and propagated.
Secondly, the main process can use a timeout while waiting for the lock and give up after a few minutes and handle the failure case.
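Both fixes can be sketched together. The task below uses a context manager, so the lock is released even though the exception propagates, and the main process acquires with a timeout so it can handle the failure instead of blocking forever (the timeout value is arbitrary):

```python
# sketch: context manager in the task plus a timeout in the main process
from time import sleep
from multiprocessing import Process
from multiprocessing import Lock

# task that fails, but still releases the lock on exit
def task(lock):
    print('Process acquiring lock...', flush=True)
    with lock:
        # the exception propagates, but the context manager
        # releases the lock as the block is exited
        raise Exception('Something bad happened')

# protect the entry point
if __name__ == '__main__':
    lock = Lock()
    process = Process(target=task, args=(lock,))
    process.start()
    # give the child process time to fail
    sleep(1)
    # use a timeout rather than waiting forever
    if lock.acquire(timeout=5):
        print('Main acquired lock')
        lock.release()
    else:
        print('Main gave up waiting, handling failure...')
```

With both changes in place, neither a failing child nor a lost lock can hang the main process indefinitely.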
Next, let’s review a summary of useful tips for avoiding deadlocks.
Tips for Avoiding Deadlocks
The best approach for avoiding deadlocks is to try to follow best practices when using concurrency in your Python programs.
A few simple tips will take you a long way.
In this section, we will review some of these important best practices for avoiding deadlocks.
- Use context managers when acquiring and releasing locks.
- Use timeouts when waiting.
- Always acquire locks in the same order.
You can also learn more tips in the tutorial:
Tip 1: Use Context Managers
Acquire and release locks using a context manager, wherever possible.
Locks can be acquired manually via a call to acquire() at the beginning of the critical section followed by a call to release() at the end of the critical section.
For example:
```python
...
# acquire the lock manually
lock.acquire()
# critical section...
# release the lock
lock.release()
```
This approach should be avoided wherever possible.
Traditionally, it was recommended to always acquire and release a lock in a try-finally structure.
The lock is acquired, the critical section is executed in the try block, and the lock is always released in the finally block.
For example:
```python
...
# acquire the lock
lock.acquire()
try:
    # critical section...
finally:
    # always release the lock
    lock.release()
```
This has since been replaced by the context manager interface, which achieves the same thing with less code.
For example:
```python
...
# acquire the lock
with lock:
    # critical section...
```
The benefit of the context manager is that the lock is always released as soon as the block is exited, regardless of how it is exited, e.g. normally, via a return statement, or via a raised exception.
This applies to a number of concurrency primitives, such as:
- Acquiring a mutex lock via the multiprocessing.Lock class.
- Acquiring a reentrant mutex lock via the multiprocessing.RLock class.
- Acquiring a semaphore via the multiprocessing.Semaphore class.
- Acquiring a condition via the multiprocessing.Condition class.
Tip 2: Use Timeouts When Waiting
Always use a timeout when waiting on a blocking call.
Many calls made on concurrency primitives may block.
For example:
- Waiting to acquire a multiprocessing.Lock via acquire().
- Waiting for a process to terminate via join().
- Waiting to be notified on a multiprocessing.Condition via wait().
And more.
Most blocking calls on concurrency primitives take a “timeout” argument. Calls such as acquire() and wait() return True if the call was successful and False otherwise, while join() returns either way and you can check is_alive() to see whether the wait timed out.
Wherever possible, do not make a blocking call without a timeout.
For example:
```python
...
# acquire the lock
if not lock.acquire(timeout=2*60):
    # handle failure case...
```
This will allow the waiting process to give up waiting after a fixed time limit and then attempt to rectify the situation, e.g. report an error, force termination, etc.
Tip 3: Acquire Locks in Order
Acquire locks in the same order throughout the application, wherever possible.
This is called “lock ordering”.
In some applications, you may be able to abstract the acquisition of locks using a list of multiprocessing.Lock objects that may be iterated and acquired in order, or a function call that acquires locks in a consistent order.
When this is not possible, you may need to audit your code to confirm that all paths through the code acquire the locks in the same order.
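As a sketch of the abstraction idea, the standard library's contextlib.ExitStack (my choice here, not mentioned above) can hold a program-wide list of locks and enter the needed ones in list order, whatever order the caller requests. The helper name holding() is illustrative:

```python
# sketch: acquire an ordered subset of locks via ExitStack
from contextlib import ExitStack
from multiprocessing import Lock

# a program-wide list fixes the global acquisition order
ordered_locks = [Lock(), Lock(), Lock()]

def holding(indices):
    # build a context manager that acquires the requested locks in
    # list order and releases them in reverse order on exit
    stack = ExitStack()
    for i in sorted(indices):
        stack.enter_context(ordered_locks[i])
    return stack

# the caller asks for locks 2 and 0; lock 0 is still taken first
with holding([2, 0]):
    pass  # critical section using locks 0 and 2
```

Because every caller goes through the same helper, no two processes can acquire the same pair of locks in opposite orders.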
Further Reading
This section provides additional resources that you may find helpful.
Python Multiprocessing Books
- Python Multiprocessing Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Multiprocessing API Cheat Sheet
I would also recommend specific chapters in the books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know how to identify multiprocessing deadlocks in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.