Last Updated on November 24, 2022
You can identify thread deadlocks by seeing examples and developing an intuition for their common causes.
In most cases, deadlocks can be avoided by using best practices in concurrent programming, such as lock ordering, using timeouts on waits, and using context managers when acquiring locks.
In this tutorial, you will discover how to identify deadlocks in Python.
Let’s get started.
What is a Deadlock
A deadlock is a concurrency failure mode where a thread or threads wait for a condition that never occurs.
The result is that the deadlocked threads are unable to progress and the program is stuck or frozen and must be terminated forcefully.
There are many ways in which you may encounter a deadlock in your concurrent program.
Deadlocks are not developed intentionally. Instead, they are an unexpected side effect or bug in concurrent programming.
Common examples of the cause of threading deadlocks include:
- A thread that waits on itself (e.g. attempts to acquire the same mutex lock twice).
- Threads that wait on each other (e.g. A waits on B, B waits on A).
- A thread that fails to release a resource (e.g. mutex lock, semaphore, barrier, condition, event, etc.).
- Threads that acquire mutex locks in different orders (e.g. fail to perform lock ordering).
Deadlocks may be easy to describe, but hard to detect in an application just from reading code.
It is important to develop an intuition for the causes of different deadlocks. This will help you identify deadlocks in your own code and track down the causes of those deadlocks that you may encounter.
Now that we are familiar with what a deadlock is, let’s look at some worked examples.
Deadlock 1: Thread Waits on Itself
A common cause of a deadlock is a thread that waits on itself.
We do not intend for this deadlock to occur, e.g. we don’t intentionally write code that causes a thread to wait on itself. Instead, this occurs accidentally due to a series of function calls and variables being passed around.
A thread may wait on itself for many reasons, such as:
- Waiting to acquire a mutex lock that it has already acquired.
- Waiting to be notified on a condition by itself.
- Waiting for an event to be set by itself.
- Waiting for a semaphore to be released by itself.
And so on.
One case that cannot occur is a thread explicitly waiting for itself to terminate via a call to join(). This situation is detected and a RuntimeError is raised.
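The self-join case can be checked with a minimal sketch. The `results` list and the `task()` body here are illustrative assumptions, not part of the examples in this tutorial; the only behavior relied upon is that join() on the current thread raises a RuntimeError.

```python
from threading import Thread, current_thread

# hypothetical list used to report what happened inside the thread
results = []

def task():
    try:
        # a thread attempting to join itself is rejected immediately
        current_thread().join()
    except RuntimeError as e:
        results.append(f'RuntimeError: {e}')

# run the task in a new thread and wait for it to finish
thread = Thread(target=task)
thread.start()
thread.join()
print(results)
```

Because the error is raised instead of blocking, this particular mistake shows up as an exception rather than a frozen program.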
We can demonstrate a deadlock caused by a thread waiting on itself.
In this case we will develop a task() function that directly attempts to acquire the same mutex lock twice. That is, the task will acquire the lock, then attempt to acquire the lock again.
This will cause a deadlock as the thread already holds the lock and will wait forever for itself to release the lock so that it can acquire it again.
The task() function that attempts to acquire the same lock twice and trigger a deadlock is listed below.
```python
# task to be executed in a new thread
def task(lock):
    print('Thread acquiring lock...')
    with lock:
        print('Thread acquiring lock again...')
        with lock:
            # will never get here
            pass
```
In the main thread, we can then create the lock.
```python
...
# create the mutex lock
lock = Lock()
```
We will then create and configure a new thread to execute our task() function, then start the thread and wait for it to terminate, which it never will.
```python
...
# create and configure the new thread
thread = Thread(target=task, args=(lock,))
# start the new thread
thread.start()
# wait for threads to exit...
thread.join()
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a thread waiting on itself
from threading import Thread
from threading import Lock

# task to be executed in a new thread
def task(lock):
    print('Thread acquiring lock...')
    with lock:
        print('Thread acquiring lock again...')
        with lock:
            # will never get here
            pass

# create the mutex lock
lock = Lock()
# create and configure the new thread
thread = Thread(target=task, args=(lock,))
# start the new thread
thread.start()
# wait for threads to exit...
thread.join()
```
Running the example first creates the lock.
The new thread is then configured and started and the main thread blocks until the new thread terminates, which it never does.
The new thread runs and first acquires the lock. It then attempts to acquire the same mutex lock again and blocks.
It will block forever waiting for the lock to be released. No other thread will release the lock, and the thread that holds it is blocked waiting to acquire it again, so it never reaches the code that would release it. Therefore the thread has deadlocked.
The program must be terminated forcefully, e.g. killed via Control-C.
```
Thread acquiring lock...
Thread acquiring lock again...
```
Perhaps the above example is too contrived.
Attempting to acquire the same lock twice is common if you protect a critical section with a lock and within that critical section you call another function that attempts to acquire the same lock.
For example, we can update the previous example to split the task() function into two functions, task1() and task2(). The task1() function acquires the lock, then calls task2(), which attempts to acquire the same lock again.
For example:
```python
# task2 to be executed in a new thread
def task2(lock):
    print('Thread acquiring lock again...')
    with lock:
        # will never get here
        pass

# task1 to be executed in a new thread
def task1(lock):
    print('Thread acquiring lock...')
    with lock:
        task2(lock)
```
This is a more realistic scenario and is easy to fall into.
For example, you may have a custom class that has a lock as a member variable to protect state within the object. Each method called on your object uses the lock to protect state, and some methods call other methods internally to reuse code.
Nevertheless, the complete updated example with the two task functions is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a thread waiting on itself
from threading import Thread
from threading import Lock

# task2 to be executed in a new thread
def task2(lock):
    print('Thread acquiring lock again...')
    with lock:
        # will never get here
        pass

# task1 to be executed in a new thread
def task1(lock):
    print('Thread acquiring lock...')
    with lock:
        task2(lock)

# create the mutex lock
lock = Lock()
# create and configure the new thread
thread = Thread(target=task1, args=(lock,))
# start the new thread
thread.start()
# wait for threads to exit...
thread.join()
```
Running the example creates the lock and then creates and starts the new thread.
The thread acquires the lock in task1(), then calls task2(). The task2() function attempts to acquire the same lock and the thread is stuck in a deadlock, waiting for the lock to be released by itself so it can acquire it again.
```
Thread acquiring lock...
Thread acquiring lock again...
```
This specific deadlock with a mutex lock can be avoided by using a reentrant mutex lock, provided by the threading.RLock class. This allows a thread to acquire the same lock more than once.
A reentrant lock is recommended any time code that holds a lock may call other code that acquires the same lock.
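As a minimal sketch of this fix, the two-function example can be rerun with a threading.RLock in place of the threading.Lock. The `log` list is an assumption added here so the outcome can be inspected; it is not part of the original example.

```python
from threading import Thread, RLock

def task2(lock, log):
    # re-acquiring succeeds because this thread already owns the RLock
    with lock:
        log.append('task2 acquired lock')

def task1(lock, log):
    with lock:
        log.append('task1 acquired lock')
        # call a function that acquires the same lock again
        task2(lock, log)

# create the reentrant lock and a list to record progress
lock = RLock()
log = []
thread = Thread(target=task1, args=(lock, log))
thread.start()
# a timeout on join() guards against hanging if something goes wrong
thread.join(timeout=5)
print(log, thread.is_alive())
```

The thread must release a reentrant lock as many times as it acquired it, which the nested context managers take care of here.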
Next, let’s look at a deadlock caused by threads waiting on each other.
Deadlock 2: Threads Waiting on Each Other
Another common deadlock is to have two (or more) threads waiting on each other.
For example, Thread A is waiting on Thread B, and Thread B is waiting on Thread A.
- Thread A: Waiting on Thread B.
- Thread B: Waiting on Thread A.
Or with three threads, you could have a cycle of threads waiting on each other, for example:
- Thread A: Waiting on Thread B.
- Thread B: Waiting on Thread C.
- Thread C: Waiting on Thread A.
This deadlock is common if you set up threads to wait on the result from other threads, such as in a pipeline or workflow where some dependencies for subtasks are out of order.
A simple way to demonstrate this type of deadlock is to create a new function that takes an instance of threading.Thread, reports a message, then waits for the passed-in thread to terminate. We can then create a new thread and have it call the function and pass it the main thread. We can then have the main thread call the same function and pass in an instance of the new thread.
For example:
- New Thread: Waiting on Main Thread.
- Main Thread: Waiting on New Thread.
We can define the task() function to take an instance of a thread, then report the name of the thread that is running and the name of the thread that it is waiting on.
```python
# task to be executed in a new thread
def task(other):
    # message
    print(f'[{current_thread().name}] waiting on [{other.name}]...')
    other.join()
```
We can then create a new thread instance and configure it to call the task() function with an instance of the main thread.
```python
...
# get the current thread
main_thread = current_thread()
# create the second thread
new_thread = Thread(target=task, args=(main_thread,))
# start the new thread
new_thread.start()
```
Finally, we can have the main thread call the same function and pass an instance of the new thread.
```python
...
# run the first thread
task(new_thread)
```
Tying this together, the complete example of two threads waiting on each other is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by threads waiting on each other
from threading import current_thread
from threading import Thread

# task to be executed in a new thread
def task(other):
    # message
    print(f'[{current_thread().name}] waiting on [{other.name}]...')
    other.join()

# get the current thread
main_thread = current_thread()
# create the second thread
new_thread = Thread(target=task, args=(main_thread,))
# start the new thread
new_thread.start()
# run the first thread
task(new_thread)
```
Running the example first gets the threading.Thread instance for the main thread, then creates a new thread and calls the task() function passing in the main thread.
The new thread reports a message and waits for the main thread to terminate.
The main thread then calls the task() function with the instance of the new thread and waits for the new thread to terminate.
Each thread is waiting on the other to terminate before it can terminate itself, resulting in a deadlock.
```
[Thread-1] waiting on [MainThread]...
[MainThread] waiting on [Thread-1]...
```
This type of deadlock can happen if threads are waiting on another thread to terminate in a way that results in a cycle.
It may also happen for any type of wait operation where thread dependencies create a cycle, such as waiting on a mutex lock, waiting on a semaphore, waiting to be notified on a condition, and so on.
The waiting may also be less obvious given a level of indirection. For example, a thread may be waiting on a queue that is itself populated by another thread. The other thread may in turn be waiting on the first thread directly.
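One way to keep an indirect wait like this from hanging forever is to put a timeout on the queue read. The sketch below assumes a hypothetical `consumer()` function and a producer that never adds an item; the 0.5 second limit is arbitrary.

```python
from queue import Queue, Empty
from threading import Thread

def consumer(queue, outcome):
    try:
        # wait a limited time for an item instead of blocking forever
        item = queue.get(timeout=0.5)
        outcome.append(f'got {item}')
    except Empty:
        # nothing arrived in time, so report it rather than hang
        outcome.append('gave up waiting')

# shared queue that, in this sketch, no thread ever populates
queue = Queue()
outcome = []
thread = Thread(target=consumer, args=(queue, outcome))
thread.start()
thread.join()
print(outcome)
```

The timeout does not remove the cyclic dependency, but it turns a silent hang into a failure case that can be reported and investigated.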
Next, let’s look at an example of a deadlock caused by threads acquiring locks in different orders.
Deadlock 3: Acquiring Locks in the Wrong Order
A common cause of a deadlock is when two threads acquire locks in different orders at the same time.
For example, we may have a critical section protected by a lock and within that critical section we may have code or a function call that is protected by a second lock.
Suppose one thread acquires lock1 and then attempts to acquire lock2, while a second thread calls functionality that acquires lock2 and then attempts to acquire lock1. If these run concurrently so that thread1 holds lock1 and thread2 holds lock2, neither thread can proceed and there will be a deadlock.
- Thread1: Holds Lock1, Waiting for Lock2.
- Thread2: Holds Lock2, Waiting for Lock1.
We can demonstrate this with a direct example.
We can create a task() function that takes both locks as arguments, then attempts to acquire the first lock and then the second lock. Two threads can then be created to call this function with the locks as arguments, but perhaps we make a typo and have the first thread take lock1 then lock2 as arguments, and the second thread take lock2 then lock1 as arguments. The result will be a deadlock if each thread acquires its first lock and then waits on its second lock.
First, let’s define the task() function that takes the two locks as arguments and acquires them one after the other.
```python
# task to be executed in a new thread
def task(number, lock1, lock2):
    # acquire the first lock
    print(f'Thread {number} acquiring lock 1...')
    with lock1:
        # wait a moment
        sleep(1)
        # acquire the next lock
        print(f'Thread {number} acquiring lock 2...')
        with lock2:
            # never gets here..
            pass
```
Notice that we have added a sleep for one second after acquiring the first lock.
This makes the problematic interleaving all but certain, giving each thread enough time to acquire its first lock before either attempts to acquire its second lock.
Back in the main thread, we can then create two separate mutex locks.
```python
...
# create the mutex locks
lock1 = Lock()
lock2 = Lock()
```
Next, we can create and configure two new threads to call the task() function and transpose the lock arguments for one of them.
```python
...
# create and configure the new threads
thread1 = Thread(target=task, args=(1, lock1, lock2))
thread2 = Thread(target=task, args=(2, lock2, lock1))
```
Finally, we can start the threads and wait in the main thread for both threads to terminate.
```python
...
# start the new threads
thread1.start()
thread2.start()
# wait for threads to exit...
thread1.join()
thread2.join()
```
Tying this together, the complete example of a deadlock caused by acquiring locks in the wrong order is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by acquiring locks in a different order
from time import sleep
from threading import Thread
from threading import Lock

# task to be executed in a new thread
def task(number, lock1, lock2):
    # acquire the first lock
    print(f'Thread {number} acquiring lock 1...')
    with lock1:
        # wait a moment
        sleep(1)
        # acquire the next lock
        print(f'Thread {number} acquiring lock 2...')
        with lock2:
            # never gets here..
            pass

# create the mutex locks
lock1 = Lock()
lock2 = Lock()
# create and configure the new threads
thread1 = Thread(target=task, args=(1, lock1, lock2))
thread2 = Thread(target=task, args=(2, lock2, lock1))
# start the new threads
thread1.start()
thread2.start()
# wait for threads to exit...
thread1.join()
thread2.join()
```
Running the example first creates both locks. Then both threads are created and started, and the main thread waits for the threads to terminate.
The first thread receives lock1 and lock2 as arguments. It acquires lock1 and sleeps.
The second thread receives lock2 and lock1 as arguments. It acquires lock2 and sleeps.
The first thread wakes and tries to acquire lock2, but it must wait as it is already acquired by the second thread. The second thread wakes and tries to acquire lock1, but it must wait as it is already acquired by the first thread.
The result is a deadlock.
```
Thread 1 acquiring lock 1...
Thread 2 acquiring lock 1...
Thread 1 acquiring lock 2...
Thread 2 acquiring lock 2...
```
The solution is to ensure locks are always acquired in the same order throughout the program.
This is called lock ordering.
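A minimal sketch of the corrected example follows, with both threads receiving the locks in the same order. The `done` list, the shorter sleep, and the parameter names `first` and `second` are assumptions made so the sketch runs quickly and its result can be checked.

```python
from time import sleep
from threading import Thread, Lock

def task(number, first, second, done):
    # every thread acquires the locks in the same order
    with first:
        # wait a moment to give the other thread a chance to run
        sleep(0.1)
        with second:
            done.append(number)

# create the mutex locks
lock1 = Lock()
lock2 = Lock()
done = []
# both threads are given lock1 then lock2, with no transposed arguments
threads = [Thread(target=task, args=(n, lock1, lock2, done)) for n in (1, 2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join(timeout=5)
print(sorted(done))
```

With a consistent order, whichever thread gets the first lock can always go on to get the second, so the other thread simply waits its turn.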
Next, let’s consider a deadlock caused by a thread failing to release a lock.
Deadlock 4: Failing to Release a Lock
Another common cause of a deadlock is a thread failing to release a resource.
This is typically caused by a thread raising an error or exception in a critical section in a way that prevents the thread from releasing a resource.
Some examples include:
- Failing to release a lock.
- Failing to release a semaphore.
- Failing to arrive at a barrier.
- Failing to notify threads on a condition.
- Failing to set an event.
And so on.
We can demonstrate this deadlock with an example. A thread may acquire a lock in order to execute a critical section, then raise an exception which prevents the thread from releasing the lock. Another thread may then try to acquire the same lock and must wait forever as the lock will never be released, resulting in a deadlock.
First, we can define a task() function that takes the lock as an argument, acquires it manually via the acquire() method, then raises an exception, preventing the lock from being released.
```python
# task to be executed in a new thread
def task(lock):
    # acquire the lock
    print('Thread acquiring lock...')
    lock.acquire()
    # fail
    raise Exception('Something bad happened')
    # release the lock (never gets here)
    print('Thread releasing lock...')
    lock.release()
```
Next, in the main thread, we can create the lock that will be shared between threads.
```python
...
# create the mutex lock
lock = Lock()
```
We will then create and start a new thread that will call the task() function with the lock as an argument.
```python
...
# create and configure the new thread
thread = Thread(target=task, args=(lock,))
# start the new thread
thread.start()
```
The main thread will then block for a moment to allow the new thread to fail.
```python
...
# wait a while
sleep(1)
```
Finally, the main thread will attempt to acquire the lock.
```python
...
# acquire the lock
print('Main acquiring lock...')
lock.acquire()
# do something...
# release lock (never gets here)
lock.release()
```
Tying this together, the complete example of a deadlock caused by a thread failing to release a lock is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a thread failing to release a lock
from time import sleep
from threading import Thread
from threading import Lock

# task to be executed in a new thread
def task(lock):
    # acquire the lock
    print('Thread acquiring lock...')
    lock.acquire()
    # fail
    raise Exception('Something bad happened')
    # release the lock (never gets here)
    print('Thread releasing lock...')
    lock.release()

# create the mutex lock
lock = Lock()
# create and configure the new thread
thread = Thread(target=task, args=(lock,))
# start the new thread
thread.start()
# wait a while
sleep(1)
# acquire the lock
print('Main acquiring lock...')
lock.acquire()
# do something...
# release lock (never gets here)
lock.release()
```
Running the example first creates the lock, then creates and starts the new thread.
The main thread then sleeps for a moment.
The new thread runs. It first acquires the lock, then fails by raising an exception. The thread unwinds and never makes it to the line of code to release the lock. The new thread terminates.
Finally, the main thread wakes up, then attempts to acquire the lock. The main thread blocks forever as the lock will not be released, resulting in a deadlock.
```
Thread acquiring lock...
Exception in thread Thread-1:
Traceback (most recent call last):
...
Exception: Something bad happened
Main acquiring lock...
```
There are two aspects to avoiding this form of deadlock.
The first is to follow best practices when acquiring and releasing locks.
If the task() function acquired and released the lock using a context manager, then the release() function would have been called automatically when the exception was raised.
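As a minimal sketch of this first fix, the task below holds the lock via a context manager when it fails. Catching the exception inside the thread and recording it in a hypothetical `errors` list is an assumption made here to keep the output quiet; the lock is released either way.

```python
from threading import Thread, Lock

def task(lock, errors):
    try:
        # the context manager releases the lock even when the block fails
        with lock:
            raise Exception('Something bad happened')
    except Exception as e:
        # record the failure instead of letting the thread die noisily
        errors.append(str(e))

# create the mutex lock and run the failing task
lock = Lock()
errors = []
thread = Thread(target=task, args=(lock, errors))
thread.start()
thread.join()
# the main thread can now acquire the lock without waiting forever
acquired = lock.acquire(timeout=1)
if acquired:
    lock.release()
print(acquired, errors)
```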
Secondly, the main thread can use a timeout while waiting for the lock and give up after a few minutes and handle the failure case.
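The timeout approach can be sketched as follows, assuming a task that acquires the lock and exits without releasing it. The half-second timeout and the `outcome` variable are illustrative choices only.

```python
from threading import Thread, Lock

def task(lock):
    # acquire manually and exit without releasing (the bug being simulated)
    lock.acquire()

# create the mutex lock and run the buggy task to completion
lock = Lock()
thread = Thread(target=task, args=(lock,))
thread.start()
thread.join()
# wait a limited time for the lock rather than blocking forever
if lock.acquire(timeout=0.5):
    try:
        outcome = 'acquired'
    finally:
        lock.release()
else:
    # handle the failure case, e.g. report an error
    outcome = 'timed out'
print(outcome)
```

The lock is still never released, so the timeout does not fix the bug in task(), but it lets the main thread detect the problem and respond.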
Next, let’s review a summary of useful tips for avoiding deadlocks.
Tips for Avoiding Deadlocks
The best approach for avoiding deadlocks is to try to follow best practices when using concurrency in your Python programs.
A few simple tips will take you a long way.
In this section we will review some of these important best practices for avoiding deadlocks.
- Use context managers when acquiring and releasing locks.
- Use timeouts when waiting.
- Always acquire locks in the same order.
Tip 1: Use Context Managers
Acquire and release locks using a context manager, wherever possible.
Locks can be acquired manually via a call to acquire() at the beginning of the critical section followed by a call to release() at the end of the critical section.
For example:
```python
...
# acquire the lock manually
lock.acquire()
# critical section...
# release the lock
lock.release()
```
This approach should be avoided wherever possible.
Traditionally, it was recommended to always acquire and release a lock in a try-finally structure.
The lock is acquired, the critical section is executed in the try block, and the lock is always released in the finally block.
For example:
```python
...
# acquire the lock
lock.acquire()
try:
    # critical section...
    pass
finally:
    # always release the lock
    lock.release()
```
This has since been replaced by the context manager interface, which achieves the same thing with less code.
For example:
```python
...
# acquire the lock
with lock:
    # critical section...
    pass
```
The benefit of the context manager is that the lock is always released as soon as the block is exited, regardless of how it is exited, e.g. normally, a return, an error, or an exception.
This applies to a number of concurrency primitives, such as:
- Acquiring a mutex lock via the threading.Lock class.
- Acquiring a reentrant mutex lock via the threading.RLock class.
- Acquiring a semaphore via the threading.Semaphore class.
- Acquiring a condition via the threading.Condition class.
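The same pattern can be sketched for a semaphore and a condition. The variable names `slot_available` and `condition_released` are assumptions used only to show that each primitive was released when its block exited.

```python
from threading import Semaphore, Condition

# a semaphore with a single slot
semaphore = Semaphore(1)
try:
    with semaphore:
        raise ValueError('failure inside the critical section')
except ValueError:
    pass
# the slot was released on exit, so a non-blocking acquire succeeds
slot_available = semaphore.acquire(blocking=False)
if slot_available:
    semaphore.release()

# a condition is acquired the same way before notifying or waiting
condition = Condition()
with condition:
    # allowed here because the condition's lock is held
    condition.notify_all()
try:
    # outside the block the lock is no longer held, so this is rejected
    condition.notify_all()
    condition_released = False
except RuntimeError:
    condition_released = True
print(slot_available, condition_released)
```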
Tip 2: Use Timeouts When Waiting
Always use a timeout when waiting on a blocking call.
Many calls made on concurrency primitives may block.
For example:
- Waiting to acquire a threading.Lock via acquire().
- Waiting for a thread to terminate via join().
- Waiting to be notified on a threading.Condition via wait().
And more.
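Most blocking calls on concurrency primitives take a “timeout” argument. Many of them, such as acquire() on a lock and wait() on a condition or event, return True if the call was successful or False if the timeout elapsed. Note that join() always returns None, so after a join() with a timeout, call is_alive() on the thread to check whether it is still running.
Wherever possible, do not make a blocking call without a timeout.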
For example:
```python
...
# acquire the lock
if not lock.acquire(timeout=2*60):
    # handle failure case...
    pass
```
This will allow the waiting thread to give up waiting after a fixed time limit and then attempt to rectify the situation, e.g. report an error, force termination, etc.
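Timeouts on join() and on an event wait can be sketched in the same spirit. The `release` event, the daemon worker thread, and the short timeouts below are assumptions for illustration; the worker simulates a thread that is stuck until it is told to finish.

```python
from threading import Thread, Event

def task(release):
    # simulate a stuck thread that only finishes once released
    release.wait()

release = Event()
# daemon thread so a stuck worker cannot keep the program alive
thread = Thread(target=task, args=(release,), daemon=True)
thread.start()
# join() with a timeout returns None whether or not the thread finished
join_result = thread.join(timeout=0.2)
# so check is_alive() to find out if the wait timed out
still_running = thread.is_alive()
# waiting on an event that is never set returns False after the timeout
signalled = Event().wait(timeout=0.1)
print(join_result, still_running, signalled)
# clean up: let the worker finish and confirm that it does
release.set()
thread.join(timeout=5)
finished = not thread.is_alive()
```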
Tip 3: Acquire Locks in Order
Acquire locks in the same order throughout the application, wherever possible.
This is called “lock ordering”.
In some applications you may be able to abstract the acquisition of locks using a list of threading.Lock objects that may be iterated and acquired in order, or a function call that acquires locks in a consistent order.
When this is not possible, you may need to audit your code to confirm that all paths through the code acquire the locks in the same order.
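One possible way to abstract ordered acquisition is a small helper that sorts the locks before acquiring them. The `acquire_in_order()` function below is a hypothetical helper, and sorting by `id()` is just one assumed convention for a consistent order within a single process; a sketch rather than a definitive implementation.

```python
from contextlib import ExitStack, contextmanager
from threading import Lock, Thread
from time import sleep

@contextmanager
def acquire_in_order(*locks):
    # sort by id() so every caller acquires in the same order (assumed convention)
    with ExitStack() as stack:
        for lock in sorted(locks, key=id):
            stack.enter_context(lock)
        # all locks are held here and released in reverse order on exit
        yield

def task(number, lock_a, lock_b, done):
    # argument order no longer matters because the helper imposes one
    with acquire_in_order(lock_a, lock_b):
        sleep(0.1)
        done.append(number)

lock1, lock2 = Lock(), Lock()
done = []
# the second thread passes the locks transposed, as in the deadlock example
threads = [
    Thread(target=task, args=(1, lock1, lock2, done)),
    Thread(target=task, args=(2, lock2, lock1, done)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join(timeout=5)
print(sorted(done))
```

A helper like this only works if all code that needs more than one lock goes through it, which is why auditing the code paths may still be necessary.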
Takeaways
You now know how to identify a threading deadlock in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.