You can identify coroutine deadlocks by seeing examples and developing an intuition for their common causes.
In most cases, deadlocks can be avoided by following best practices in concurrent programming, such as lock ordering, using timeouts on waits, and using context managers when acquiring locks.
In this tutorial, you will discover how to identify asyncio deadlocks in Python.
Let’s get started.
What is a Deadlock
A deadlock is a concurrency failure mode where a thread or threads wait for a condition that never occurs.
The result is that the deadlocked threads are unable to progress, and the program is stuck or frozen and must be terminated forcefully.
Deadlock: A situation in which one or more threads are waiting for an event that will never occur. The most common situation is when two threads are each holding a synchronization object that the other thread wants and there is no way for one thread to release the object it holds in order to allow the other thread to proceed.
— Glossary, The Art of Concurrency, 2009.
Although deadlocks are described in terms of threads, they may also occur with other units of concurrency, such as processes and coroutines.
There are many ways in which we may encounter a deadlock in our asyncio program.
Deadlocks are not developed intentionally. Instead, they are an unexpected side effect or bug in concurrent programming.
Common examples of the cause of coroutine deadlocks include:
- A coroutine that waits on itself (e.g. attempts to acquire the same mutex lock twice).
- Coroutines that wait on each other (e.g. A waits on B, B waits on A).
- A coroutine that fails to release a resource (e.g. mutex lock, semaphore, barrier, condition, event, etc.).
- Coroutines that acquire mutex locks in different orders (e.g. fail to perform lock ordering).
Deadlocks may be easy to describe, but hard to detect in an application just from reading code.
It is important to develop an intuition for the causes of different deadlocks. This will help you identify deadlocks in your own code and trace down the causes of those deadlocks that you may encounter.
Now that we are familiar with what a deadlock is, let’s look at some worked examples in asyncio programs.
Deadlock: Coroutine Waits On Itself
A common cause of a deadlock is a coroutine that waits on itself.
We do not intend for this deadlock to occur, e.g. we don’t intentionally write code that causes a coroutine to wait on itself. Instead, this occurs accidentally due to a series of function calls and variables being passed around.
A coroutine may wait on itself for many reasons, such as:
- Waiting to acquire a mutex lock that it has already acquired.
- Waiting to be notified on a condition by itself.
- Waiting for an event to be set by itself.
- Waiting for a semaphore to be released by itself.
- And so on.
We can demonstrate a deadlock caused by a coroutine waiting on itself.
In this case, we will develop a task() coroutine that directly attempts to acquire the same mutex lock twice. That is, the task will acquire the lock, then attempt to acquire the lock again.
This will cause a deadlock as the coroutine already holds the lock and will wait forever for itself to release the lock so that it can acquire it again.
The complete example is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a coroutine waiting on itself
import asyncio

# coroutine that acquires the lock twice
async def task(lock):
    # report a message
    print('Task acquiring lock...')
    # acquire the lock
    async with lock:
        # report a second message
        print('Task acquiring lock again...')
        # acquire the lock again
        async with lock:
            # will never get here
            pass

# main coroutine
async def main():
    # create the shared lock
    lock = asyncio.Lock()
    # execute and await the coroutine
    await task(lock)

# entry point for the asyncio program
asyncio.run(main())
```
Running the example first creates the main() coroutine that is executed as the entry point into the asyncio program.
The main() coroutine runs, first creating the lock, then creating and awaiting the task() coroutine, passing the lock as an argument.
The new coroutine runs and first reports a message and acquires the lock. It then reports a second message and attempts to acquire the same mutex lock again.
It will be suspended forever waiting for the lock to be released.
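The lock cannot be released because the only coroutine that would release it is suspended, waiting to acquire it. An asyncio.Lock is not reentrant, so the second attempt to acquire it never completes. Therefore the coroutine has deadlocked.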
The program must be terminated forcefully, e.g. killed via Control-C.
```
Task acquiring lock...
Task acquiring lock again...
```
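When a program hangs like this, it can help to ask asyncio where each pending task is suspended. The following is a minimal sketch rather than part of the original example: the stuck() coroutine, the describe_pending() helper, and the task name 'worker' are hypothetical names chosen for illustration. It relies on the asyncio.Task.get_stack() method, which returns the frame at which a pending task is suspended.

```python
import asyncio

# hypothetical coroutine that deadlocks on itself, as in the example above
async def stuck(lock):
    async with lock:
        # asyncio.Lock is not reentrant, so this suspends forever
        async with lock:
            pass

# hypothetical helper: report where each unfinished task is suspended
def describe_pending(tasks):
    report = []
    for t in tasks:
        if t.done():
            continue
        frames = t.get_stack()
        if frames:
            frame = frames[-1]
            report.append((t.get_name(), frame.f_code.co_name, frame.f_lineno))
    return report

async def main():
    lock = asyncio.Lock()
    worker = asyncio.create_task(stuck(lock), name='worker')
    # give the worker time to reach the point where it hangs
    await asyncio.sleep(0.1)
    report = describe_pending([worker])
    for name, func, line in report:
        print(f'{name} is suspended in {func}() at line {line}')
    # cancel the stuck task so that the program can exit
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return report

report = asyncio.run(main())
```

The reported line should point at the second attempt to acquire the lock, which is a strong hint that the coroutine is waiting on itself.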
A second scenario produces the same deadlock and may be an easier trap to fall into.
In this scenario, you may have multiple coroutines that use the same lock to protect some program state. A deadlock arises if one coroutine that acquires the lock then awaits another coroutine that also uses the lock.
The complete updated example with the two task functions is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a coroutine waiting on itself
import asyncio

# coroutine that acquires the lock
async def task2(lock):
    print('Task2 acquiring lock again...')
    async with lock:
        # will never get here
        pass

# coroutine that acquires the lock
async def task1(lock):
    print('Task1 acquiring lock...')
    async with lock:
        await task2(lock)

# main coroutine
async def main():
    # create the shared lock
    lock = asyncio.Lock()
    # execute and await the coroutine
    await task1(lock)

# entry point for the asyncio program
asyncio.run(main())
```
Running the example first creates the main() coroutine that is executed as the entry point into the asyncio program.
The main() coroutine runs, first creating the lock, then creating and awaiting the task1() coroutine, passing the lock as an argument.
The task1() coroutine acquires the lock and then awaits task2().
The task2() coroutine runs and attempts to acquire the same lock. The lock is held by task1(), which is suspended awaiting task2(), so neither can progress and the program is stuck in a deadlock.
Again, the program must be terminated forcefully, e.g. killed via Control-C.
```
Task1 acquiring lock...
Task2 acquiring lock again...
```
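One way to avoid this trap is to acquire the lock in exactly one layer of the code and have the inner helpers assume the lock is already held. The sketch below illustrates the idea under that assumption. The names _update_locked(), update(), update_twice(), and the state dictionary are hypothetical and not part of the original example.

```python
import asyncio

# hypothetical helper: the caller must already hold the lock
async def _update_locked(state):
    state['count'] += 1

# public coroutine: acquires the lock once, then calls the helper
async def update(lock, state):
    async with lock:
        await _update_locked(state)

# a caller that needs two updates in one critical section reuses the
# helper instead of awaiting update() while holding the lock
async def update_twice(lock, state):
    async with lock:
        await _update_locked(state)
        await _update_locked(state)

async def main():
    lock = asyncio.Lock()
    state = {'count': 0}
    # the timeout is only a safety net for this demonstration
    await asyncio.wait_for(update_twice(lock, state), timeout=1)
    await update(lock, state)
    return state['count']

count = asyncio.run(main())
print(count)
```

Because no coroutine awaits another lock-acquiring coroutine while holding the lock, the second acquisition never happens.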
Next, let’s look at a deadlock caused by coroutines waiting on each other.
Deadlock: Coroutines Wait On Each Other
Another common deadlock is to have two (or more) coroutines waiting on each other.
For example, Coroutine A is waiting on Coroutine B, and Coroutine B is waiting on Coroutine A.
- Coroutine A: Waiting on Coroutine B.
- Coroutine B: Waiting on Coroutine A.
Or with three coroutines, you could have a cycle of coroutines waiting on each other, for example:
- Coroutine A: Waiting on Coroutine B.
- Coroutine B: Waiting on Coroutine C.
- Coroutine C: Waiting on Coroutine A.
This deadlock is common if you set up coroutines to wait on the result from other coroutines, such as in a pipeline or workflow where some dependencies for subtasks are out of order.
A simple way to demonstrate this type of deadlock is to create a coroutine that takes an instance of an asyncio.Task, reports a message, and then awaits the passed-in asyncio.Task.
We can then create a new asyncio.Task that runs this coroutine and pass it the asyncio.Task object for the main() coroutine. The main coroutine can then await the same coroutine, passing in the new asyncio.Task we created.
For example:
- New Coroutine: Waiting on Main Coroutine.
- Main Coroutine: Waiting on New Coroutine.
The complete example is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by coroutines waiting on each other
import asyncio

# takes a task and awaits it
async def task(other):
    # report a message
    print(f'awaiting the task: {other.get_name()}')
    # wait for the task to finish
    await other

# main coroutine
async def main():
    # get the current task
    task1 = asyncio.current_task()
    # create and execute the first task
    task2 = asyncio.create_task(task(task1))
    # execute the task, which will await task2
    await task(task2)

# entry point for the asyncio program
asyncio.run(main())
```
Running the example first creates the main() coroutine that is executed as the entry point into the asyncio program.
The main() coroutine runs and first gets the asyncio.Task instance for itself.
It then creates a new asyncio.Task that runs the task() coroutine, passing in the asyncio.Task for the main() coroutine.
Finally, the main() coroutine awaits the task() coroutine directly, passing in the new asyncio.Task. This reports a message and suspends the main coroutine until the new task is done.
The new task then runs, reports a message, and waits for the main coroutine to terminate.
We are now in a deadlock.
Each coroutine is waiting on the other to terminate before it can terminate itself.
The program must be terminated forcefully, e.g. killed via Control-C.
```
awaiting the task: Task-2
awaiting the task: Task-1
```
This type of deadlock can happen if coroutines are awaiting another coroutine in a way that results in a cycle.
It may also happen for any type of wait operation, where coroutine dependencies create a cycle, such as waiting on a mutex lock, waiting on a semaphore, waiting to be notified on a condition, and so on.
The waiting may also be less obvious given the level of indirection. For example, a coroutine may be waiting on a queue that is populated by another coroutine. The other coroutine may in turn be waiting on the first coroutine directly.
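If the dependencies between coroutines are known up front, as in a pipeline or workflow, a cycle can be checked for before anything is awaited. The sketch below is one possible approach, assuming Python 3.9 or later for the standard library graphlib module. The find_wait_cycle() helper and the dictionaries describing who waits on whom are hypothetical names for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# hypothetical helper: waits_on maps each coroutine name to the
# names of the coroutines it waits on
def find_wait_cycle(waits_on):
    try:
        # consuming the ordering forces the cycle check
        tuple(TopologicalSorter(waits_on).static_order())
    except CycleError as e:
        # the detected cycle is the second element of the exception args
        return e.args[1]
    return None

# A waits on B, B waits on C, C waits on A: a cycle
cyclic = {'A': {'B'}, 'B': {'C'}, 'C': {'A'}}
# A waits on B, B waits on C: no cycle
acyclic = {'A': {'B'}, 'B': {'C'}}

cycle = find_wait_cycle(cyclic)
no_cycle = find_wait_cycle(acyclic)
print(cycle)
print(no_cycle)
```

A check like this only covers dependencies that are declared explicitly. Waits hidden behind queues or locks will not appear in the graph.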
Next, let’s look at an example of a deadlock caused by coroutines acquiring locks in different orders.
Deadlock: Coroutines Acquiring Locks in the Wrong Order
A common cause of a deadlock is when two coroutines acquire locks in different orders at the same time.
For example, we may have a critical section protected by a lock and within that critical section, we may have code or a function call that is protected by a second lock.
One coroutine may acquire lock1 and then attempt to acquire lock2, while a second coroutine calls functionality that acquires lock2 and then attempts to acquire lock1. If this occurs concurrently, such that coroutine1 holds lock1 and coroutine2 holds lock2, then there will be a deadlock.
- Coroutine1: Holds Lock1, Waiting for Lock2.
- Coroutine2: Holds Lock2, Waiting for Lock1.
We can demonstrate this with a direct example.
We can create a task() coroutine that takes both locks as arguments and then attempts to acquire the first lock and then the second lock. Two coroutines can then be created with the locks as arguments, but perhaps we make a typo and have the first coroutine take lock1 then lock2 as arguments, and the second coroutine takes lock2 then lock1 as arguments.
The result will be a deadlock if each coroutine can first acquire a lock and then wait on the second lock.
The complete example is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by acquiring locks in a different order
import asyncio

# coroutine task operating the locks
async def task(number, lock1, lock2):
    # acquire the first lock
    print(f'Task {number} acquiring lock 1...')
    async with lock1:
        # wait a moment
        await asyncio.sleep(1)
        # acquire the next lock
        print(f'Task {number} acquiring lock 2...')
        async with lock2:
            # never gets here..
            pass

# main coroutine
async def main():
    # create the locks
    lock1 = asyncio.Lock()
    lock2 = asyncio.Lock()
    # create the coroutines
    coro1 = task(1, lock1, lock2)
    coro2 = task(2, lock2, lock1)
    # execute and await both coroutines
    await asyncio.gather(coro1, coro2)

# entry point for the asyncio program
asyncio.run(main())
```
Running the example first creates the main() coroutine that is executed as the entry point into the asyncio program.
The main() coroutine runs and creates both locks. It then creates both coroutines and passes them to asyncio.gather(), which schedules them as tasks, and the main coroutine suspends until they terminate.
The first coroutine receives lock1 and lock2 as arguments. It acquires lock1 and sleeps.
The second coroutine receives lock2 and lock1 as arguments. It acquires lock2 and sleeps.
The first coroutine wakes and tries to acquire lock2, but it must wait as it is already acquired by the second coroutine.
The second coroutine wakes and tries to acquire lock1, but it must wait as it is already acquired by the first coroutine.
The result is a deadlock.
The program must be terminated forcefully, e.g. killed via Control-C.
```
Task 1 acquiring lock 1...
Task 2 acquiring lock 1...
Task 1 acquiring lock 2...
Task 2 acquiring lock 2...
```
The solution is to ensure locks are always acquired in the same order throughout the program.
This is called lock ordering.
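As a rough sketch of lock ordering applied to the example above, a small helper can decide the acquisition order so that callers cannot get it wrong, even if they pass the locks in swapped positions. The acquire_both() helper is a hypothetical name, and ordering by id() is just one possible convention. It assumes the two locks are distinct objects.

```python
import asyncio
from contextlib import asynccontextmanager

# hypothetical helper: always acquires the two locks in the same
# global order, regardless of the argument order
# (assumes lock_a and lock_b are two distinct locks)
@asynccontextmanager
async def acquire_both(lock_a, lock_b):
    first, second = sorted((lock_a, lock_b), key=id)
    async with first:
        async with second:
            yield

async def task(number, lock1, lock2, done):
    async with acquire_both(lock1, lock2):
        # critical section protected by both locks
        await asyncio.sleep(0.1)
        done.append(number)

async def main():
    lock1 = asyncio.Lock()
    lock2 = asyncio.Lock()
    done = []
    # the arguments are swapped for the second task, as in the example above
    coros = [task(1, lock1, lock2, done), task(2, lock2, lock1, done)]
    # the timeout is only a safety net for this demonstration
    await asyncio.wait_for(asyncio.gather(*coros), timeout=2)
    return done

finished = asyncio.run(main())
print(finished)
```

Both tasks complete because whichever task acquires the first lock can always go on to acquire the second.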
Next, let’s consider a deadlock caused by a coroutine failing to release a lock.
Deadlock: Failing to Release a Lock
Another common cause of a deadlock is a coroutine failing to release a resource.
This is typically caused by a coroutine raising an error or exception in a critical section in a way that prevents the coroutine from releasing the resource.
Some examples include:
- Failing to release a lock.
- Failing to release a semaphore.
- Failing to arrive at a barrier.
- Failing to notify coroutines on a condition.
- Failing to set an event.
- And so on.
We can demonstrate this deadlock with an example.
A coroutine may acquire a lock in order to execute a critical section, then raise an exception that prevents the coroutine from releasing the lock. Another coroutine may then try to acquire the same lock and must wait forever as the lock will never be released, resulting in a deadlock.
The complete example is listed below.
```python
# SuperFastPython.com
# example of a deadlock caused by a coroutine failing to release a lock
import asyncio

# coroutine task that acquires a lock
async def task(lock):
    # acquire the lock
    print('Task acquiring lock...')
    await lock.acquire()
    # fail
    raise Exception('Something bad happened')
    # release the lock (never gets here)
    print('Task releasing lock...')
    lock.release()

# main coroutine
async def main():
    # create the mutex lock
    lock = asyncio.Lock()
    # create and start the new task
    asyncio.create_task(task(lock))
    # wait a moment
    await asyncio.sleep(1)
    # acquire the lock
    print('Main acquiring lock...')
    await lock.acquire()
    # do something...
    # release lock (never gets here)
    lock.release()

# entry point for the asyncio program
asyncio.run(main())
```
Running the example first creates the main() coroutine that is executed as the entry point into the asyncio program.
The main() coroutine runs and creates the lock. It then creates the task coroutine and runs it in the background as a task.
The main() coroutine then sleeps, suspending and allowing the task() coroutine to run.
The new coroutine runs. It first acquires the lock, then fails by raising an exception. The coroutine unwinds and never makes it to the line of code to release the lock. The new coroutine terminates.
The main coroutine resumes, then attempts to acquire the lock. The main coroutine blocks forever as the lock will not be released, resulting in a deadlock.
The program must be terminated forcefully, e.g. killed via Control-C.
```
Task acquiring lock...
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<task() done, defined at ...> exception=Exception('Something bad happened')>
Traceback (most recent call last):
  ...
Exception: Something bad happened
Main acquiring lock...
```
There are two aspects to avoiding this form of deadlock.
The first is to follow best practices when acquiring and releasing locks.
If the task() coroutine acquired and released the lock using a context manager, then the lock would have been released automatically when the exception was raised.
Secondly, the main coroutine can use a timeout while waiting for the lock, give up after a fixed time limit, and handle the failure case.
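As a sketch of the first point, the failing example can be adapted so that the task holds the lock via the context manager interface. This is an illustrative variation, not the original listing: here the main coroutine also gathers the task with return_exceptions=True so that the exception is retrieved rather than logged.

```python
import asyncio

# coroutine task that acquires the lock via a context manager and then fails
async def task(lock):
    print('Task acquiring lock...')
    async with lock:
        # the lock is released automatically as the exception propagates
        raise Exception('Something bad happened')

async def main():
    lock = asyncio.Lock()
    worker = asyncio.create_task(task(lock))
    # wait for the task and collect its exception instead of raising it
    results = await asyncio.gather(worker, return_exceptions=True)
    print(f'Task failed with: {results[0]!r}')
    # the lock is free again, so this does not deadlock
    print('Main acquiring lock...')
    async with lock:
        print('Main acquired lock')
    return results[0], lock.locked()

error, still_locked = asyncio.run(main())
```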
Next, let’s review a summary of useful tips for avoiding deadlocks.
Tips for Avoiding Deadlocks
The best approach for avoiding deadlocks is to try to follow best practices when using concurrency in your Python programs.
A few simple tips will take you a long way.
In this section we will review some of these important best practices for avoiding deadlocks.
- Use context managers when acquiring and releasing locks.
- Use timeouts when waiting.
- Always acquire locks in the same order.
Tip 1: Use Context Managers
Acquire and release locks using a context manager, wherever possible.
Locks can be acquired manually by awaiting a call to acquire() at the beginning of the critical section, followed by a call to release() at the end of the critical section.
For example:
```python
...
# acquire the lock manually
await lock.acquire()
# critical section...
# release the lock
lock.release()
```
This approach should be avoided wherever possible.
Traditionally, it was recommended to always acquire and release a lock in a try-finally structure.
The lock is acquired, the critical section is executed in the try block, and the lock is always released in the finally block.
For example:
```python
...
# acquire the lock
await lock.acquire()
try:
    # critical section...
    pass
finally:
    # always release the lock
    lock.release()
```
This has since been replaced by the context manager interface, which achieves the same thing with less code.
For example:
```python
...
# acquire the lock
async with lock:
    # critical section...
    pass
```
The benefit of the context manager is that the lock is always released as soon as the block is exited, regardless of how it is exited, e.g. normally, a return, an error, or an exception.
This applies to a number of concurrency primitives, such as:
- Acquiring a mutex lock via the asyncio.Lock class.
- Acquiring a semaphore via the asyncio.Semaphore class.
- Acquiring a condition via the asyncio.Condition class.
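Primitives that have no context manager interface, such as asyncio.Event, can get a similar guarantee from the try-finally structure. The sketch below assumes a hypothetical producer() coroutine that must signal a waiting coroutine whether or not its work succeeds. The names producer, ready, and fail are illustrative only.

```python
import asyncio

# hypothetical producer that signals the event even if its work fails
async def producer(ready, fail):
    try:
        if fail:
            raise RuntimeError('producer failed')
    finally:
        # always set the event so that waiters are not left suspended
        ready.set()

async def main():
    ready = asyncio.Event()
    worker = asyncio.create_task(producer(ready, fail=True))
    # the timeout is only a safety net for this demonstration
    await asyncio.wait_for(ready.wait(), timeout=1)
    # retrieve the exception so that the failure can be handled
    results = await asyncio.gather(worker, return_exceptions=True)
    return ready.is_set(), results[0]

was_set, error = asyncio.run(main())
print(was_set, repr(error))
```

The waiter is released in both the success and failure cases, and it can then inspect the task to find out which one occurred.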
Tip 2: Use Timeouts When Waiting
Always use a timeout when waiting on a coroutine or task.
Many calls made on concurrency primitives may block.
For example:
- Waiting to acquire an asyncio.Lock via acquire().
- Waiting for a coroutine to terminate via the await expression.
- Waiting to be notified on an asyncio.Condition via wait().
And more.
Unlike their counterparts for threads and processes, blocking calls on asyncio concurrency primitives do not take a “timeout” argument.
methods of these synchronization primitives do not accept the timeout argument
— Asyncio Synchronization Primitives
Instead, we must use the asyncio.wait_for() function to await an awaitable (coroutine or task) with a timeout.
For example:
```python
...
try:
    # wait up to 10 seconds to acquire the lock
    await asyncio.wait_for(lock.acquire(), timeout=10)
except asyncio.TimeoutError:
    # handle the failure to acquire the lock...
    pass
```
This will allow the waiting coroutine to give up waiting after a fixed time limit and then attempt to rectify the situation, e.g. report an error, force termination, etc.
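A complete, runnable version of this pattern might look like the following sketch. The holder() coroutine and the short timings are hypothetical, chosen only so that the timeout is triggered quickly.

```python
import asyncio

# hypothetical coroutine that holds the lock for a while
async def holder(lock, hold_for):
    async with lock:
        await asyncio.sleep(hold_for)

async def main():
    lock = asyncio.Lock()
    background = asyncio.create_task(holder(lock, 0.5))
    # give the background task a chance to acquire the lock
    await asyncio.sleep(0.01)
    try:
        # give up if the lock cannot be acquired within 0.1 seconds
        await asyncio.wait_for(lock.acquire(), timeout=0.1)
    except asyncio.TimeoutError:
        outcome = 'timed out'
    else:
        # the lock was acquired, so it must be released
        lock.release()
        outcome = 'acquired'
    # wait for the background task to finish cleanly
    await background
    return outcome

outcome = asyncio.run(main())
print(outcome)
```

When the timeout expires, asyncio.wait_for() cancels the pending call to acquire(), so the waiting coroutine does not end up holding the lock by accident.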
Tip 3: Acquire Locks in Order
Acquire locks in the same order throughout the application, wherever possible.
This is called “lock ordering”.
In some applications, you may be able to abstract the acquisition of locks using a list of asyncio.Lock objects that may be iterated and acquired in order, or a function call that acquires locks in a consistent order.
When this is not possible, you may need to audit your code to confirm that all paths through the code acquire the locks in the same order.
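The list-based approach could be sketched as follows, using contextlib.AsyncExitStack to hold a variable number of locks. The run_with_locks() helper, the ordered_locks list, and the work() coroutine are hypothetical names. The position of a lock in the list defines its place in the acquisition order.

```python
import asyncio
from contextlib import AsyncExitStack

# hypothetical helper: acquire the needed locks following the order
# of ordered_locks, run the work, then release every lock
async def run_with_locks(ordered_locks, needed, work):
    async with AsyncExitStack() as stack:
        for lock in ordered_locks:
            if any(lock is n for n in needed):
                await stack.enter_async_context(lock)
        return await work()

async def main():
    lock1, lock2, lock3 = asyncio.Lock(), asyncio.Lock(), asyncio.Lock()
    # the single place where the lock order is defined
    ordered_locks = [lock1, lock2, lock3]
    log = []

    async def work(label):
        await asyncio.sleep(0.05)
        log.append(label)

    # callers list the locks they need in any order they like
    await asyncio.wait_for(
        asyncio.gather(
            run_with_locks(ordered_locks, [lock3, lock1], lambda: work('a')),
            run_with_locks(ordered_locks, [lock1, lock3], lambda: work('b')),
        ),
        timeout=2,
    )
    return log, [lock.locked() for lock in ordered_locks]

log, still_locked = asyncio.run(main())
print(log, still_locked)
```

Keeping the order in one list means that an audit only needs to confirm that every multi-lock acquisition goes through the helper.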
Further Reading
This section provides additional resources that you may find helpful.
Python Asyncio Books
- Python Asyncio Mastery, Jason Brownlee (my book!)
- Python Asyncio Jump-Start, Jason Brownlee.
- Python Asyncio Interview Questions, Jason Brownlee.
- Asyncio Module API Cheat Sheet
I also recommend the following books:
- Python Concurrency with asyncio, Matthew Fowler, 2022.
- Using Asyncio in Python, Caleb Hattingh, 2020.
- asyncio Recipes, Mohamed Mustapha Tahrioui, 2019.
APIs
- asyncio — Asynchronous I/O
- Asyncio Coroutines and Tasks
- Asyncio Streams
- Asyncio Subprocesses
- Asyncio Queues
- Asyncio Synchronization Primitives
Takeaways
You now know how to identify deadlocks with coroutines in asyncio programs.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Peter Fogden on Unsplash