You can find stuck long-running tasks in asyncio by manually tracking how long each task has been alive and reporting the details of any task that exceeds a “too long” threshold.
This approach can be used to find stuck, hanging, and zombie asyncio tasks in the event loop, as well as tasks that are blocked but not yet done.
In this tutorial, you will discover how to detect and report stuck long-running asyncio tasks.
Let’s get started.
Need to Find All Stuck and Long-Running Asyncio Tasks
Tasks can take too long in asyncio.
A coroutine may get stuck, waiting forever for some condition that will never occur.
- Maybe a resource is broken, or a connection is hanging.
- Maybe there is a race condition.
- Maybe there is a bug in a third-party library.
The Python asyncio module does provide some visibility into tasks that take too long.
This can be achieved by running the event loop in debug mode.
The problem is:
- Running the event loop in debug mode is slower, a lot slower on large systems with many tasks.
- Long-running tasks are only logged after they complete.
- “Long-running” in debug mode means blocking the event loop, not a zombie task still hanging around in the event loop.
You can learn more about running the asyncio event loop in debug mode in the tutorial:
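For reference, debug mode can be enabled by passing debug=True to asyncio.run() (or by setting the PYTHONASYNCIODEBUG environment variable). Below is a minimal sketch; the 0.05-second threshold and the 0.2-second blocking sleep are illustrative values, not part of the examples in this tutorial:

```python
import asyncio
import time

async def main():
    # lower the slow-callback threshold from the default 100 ms
    asyncio.get_running_loop().slow_callback_duration = 0.05
    # a synchronous sleep blocks the event loop; in debug mode,
    # asyncio logs a warning that this step took too long
    time.sleep(0.2)

# enable debug mode for this run
asyncio.run(main(), debug=True)
```

Note that the warning is only reported once the blocking callback returns, which is exactly the limitation described above.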
Therefore, we need ways of introspecting asyncio tasks that have been around for too long but have not stopped or been canceled.
The API does not provide this ability (and I cannot find a good third-party library that does this, yet), therefore we must develop something ourselves.
How to Discover Long-Running Tasks
We can explore a series of examples to approach the problem of finding and reporting asyncio tasks that have been running for too long.
Our approaches will not dig into the internals of the event loop; we will stick with the simple task introspection tools provided by the high-level asyncio API.
We can stage this effort into three levels:
- Report the details for all running tasks (e.g. names, stack traces, etc.)
- Explicitly record the start time for each task and monitor tasks that have been running too long.
- Monitor all active tasks and report on those running too long.
Let’s take a closer look at each step in turn.
Report Details of All Running Tasks
The first step is to introspect all running tasks.
We can get a list of all tasks currently running in the event loop by calling asyncio.all_tasks() and enumerating the result.
For example:
...
# get all tasks
for task in asyncio.all_tasks():
    # ...
You can learn more about this function in the tutorial:
For each task, we can report some information.
We could print or log each task, which gives some useful information, such as:
- The task name
- The coroutine used to create it, the script, and the line number
- The status of the task
- …and more.
For example:
...
# get all tasks
for task in asyncio.all_tasks():
    # log the task
    logging.debug(task)
Might give something like:
DEBUG:root:<Task pending name='Task-2' coro=<work() running at /...> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[TaskGroup._on_task_done()]>
DEBUG:root:<Task pending name='Task-4' coro=<work() running at ...> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[TaskGroup._on_task_done()]>
DEBUG:root:<Task pending name='Task-3' coro=<work() running at ...> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[TaskGroup._on_task_done()]>
Another approach is to report the full stack trace of each task.
This can give an idea of where each task is currently at, such as the line on which it is stuck.
We can achieve this via the Task.print_stack() method.
For example:
...
# get all tasks
for task in asyncio.all_tasks():
    # report the trace
    task.print_stack()
We can also send the trace to the log by first capturing it in an io.StringIO buffer, then writing the buffer to the logging infrastructure.
For example:
...
# get all tasks
for task in asyncio.all_tasks():
    # create a string buffer
    buffer = io.StringIO("")
    # print the task stack
    task.print_stack(file=buffer)
    # log the trace
    logging.debug(buffer.getvalue())
    # close the string buffer
    buffer.close()
We can tie all of this together into a helper function that can exclude some tasks from the list (like the main task) and can be called whenever we want to dump the details of all tasks on demand, such as periodically every few minutes.
# helper to log details of all tasks
def log_all_tasks_helper(excluded=list()):
    # get all tasks
    for task in asyncio.all_tasks():
        # skip excluded tasks
        if task in excluded:
            continue
        # create a string buffer
        buffer = io.StringIO("")
        # print the task stack
        task.print_stack(file=buffer)
        # log the trace
        logging.debug(buffer.getvalue())
        # close the string buffer
        buffer.close()
Next, let’s look at monitoring how long all tasks have been running and only reporting the details of those that have been running too long.
Record When All Tasks Are Started And Monitor
Another approach is to explicitly record the start time for each task executed in the asyncio program.
This is a big undertaking, but it may be feasible for small and moderately sized programs.
It requires defining a helper function used to create and issue all tasks. It uses a dictionary global variable to store the task name and start time (using some reliable clock like the event loop time or similar).
For example:
# helper function for starting tasks and recording their start time
# uses default task names, assumes they are unique (reasonable)
def start_task(coroutine):
    # create the task
    task = asyncio.create_task(coroutine)
    # store the record
    TASK_DICT[task.get_name()] = time.monotonic()
    # return the created task
    return task
This assumes that each task has a unique name, which is probably true if you’re using default task names.
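If you would rather not rely on default names being unique, asyncio.create_task() accepts a name argument. A small sketch; the name 'worker-1' is purely illustrative:

```python
import asyncio

# simple placeholder coroutine
async def work():
    await asyncio.sleep(0.01)

async def main():
    # give the task an explicit, unique name rather than
    # the default 'Task-N' style name assigned by asyncio
    task = asyncio.create_task(work(), name='worker-1')
    await task
    # the name is available via get_name()
    return task.get_name()

name = asyncio.run(main())
print(name)
```

Explicit names also make the monitor's log messages easier to interpret than the default Task-N names.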
The TASK_DICT global variable must be declared and defined somewhere early in your program.
...
# dict of all tasks we've started and their start times
TASK_DICT = dict()
We can then define a monitor coroutine.
This coroutine runs forever in a loop. Each iteration, it checks how long each running task has been alive, then logs the details of those that have been alive for too long.
Any task that is running but is not known to the TASK_DICT dict is ignored. We can also keep an exclusion list for tasks we know run a long time, like the main() coroutine.
# report duration of all current tasks
async def monitor(task_limit_sec, exclude=[]):
    # access global state
    global TASK_DICT
    # loop forever
    while True:
        # get all tasks
        for task in asyncio.all_tasks():
            # skip excluded tasks
            if task in exclude:
                continue
            # get task name
            name = task.get_name()
            # skip tasks not known to the mechanism
            if name not in TASK_DICT:
                continue
            # compute duration for current task
            duration = time.monotonic() - TASK_DICT[name]
            # check if not too long
            if duration < task_limit_sec:
                continue
            # report task that has been alive too long
            logging.debug(f'{name} alive for too long: {duration:.3f} seconds')
        # check every second
        await asyncio.sleep(1)
In this case, we report the task name.
This could just as easily be the string representation of the entire task or the entire stack trace for the task, as in the previous section.
The monitor loop also runs every second, but this could be every minute or every 10 minutes for long-running asyncio programs.
We can then kick off this monitor coroutine at the beginning of our program.
...
# create the monitor coroutine
monitor_coro = monitor(3.0, [asyncio.current_task()])
# start the monitor task
monitor_task = asyncio.create_task(monitor_coro)
We must also explicitly cancel it when shutting down our program.
...
# shutdown the monitor
monitor_task.cancel()
The obvious limitation of this approach is the need to centralize the creation of all asyncio tasks via the helper function.
Next, let’s look at removing this limitation.
Monitor All Active Tasks
Extending the above, the monitor alone can keep track of all active tasks over time.
This removes the need for explicitly recording the start time of tasks via the helper function.
Instead, the monitor records the first time a task is seen, then reports task details for those tasks that have been running too long.
For example, the main loop of the monitor() coroutine is updated to add the task to the dict if not seen before, rather than skipping over it.
...
# check if not previously known
if name not in task_dict:
    # add start time (first time seen)
    task_dict[name] = time.monotonic()
    continue
The updated monitor with this change is listed below.
# report duration of all current tasks
# assumes task names are unique
async def monitor(task_limit_sec, exclude=[]):
    # record of all task names and their start times
    task_dict = dict()
    # loop forever
    while True:
        # get all tasks
        for task in asyncio.all_tasks():
            # skip excluded tasks
            if task in exclude:
                continue
            # get task name
            name = task.get_name()
            # check if not previously known
            if name not in task_dict:
                # add start time (first time seen)
                task_dict[name] = time.monotonic()
                continue
            # compute duration for current task
            duration = time.monotonic() - task_dict[name]
            # check if not too long
            if duration < task_limit_sec:
                continue
            # report task that has been alive too long
            logging.debug(f'{name} alive for too long: {duration:.3f} seconds')
        # check every second
        await asyncio.sleep(1)
As above, the monitor can be started when the asyncio program first starts, excluding the main coroutine, and then canceled right before exiting.
...
# create the monitor coroutine
monitor_coro = monitor(3.0, [asyncio.current_task()])
# start the monitor task
monitor_task = asyncio.create_task(monitor_coro)
# the program
...
# shutdown the monitor
monitor_task.cancel()
You could put this in a context manager or a simple try-finally block.
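For example, a try-finally block guarantees the monitor is canceled even if the program's main work raises an exception. This is a sketch with a stand-in monitor body and placeholder work, not the full monitor developed above:

```python
import asyncio

# stand-in for the monitor() coroutine developed above
async def monitor(task_limit_sec, exclude=[]):
    while True:
        await asyncio.sleep(1)

async def main():
    # start the monitor task
    monitor_task = asyncio.create_task(
        monitor(3.0, [asyncio.current_task()]))
    try:
        # the program's real work goes here
        await asyncio.sleep(0.1)
    finally:
        # always shut down the monitor, even on error
        monitor_task.cancel()

asyncio.run(main())
```

An async context manager would encapsulate the same start/cancel pairing if the monitor is reused across programs.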
Now that we know how to find stuck and zombie asyncio tasks that have been alive for a long time, let’s look at some worked examples.
Example of Logging a Trace For All Current Tasks
We can explore an example that reports a trace for all currently active asyncio tasks.
In this example, we will define a simple worker coroutine that takes an argument, sleeps for 5 seconds, then returns a result.
# long running task
async def work(value):
    # sleep to simulate waiting
    await asyncio.sleep(5)
    # return value
    return value * 100
We can then create a TaskGroup and issue 3 of these tasks for execution and give them a few seconds to execute.
# asyncio entry point
async def main():
    # create a task group
    async with asyncio.TaskGroup() as group:
        # issue many tasks
        _ = [group.create_task(work(i)) for i in range(3)]
        # let the tasks start
        await asyncio.sleep(3)
If you are new to asyncio.TaskGroup, see the tutorial:
Then, while the tasks are still running, we can call our log_all_tasks_helper() function defined above. The current task main() will be excluded from the report.
...
# log status of all tasks, exclude self
log_all_tasks_helper([asyncio.current_task()])
This will report stack information for all 3 running work() coroutines.
The log_all_tasks_helper() function performs blocking I/O, e.g. printing to standard out. If it is expected to spend a long time printing, it could be run in a new thread (e.g. via asyncio.to_thread()).
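One caveat if you try this: asyncio.all_tasks() must be called from the event loop's thread, so the task list should be collected first and only the blocking formatting and logging handed off to the worker thread. A sketch of this idea, where log_task_stacks() is a hypothetical variant of the helper that accepts the task list as an argument:

```python
import asyncio
import io
import logging

# variant of the helper that takes the task list as an argument,
# so the blocking formatting/logging can run in a worker thread
def log_task_stacks(tasks):
    for task in tasks:
        # capture the stack trace into a string buffer
        buffer = io.StringIO()
        task.print_stack(file=buffer)
        # log the trace
        logging.debug(buffer.getvalue())
        buffer.close()

# placeholder long-running task
async def work():
    await asyncio.sleep(0.2)

async def main():
    task = asyncio.create_task(work())
    await asyncio.sleep(0.05)
    # asyncio.all_tasks() needs a running loop in the caller,
    # so collect the tasks here before handing off to the thread
    tasks = [t for t in asyncio.all_tasks()
             if t is not asyncio.current_task()]
    # run the blocking helper without blocking the event loop
    await asyncio.to_thread(log_task_stacks, tasks)
    await task

logging.basicConfig(level=logging.DEBUG)
asyncio.run(main())
```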
We can then configure simple logging and start the event loop.
...
# enable debug logging
logging.basicConfig(level=logging.DEBUG)
# start the event loop
asyncio.run(main())
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of reporting stack of all running coroutines and tasks
import logging
import io
import asyncio

# helper to log details of all tasks
def log_all_tasks_helper(excluded=list()):
    # get all tasks
    for task in asyncio.all_tasks():
        # skip excluded tasks
        if task in excluded:
            continue
        # create a string buffer
        buffer = io.StringIO("")
        # print the task stack
        task.print_stack(file=buffer)
        # log the trace
        logging.debug(buffer.getvalue())
        # close the string buffer
        buffer.close()

# long running task
async def work(value):
    # sleep to simulate waiting
    await asyncio.sleep(5)
    # return value
    return value * 100

# asyncio entry point
async def main():
    # create a task group
    async with asyncio.TaskGroup() as group:
        # issue many tasks
        _ = [group.create_task(work(i)) for i in range(3)]
        # let the tasks start
        await asyncio.sleep(3)
        # log status of all tasks, exclude self
        log_all_tasks_helper([asyncio.current_task()])
    # wait for tasks to complete...
    print('All done')

# enable debug logging
logging.basicConfig(level=logging.DEBUG)
# start the event loop
asyncio.run(main())
Running the example first issues the three work() coroutines for execution and then suspends for 3 seconds.
The work() coroutines begin executing and suspend for 5 seconds.
The main() coroutine resumes and then calls our custom log_all_tasks_helper() function.
This reports stack information for all 3 currently running tasks and the line on which they are currently suspended.
This highlights how we can introspect the state of all running tasks in our asyncio programs.
DEBUG:asyncio:Using selector: KqueueSelector
DEBUG:root:Stack for <Task pending name='Task-2' coro=<work() running at ...> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[TaskGroup._on_task_done()]> (most recent call last):
  File "...", line 26, in work
    await asyncio.sleep(5)
DEBUG:root:Stack for <Task pending name='Task-4' coro=<work() running at ...> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[TaskGroup._on_task_done()]> (most recent call last):
  File "...", line 26, in work
    await asyncio.sleep(5)
DEBUG:root:Stack for <Task pending name='Task-3' coro=<work() running at ...> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[TaskGroup._on_task_done()]> (most recent call last):
  File "...", line 26, in work
    await asyncio.sleep(5)
Next, let’s explore an example of explicitly recording the start time of all tasks.
Example of Recording Start Time for All Tasks
We can explore an example of explicitly recording the start time of all asyncio tasks in a program.
This may be required in cases where we cannot trust the monitor to determine how long tasks have been running (covered in the next section), and where the program is small enough that we can centralize the creation and scheduling of all tasks via our start_task() helper function, developed above.
# helper function for starting tasks and recording their start time
# uses default task names, assumes they are unique (reasonable)
def start_task(coroutine):
    # create the task
    task = asyncio.create_task(coroutine)
    # store the record
    TASK_DICT[task.get_name()] = time.monotonic()
    # return the created task
    return task
The monitor() coroutine, also developed above, will run for the life of the program and report the details of those tasks that have been alive for “too long”, in this case, 3 seconds or more.
We will just report the task name and time alive, but this could be updated to report all task details or even the task stack trace.
# report duration of all current tasks
async def monitor(task_limit_sec, exclude=[]):
    # access global state
    global TASK_DICT
    # loop forever
    while True:
        # get all tasks
        for task in asyncio.all_tasks():
            # skip excluded tasks
            if task in exclude:
                continue
            # get task name
            name = task.get_name()
            # skip tasks not known to the mechanism
            if name not in TASK_DICT:
                continue
            # compute duration for current task
            duration = time.monotonic() - TASK_DICT[name]
            # check if not too long
            if duration < task_limit_sec:
                continue
            # report task that has been alive too long
            logging.debug(f'{name} alive for too long: {duration:.3f} seconds')
        # check every second
        await asyncio.sleep(1)
Firstly, we will define a do_work() coroutine that will sleep for a random number of seconds between 1 and 11.
This will give some variety in results each time we run the example. Some tasks will take a long time, some won’t.
# simple worker coroutine
async def do_work():
    # generate value in [1, 11)
    value = 1 + (random.random() * 10)
    # block
    await asyncio.sleep(value)
    # report a message
    print(f'Task with {value} is done.')
Next, in the main() coroutine, we will create and start the monitor coroutine, then issue 10 of our do_work() coroutines.
Importantly, the work tasks are created using our custom start_task() helper function.
The main() coroutine will then suspend and wait for all tasks to complete.
Once all tasks are done, the monitor will be explicitly canceled.
# asyncio entry point
async def main():
    # create the monitor coroutine
    monitor_coro = monitor(3.0, [asyncio.current_task()])
    # start the monitor task
    monitor_task = asyncio.create_task(monitor_coro)
    # create a bunch of worker tasks
    workers = [start_task(do_work()) for _ in range(10)]
    # wait for all tasks
    _ = await asyncio.gather(*workers)
    # report final message
    print('All done')
    # shutdown the monitor
    monitor_task.cancel()
Finally, logging is enabled and the event loop is started.
The monitor() coroutine requires a global variable to keep track of task start times. This can be declared and defined before starting the event loop.
...
# enable debug logging
logging.basicConfig(level=logging.DEBUG)
# dict of all tasks we've started and their start times
TASK_DICT = dict()
# start the event loop
asyncio.run(main())
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of recording start time for all tasks
import time
import logging
import random
import asyncio

# helper function for starting tasks and recording their start time
# uses default task names, assumes they are unique (reasonable)
def start_task(coroutine):
    # create the task
    task = asyncio.create_task(coroutine)
    # store the record
    TASK_DICT[task.get_name()] = time.monotonic()
    # return the created task
    return task

# report duration of all current tasks
async def monitor(task_limit_sec, exclude=[]):
    # access global state
    global TASK_DICT
    # loop forever
    while True:
        # get all tasks
        for task in asyncio.all_tasks():
            # skip excluded tasks
            if task in exclude:
                continue
            # get task name
            name = task.get_name()
            # skip tasks not known to the mechanism
            if name not in TASK_DICT:
                continue
            # compute duration for current task
            duration = time.monotonic() - TASK_DICT[name]
            # check if not too long
            if duration < task_limit_sec:
                continue
            # report task that has been alive too long
            logging.debug(f'{name} alive for too long: {duration:.3f} seconds')
        # check every second
        await asyncio.sleep(1)

# simple worker coroutine
async def do_work():
    # generate value in [1, 11)
    value = 1 + (random.random() * 10)
    # block
    await asyncio.sleep(value)
    # report a message
    print(f'Task with {value} is done.')

# asyncio entry point
async def main():
    # create the monitor coroutine
    monitor_coro = monitor(3.0, [asyncio.current_task()])
    # start the monitor task
    monitor_task = asyncio.create_task(monitor_coro)
    # create a bunch of worker tasks
    workers = [start_task(do_work()) for _ in range(10)]
    # wait for all tasks
    _ = await asyncio.gather(*workers)
    # report final message
    print('All done')
    # shutdown the monitor
    monitor_task.cancel()

# enable debug logging
logging.basicConfig(level=logging.DEBUG)
# dict of all tasks we've started and their start times
TASK_DICT = dict()
# start the event loop
asyncio.run(main())
Running the example first starts the monitor() coroutine, configured to report any tasks that take longer than 3 seconds, and ignores the main() coroutine.
Next, all 10 worker tasks are created and scheduled using our helper function. This records the start time of each task.
The main() coroutine then suspends waiting for the tasks to complete.
The tasks execute, each sleeping for a random number of seconds.
The monitor() coroutine runs every second and checks how long each task has been alive. As soon as some tasks have been seen to be alive for “too long”, debug log messages are reported.
This continues until all tasks are completed.
We can see by the end only a handful of tasks have been alive for a long time, e.g. 10 seconds.
Before exiting, the main() coroutine cancels the monitor task.
This highlights how we can explicitly record task start times, then monitor all known tasks and report debug messages when they have been alive for too long.
This approach can be used to automatically discover long-running tasks in an asyncio program.
DEBUG:asyncio:Using selector: KqueueSelector
Task with 1.9772543942425427 is done.
Task with 1.9905760955882492 is done.
DEBUG:root:Task-9 alive for too long: 3.004 seconds
DEBUG:root:Task-12 alive for too long: 3.004 seconds
DEBUG:root:Task-7 alive for too long: 3.005 seconds
DEBUG:root:Task-10 alive for too long: 3.005 seconds
DEBUG:root:Task-4 alive for too long: 3.005 seconds
DEBUG:root:Task-3 alive for too long: 3.005 seconds
DEBUG:root:Task-8 alive for too long: 3.005 seconds
DEBUG:root:Task-11 alive for too long: 3.005 seconds
Task with 3.258507369197187 is done.
Task with 3.296558610302638 is done.
DEBUG:root:Task-9 alive for too long: 4.008 seconds
DEBUG:root:Task-12 alive for too long: 4.008 seconds
DEBUG:root:Task-7 alive for too long: 4.008 seconds
DEBUG:root:Task-10 alive for too long: 4.008 seconds
DEBUG:root:Task-8 alive for too long: 4.008 seconds
DEBUG:root:Task-11 alive for too long: 4.008 seconds
Task with 4.5930908540696915 is done.
DEBUG:root:Task-9 alive for too long: 5.010 seconds
DEBUG:root:Task-12 alive for too long: 5.010 seconds
DEBUG:root:Task-7 alive for too long: 5.010 seconds
DEBUG:root:Task-10 alive for too long: 5.010 seconds
DEBUG:root:Task-11 alive for too long: 5.010 seconds
Task with 5.443777515592201 is done.
DEBUG:root:Task-9 alive for too long: 6.012 seconds
DEBUG:root:Task-7 alive for too long: 6.012 seconds
DEBUG:root:Task-10 alive for too long: 6.012 seconds
DEBUG:root:Task-11 alive for too long: 6.013 seconds
Task with 6.3947516177117825 is done.
DEBUG:root:Task-7 alive for too long: 7.014 seconds
DEBUG:root:Task-10 alive for too long: 7.014 seconds
DEBUG:root:Task-11 alive for too long: 7.014 seconds
DEBUG:root:Task-7 alive for too long: 8.018 seconds
DEBUG:root:Task-10 alive for too long: 8.018 seconds
DEBUG:root:Task-11 alive for too long: 8.018 seconds
Task with 8.804928052685764 is done.
DEBUG:root:Task-7 alive for too long: 9.020 seconds
DEBUG:root:Task-11 alive for too long: 9.020 seconds
Task with 9.69970472122514 is done.
DEBUG:root:Task-11 alive for too long: 10.023 seconds
Next, let’s look at an example that does not require explicit recording of start times via a helper function.
Example of Monitoring All Active Tasks
We can explore an example of automatically monitoring all active tasks in an asyncio program.
This can be achieved by keeping track of tasks the first time they are seen, then reporting on those tasks if they have been around for too long, e.g. are still active after a given time limit.
We will use the updated monitor() coroutine developed above.
# report duration of all current tasks
# assumes task names are unique
async def monitor(task_limit_sec, exclude=[]):
    # record of all task names and their start times
    task_dict = dict()
    # loop forever
    while True:
        # get all tasks
        for task in asyncio.all_tasks():
            # skip excluded tasks
            if task in exclude:
                continue
            # get task name
            name = task.get_name()
            # check if not previously known
            if name not in task_dict:
                # add start time (first time seen)
                task_dict[name] = time.monotonic()
                continue
            # compute duration for current task
            duration = time.monotonic() - task_dict[name]
            # check if not too long
            if duration < task_limit_sec:
                continue
            # report task that has been alive too long
            logging.debug(f'{name} alive for too long: {duration:.3f} seconds')
        # check every second
        await asyncio.sleep(1)
In this case, we do not need to define a global dict for tracking tasks and their creation times; this state is kept within the monitor() coroutine.
We will use the same work tasks from the previous section that run for a random number of seconds.
Similarly, the monitor() coroutine can be created at the beginning of the program and canceled before shutdown.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of monitoring all active tasks
import time
import logging
import random
import asyncio

# report duration of all current tasks
# assumes task names are unique
async def monitor(task_limit_sec, exclude=[]):
    # record of all task names and their start times
    task_dict = dict()
    # loop forever
    while True:
        # get all tasks
        for task in asyncio.all_tasks():
            # skip excluded tasks
            if task in exclude:
                continue
            # get task name
            name = task.get_name()
            # check if not previously known
            if name not in task_dict:
                # add start time (first time seen)
                task_dict[name] = time.monotonic()
                continue
            # compute duration for current task
            duration = time.monotonic() - task_dict[name]
            # check if not too long
            if duration < task_limit_sec:
                continue
            # report task that has been alive too long
            logging.debug(f'{name} alive for too long: {duration:.3f} seconds')
        # check every second
        await asyncio.sleep(1)

# simple worker coroutine
async def do_work():
    # generate value in [1, 11)
    value = 1 + (random.random() * 10)
    # block
    await asyncio.sleep(value)
    # report a message
    print(f'Task with {value} is done.')

# asyncio entry point
async def main():
    # create the monitor coroutine
    monitor_coro = monitor(3.0, [asyncio.current_task()])
    # start the monitor task
    monitor_task = asyncio.create_task(monitor_coro)
    # create a bunch of worker tasks
    workers = [asyncio.create_task(do_work()) for _ in range(10)]
    # wait for all tasks
    _ = await asyncio.gather(*workers)
    # report final message
    print('All done')
    # shutdown the monitor
    monitor_task.cancel()

# enable debug logging
logging.basicConfig(level=logging.DEBUG)
# start the event loop
asyncio.run(main())
Running the example first starts the monitor() coroutine, configured to report any tasks that take longer than 3 seconds, and ignores the main() coroutine.
Next, all 10 worker tasks are created and scheduled normally, not using any helper function.
The main() coroutine then suspends waiting for the tasks to complete.
The tasks execute, each sleeping for a random number of seconds.
The monitor() coroutine runs every second. The first time each task is seen, it is recorded in the dict, including its name and timestamp.
The monitor then checks how long each task has been alive. As soon as some tasks have been seen to be alive for “too long”, debug log messages are reported.
This continues until all tasks are completed.
We can see by the end only a handful of tasks have been alive for a long time, e.g. 9 and 10 seconds.
Before exiting, the main() coroutine cancels the monitor task.
This example highlights how we can report the details of any task that has been alive for “too long” in an asyncio program by adding a simple monitoring coroutine.
DEBUG:asyncio:Using selector: KqueueSelector
Task with 1.9136662677241314 is done.
Task with 2.2405586633208943 is done.
DEBUG:root:Task-2 alive for too long: 3.007 seconds
DEBUG:root:Task-9 alive for too long: 3.007 seconds
DEBUG:root:Task-12 alive for too long: 3.007 seconds
DEBUG:root:Task-10 alive for too long: 3.007 seconds
DEBUG:root:Task-5 alive for too long: 3.007 seconds
DEBUG:root:Task-3 alive for too long: 3.007 seconds
DEBUG:root:Task-8 alive for too long: 3.007 seconds
DEBUG:root:Task-4 alive for too long: 3.007 seconds
DEBUG:root:Task-6 alive for too long: 3.007 seconds
DEBUG:root:Task-2 alive for too long: 4.008 seconds
DEBUG:root:Task-9 alive for too long: 4.008 seconds
DEBUG:root:Task-12 alive for too long: 4.008 seconds
DEBUG:root:Task-10 alive for too long: 4.008 seconds
DEBUG:root:Task-5 alive for too long: 4.008 seconds
DEBUG:root:Task-3 alive for too long: 4.008 seconds
DEBUG:root:Task-8 alive for too long: 4.008 seconds
DEBUG:root:Task-4 alive for too long: 4.008 seconds
DEBUG:root:Task-6 alive for too long: 4.008 seconds
Task with 4.749781880137062 is done.
DEBUG:root:Task-2 alive for too long: 5.013 seconds
DEBUG:root:Task-9 alive for too long: 5.013 seconds
DEBUG:root:Task-12 alive for too long: 5.013 seconds
DEBUG:root:Task-10 alive for too long: 5.013 seconds
DEBUG:root:Task-5 alive for too long: 5.013 seconds
DEBUG:root:Task-8 alive for too long: 5.013 seconds
DEBUG:root:Task-4 alive for too long: 5.013 seconds
DEBUG:root:Task-6 alive for too long: 5.013 seconds
Task with 5.5793955396904575 is done.
DEBUG:root:Task-2 alive for too long: 6.018 seconds
DEBUG:root:Task-9 alive for too long: 6.019 seconds
DEBUG:root:Task-12 alive for too long: 6.019 seconds
DEBUG:root:Task-10 alive for too long: 6.019 seconds
DEBUG:root:Task-8 alive for too long: 6.019 seconds
DEBUG:root:Task-4 alive for too long: 6.019 seconds
DEBUG:root:Task-6 alive for too long: 6.019 seconds
DEBUG:root:Task-2 alive for too long: 7.022 seconds
DEBUG:root:Task-9 alive for too long: 7.022 seconds
DEBUG:root:Task-12 alive for too long: 7.022 seconds
DEBUG:root:Task-10 alive for too long: 7.022 seconds
DEBUG:root:Task-8 alive for too long: 7.022 seconds
DEBUG:root:Task-4 alive for too long: 7.022 seconds
DEBUG:root:Task-6 alive for too long: 7.022 seconds
Task with 7.1677185075795125 is done.
Task with 7.686247693655644 is done.
Task with 7.703480666260263 is done.
Task with 7.716391784426704 is done.
DEBUG:root:Task-2 alive for too long: 8.024 seconds
DEBUG:root:Task-4 alive for too long: 8.025 seconds
DEBUG:root:Task-9 alive for too long: 8.025 seconds
Task with 8.95468939551258 is done.
DEBUG:root:Task-2 alive for too long: 9.028 seconds
DEBUG:root:Task-4 alive for too long: 9.028 seconds
DEBUG:root:Task-2 alive for too long: 10.030 seconds
DEBUG:root:Task-4 alive for too long: 10.030 seconds
Task with 10.868886389657645 is done.
All done
Further Reading
This section provides additional resources that you may find helpful.
Python Asyncio Books
- Python Asyncio Mastery, Jason Brownlee (my book!)
- Python Asyncio Jump-Start, Jason Brownlee.
- Python Asyncio Interview Questions, Jason Brownlee.
- Asyncio Module API Cheat Sheet
I also recommend the following books:
- Python Concurrency with asyncio, Matthew Fowler, 2022.
- Using Asyncio in Python, Caleb Hattingh, 2020.
- asyncio Recipes, Mohamed Mustapha Tahrioui, 2019.
APIs
- asyncio — Asynchronous I/O
- Asyncio Coroutines and Tasks
- Asyncio Streams
- Asyncio Subprocesses
- Asyncio Queues
- Asyncio Synchronization Primitives
Takeaways
You now know how to detect and report on stuck long-running asyncio tasks.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Varla Scooter on Unsplash