Last Updated on September 12, 2022
You can use ThreadPoolExecutor for ad hoc IO-bound tasks and AsyncIO for asynchronous programming generally or for vast numbers of IO-bound tasks.
In this tutorial, you will discover the difference between the ThreadPoolExecutor and AsyncIO and when to use each in your Python projects.
Let’s get started.
What Is ThreadPoolExecutor?
The ThreadPoolExecutor class provides a thread pool in Python.
A thread is a thread of execution.
Each thread belongs to a process and can share memory (state and data) with other threads in the same process. In Python, like many modern programming languages, threads are created and managed by the underlying operating system, so-called system-threads or native threads.
You can create a thread pool by instantiating the class and specifying the number of threads via the max_workers argument; for example:
...
# create a thread pool
executor = ThreadPoolExecutor(max_workers=10)
You can then submit tasks to be executed by the thread pool using the map() and the submit() functions.
The map() function matches the built-in map() function and takes a function name and an iterable of items. The target function will then be called for each item in the iterable as a separate task in the thread pool. An iterable of results will be returned if the target function returns a value.
The call to map() does not block, but each result yielded in the returned iterator will block until the associated task is completed.
For example:
...
# call a function on each item in a list and process results
for result in executor.map(task, items):
    # process result...
You can also issue tasks to the pool via the submit() function that takes the target function name and any arguments and returns a Future object.
The Future object can be used to query the status of the task (e.g. done(), running(), or cancelled()) and can be used to get the result or exception raised by the task once completed. The calls to result() and exception() will block until the task associated with the Future is done.
For example:
...
# submit a task to the pool and get a future immediately
future = executor.submit(task, item)
# get the result once the task is done
result = future.result()
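For example, a minimal sketch of querying a Future might look like the following (the task() function here is a simple stand-in used for illustration):

from concurrent.futures import ThreadPoolExecutor
import time

# a simple task used for illustration
def task(item):
    time.sleep(1)
    return item * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit the task and get a Future immediately
    future = executor.submit(task, 5)
    # check whether the task has finished yet (does not block)
    print(future.done())
    # block until the task is done and retrieve any raised exception
    error = future.exception()
    if error is None:
        # get the return value (the task is already done at this point)
        print(future.result())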
Once you are finished with the thread pool, it can be shut down by calling the shutdown() function in order to release all of the worker threads and their resources.
For example:
...
# shutdown the thread pool
executor.shutdown()
The process of creating and shutting down the thread pool can be simplified by using the context manager that will automatically call the shutdown() function.
For example:
...
# create a thread pool
with ThreadPoolExecutor(max_workers=10) as executor:
    # call a function on each item in a list and process results
    for result in executor.map(task, items):
        # process result...
        # ...
# shutdown is called automatically
Now that we are familiar with ThreadPoolExecutor, let’s take a look at AsyncIO.
What Is AsyncIO?
Python has an asyncio module for Asynchronous Input/Output (AsyncIO).
It primarily provides a way to create and run coroutines using the async/await syntax.
A coroutine is a programming pattern that generalizes routines (e.g. subroutines, functions, or blocks of code) to allow them to be suspended and resumed. Coroutines use cooperative multitasking, requiring threads of execution to explicitly yield control.
A coroutine asynchronous task can be defined by adding the “async” keyword prior to a function definition; for example:
# define a coroutine
async def work():
    # do things...
This syntax defines an awaitable, which is a unit of execution that can be awaited, i.e. waited for.
A program can wait on an awaitable asynchronous task to complete using the “await” keyword; for example:
...
# create and schedule the coroutine and wait for it to return
await work()
This will do two things:
- Create a coroutine and schedule it for execution.
- Yield execution until the coroutine returns.
We can also create and schedule a coroutine for execution by calling the create_task() function, which will return a Task object.
...
# create and schedule a coroutine
task = asyncio.create_task(work())
This Task object provides a handle on the scheduled coroutine, allowing the status of the task to be queried, a done callback to be added, and the task to be cancelled.
We can also wait on a task object directly; for example:
...
# wait for the task to complete
await task
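Putting these pieces together, a brief sketch of working with a task handle might look like the following (the work() coroutine is an illustrative stand-in for a real IO-bound task):

import asyncio

# a coroutine that simulates an IO-bound task
async def work():
    await asyncio.sleep(1)
    return 'all done'

async def main():
    # create and schedule the coroutine as a Task
    task = asyncio.create_task(work())
    # add a callback that reports when the task finishes
    task.add_done_callback(lambda t: print('task finished'))
    # query the status of the task before and after waiting on it
    print(task.done())
    result = await task
    print(task.done(), result)

asyncio.run(main())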
Coroutines created with the async/await syntax can only be executed within the asyncio runtime, which creates an event loop that allows a single thread of execution to run many tasks asynchronously, suspending tasks that block and switching to others.
This is achieved using the asyncio.run() function for the entry point asynchronous function; for example:
...
# start the asyncio runtime
asyncio.run(work())
This will create the event loop runtime required to support the scheduling and execution of coroutines.
This provides a brief tour of the creation and execution of coroutines in Python, although the asyncio module is larger and provides a broader suite of tools for creating and managing coroutines with a focus on IO-tasks.
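Pulling the pieces together, a minimal complete program might look like the following sketch (again, the work() coroutine is a stand-in for a real IO-bound task):

import asyncio

# a coroutine that simulates a single IO-bound task
async def work():
    await asyncio.sleep(1)
    return 'result'

# the entry point coroutine for the program
async def main():
    # create and schedule the coroutine, then wait for its result
    task = asyncio.create_task(work())
    result = await task
    print(result)

# create the event loop and run the entry point coroutine
asyncio.run(main())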
Comparison of ThreadPoolExecutor vs. AsyncIO
Now that we are familiar with the ThreadPoolExecutor and AsyncIO, let’s review their similarities and differences.
Similarities Between ThreadPoolExecutor and AsyncIO
The ThreadPoolExecutor and AsyncIO classes are very similar. Let’s review some of the most important similarities.
1. Both Execute Tasks Concurrently
Both the ThreadPoolExecutor and AsyncIO can be used to execute tasks concurrently.
Tasks can be submitted to the ThreadPoolExecutor using submit() or map() and they will be executed by a worker thread.
Tasks in AsyncIO can be scheduled using the await keyword and will be executed by the event loop as soon as possible.
Technically, we might refer to ThreadPoolExecutor as a tool for concurrent programming and async/await as a tool for asynchronous programming. The difference is subtle. The emphasis of concurrent programming is on independent tasks, whereas the emphasis of asynchronous programming is on unknown order of execution.
As such, it would be just as valid to say both ThreadPoolExecutor and AsyncIO support asynchronous programming.
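As a rough sketch of the two approaches side by side (blocking_task() and async_task() are hypothetical stand-ins for real IO-bound work):

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# hypothetical IO-bound task (blocking version for the thread pool)
def blocking_task(item):
    time.sleep(1)
    return item

# hypothetical IO-bound task (awaitable version for asyncio)
async def async_task(item):
    await asyncio.sleep(1)
    return item

# execute five tasks concurrently with a thread pool
with ThreadPoolExecutor(max_workers=5) as executor:
    print(list(executor.map(blocking_task, range(5))))

# execute five tasks concurrently with asyncio
async def main():
    print(await asyncio.gather(*[async_task(i) for i in range(5)]))

asyncio.run(main())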
2. Both Are Concurrent but Not Parallel
Both the ThreadPoolExecutor and AsyncIO execute concurrent tasks but not parallel tasks.
Parallel execution means executing two or more instructions at the same time, such as on two physical CPU cores.
Neither the ThreadPoolExecutor nor AsyncIO is able to execute code in parallel, although it may look like this is the case.
Tasks executed by the ThreadPoolExecutor are executed using threads. Python threads are subject to the Global Interpreter Lock (GIL), which means that only one thread can execute Python bytecode at a time within a process.
Similarly, AsyncIO executes coroutines within a single thread and a single Python thread will execute on a single CPU core.
3. Both are Suited to IO-Bound Tasks
Both the ThreadPoolExecutor and AsyncIO are best suited to IO-bound tasks.
These are tasks that read or write from a resource like a file or network connection and are limited by the speed that data can be moved in or out of the resource. This is opposed to CPU-bound tasks that are limited by the speed of the CPU.
The ThreadPoolExecutor uses worker threads internally, which are suited to IO-bound tasks instead of CPU-bound tasks primarily because the Global Interpreter Lock (GIL) prevents the parallel execution of threads within a process.
Python supports asynchronous programming generally with the async/await keywords, but the AsyncIO module is specifically focused on IO tasks, such as reading and writing from streams like files and network sockets.
Differences Between ThreadPoolExecutor and AsyncIO
The ThreadPoolExecutor and AsyncIO are also quite different. Let’s review some of the most important differences.
1. Threads vs. Coroutines
Perhaps the most important difference is the way in which concurrency is implemented.
The ThreadPoolExecutor uses worker threads. These are real system level threads that are allocated and managed by the underlying operating system.
The AsyncIO framework uses coroutines. These are a software-level (Python-level) programming pattern that executes within a single operating-system-level thread.
As such, Python threads and the ThreadPoolExecutor subsume AsyncIO: each Python thread could run an event loop of coroutines. Similarly, a process subsumes threads: each Python process could maintain multiple threads, each of which may run an event loop of coroutines.
The operating system may be limited in the total number of system threads that can be created by running processes, or even in the number of threads within a single process. This is because each thread requires the allocation of memory for stack space and must be managed by the operating system.
Coroutines, in contrast, are managed by Python as a software pattern and are not subject to the same constraints as system threads.
As such, the number of worker threads in the ThreadPoolExecutor might be capped at an upper limit of one thousand or a few thousand, whereas the number of coroutines may not have a reasonable limit. A single event loop may support thousands, tens of thousands, or even hundreds of thousands of coroutines.
This difference defines all other differences between the two approaches.
2. Preemptive vs. Cooperative Multitasking
Multitasking simply means executing multiple tasks at the same time, although it typically refers to the way in which operating systems permit multiple programs to run concurrently on one (or perhaps a few) CPU cores.
The ThreadPoolExecutor achieves concurrency via preemptive multitasking, whereas AsyncIO achieves concurrency via cooperative multitasking.
The ThreadPoolExecutor uses worker threads, which are system-level constructs. The operating system will choose what threads will run at any given time on a CPU core.
The operating system will actively suspend a running thread, preserving its state so it can be resumed later, and reinstate another thread, permitting it to execute. This is called a context switch, i.e. the context of execution is changed by the operating system.
The operating system context switches continually between running threads, and you will rarely notice; this approach to managing threads is called preemptive multitasking. The operating system will attempt to choose a good time to perform a context switch between threads, such as when a thread is blocked, for example while performing an IO task.
The async/await keywords in AsyncIO give fine grained control over exactly when each task will give up control of execution and allow another task to execute.
A coroutine must await the result of another coroutine, which explicitly yields execution and control of the thread to another task.
This explicit yielding of control to other tasks is called cooperative multitasking.
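As a sketch of this idea, a coroutine only gives up control at an await expression; here asyncio.sleep(0) is used as an explicit yield point between chunks of work (the task names and chunk counts are illustrative):

import asyncio

# a cooperative task that explicitly yields control after each chunk of work
async def cooperative_task(name):
    for chunk in range(3):
        # do a chunk of work...
        print(f'{name} working on chunk {chunk}')
        # explicitly yield control so other tasks can run
        await asyncio.sleep(0)

async def main():
    # run two tasks concurrently; they interleave at each await
    await asyncio.gather(cooperative_task('task-a'), cooperative_task('task-b'))

asyncio.run(main())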
3. Ad Hoc Tasks vs. Asynchronous Programming
The ThreadPoolExecutor is designed to execute ad hoc tasks, whereas async/await is a general framework for asynchronous programming.
The ThreadPoolExecutor can be used in a program to execute arbitrary functions in another thread. It is a utility for use within a normal imperative or object-oriented Python program. For example, a call to the map() built-in function or a for-loop can be made concurrent relatively easily using the ThreadPoolExecutor.
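For example, a sequential loop over an IO-bound function can be made concurrent with a small change, as in this sketch (download_url() is a simple illustrative function, not part of any library):

from concurrent.futures import ThreadPoolExecutor
import urllib.request

# illustrative IO-bound task: download the contents of a URL
def download_url(url):
    with urllib.request.urlopen(url) as connection:
        return connection.read()

urls = ['https://example.com/', 'https://python.org/']

# sequential version
# results = [download_url(url) for url in urls]

# concurrent version using a thread pool
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(download_url, urls))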
The async/await keywords, in contrast, require the program to be designed around the paradigm of asynchronous programming. Functions must be carefully chosen to be awaitable, and IO operations must use non-blocking functions provided by the asyncio module.
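For example, a coroutine cannot simply call a blocking function like time.sleep() or read from a blocking socket; it must use awaitable, non-blocking equivalents from the asyncio module, as in this rough sketch (not a robust HTTP client):

import asyncio

async def fetch(host):
    # open a non-blocking TCP connection instead of using a blocking socket
    reader, writer = await asyncio.open_connection(host, 80)
    # send a minimal HTTP request
    writer.write(b'GET / HTTP/1.0\r\nHost: ' + host.encode() + b'\r\n\r\n')
    await writer.drain()
    # read part of the response without blocking the event loop
    data = await reader.read(100)
    # close the connection
    writer.close()
    await writer.wait_closed()
    return data

async def main():
    # pause without blocking the thread (instead of time.sleep())
    await asyncio.sleep(1)
    print(await fetch('example.com'))

asyncio.run(main())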
This can make code harder to read and maintain for developers not experienced with the paradigm. The support burden is perhaps comparable to that of code that uses threads directly, but greater than that of code that simply uses a thread pool.
Summary of Differences
It may help to summarize the differences between ThreadPoolExecutor and AsyncIO.
ThreadPoolExecutor
- Uses Threads, not Coroutines.
- System-level, not software-level.
- Preemptive multitasking, not Cooperative.
- Tasks are any function, not constrained.
- Requires Thread Safety.
- Create 100s of Threads, not 1000s.
AsyncIO
- Uses Coroutines, not Threads.
- Software-level, not System-level.
- Cooperative multitasking, not Preemptive.
- Awaitable Tasks, not any function.
- No Thread Safety Concerns.
- Create 100,000+ Coroutines, with no practical limit.
How to Choose ThreadPoolExecutor or AsyncIO
When should you use ThreadPoolExecutor and when should you use AsyncIO? Let’s review some useful suggestions.
When to Use ThreadPoolExecutor
In this section, we will look at some general cases where ThreadPoolExecutor is a good fit.
Use ThreadPoolExecutor When:
- Your tasks can be defined by a pure function that has no state or side effects.
- Your task can fit within a single Python function, likely making it simple and easy to understand.
- You need to perform the same task many times, e.g. homogeneous tasks.
- You need to apply the same function to each object in a collection in a for-loop.
The sweet spot for ThreadPoolExecutor is in transforming a Python program that has relatively independent IO-bound tasks to be concurrent.
Programs can be built around use of thread pools, but it may be more likely that a program is conceived, implemented and tested with sequential code, then made concurrent using the ThreadPoolExecutor to improve the performance of the program.
When to Use AsyncIO
In this section, we will look at some general cases where AsyncIO is a good fit.
Use AsyncIO When:
- You explicitly want or need to adopt the asynchronous programming paradigm.
- You need to execute 1000s+ of concurrent IO-bound tasks.
AsyncIO is proposed as an alternative to Python threads, but this is misleading.
It requires that you embrace the asynchronous programming paradigm from the beginning and cannot be bolted on afterward like the ThreadPoolExecutor.
In this way, it is perhaps comparable to extending the Thread class in that it requires consideration up front during the design of your program.
The sweet spot for AsyncIO is programs that perform many IO-bound tasks, as with the ThreadPoolExecutor. Unlike the ThreadPoolExecutor, it requires that the IO-bound tasks also have a non-blocking API available, such as in the asyncio module or in a third-party library.
The benefit of fully embracing asynchronous programming and the asyncio module is that it can result in more capable code.
For example, the use of coroutines means you can dispatch tens or hundreds of thousands of IO-bound tasks effortlessly without having to wait on the operating system to instantiate a system thread for each task, as might be the case when using a thread pool.
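As a rough sketch of this scale (the work() coroutine stands in for a real non-blocking IO task), a single event loop can dispatch tens of thousands of tasks directly:

import asyncio

# stand-in for a non-blocking IO-bound task
async def work(number):
    await asyncio.sleep(1)
    return number

async def main():
    # create and dispatch ten thousand coroutines at once
    coros = [work(i) for i in range(10000)]
    results = await asyncio.gather(*coros)
    print(len(results))

asyncio.run(main())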
This makes asynchronous programming in general and the asyncio module specifically well suited for programs that require a vast number of IO-bound tasks, such as:
- A client program that has connections to many servers.
- A server program that supports connections from many clients.
- A peer-to-peer program that supports connections to and from many peers.
Further Reading
This section provides additional resources that you may find helpful.
Books
- ThreadPoolExecutor Jump-Start, Jason Brownlee, (my book!)
- Concurrent Futures API Interview Questions
- ThreadPoolExecutor Class API Cheat Sheet
I also recommend specific chapters from the following books:
- Effective Python, Brett Slatkin, 2019.
- See Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See Chapter 14: Threads and Processes
Guides
- Python ThreadPoolExecutor: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
- Python Threading: The Complete Guide
- Python ThreadPool: The Complete Guide
Takeaways
You now know the difference between ThreadPoolExecutor and AsyncIO and when to use each.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Leslie Cross on Unsplash
Lorence says
Hi, thank you for nice article. I have practical example, let say I want to scrape review of place from google and write it to csv files. I’ve tried using async function, it speed up the process because the script can load review one by one from google and write to csv asynchronously. Do you think if I use threadpoolexecutor will it be faster? You can look at my code at https://github.com/ChristCoding/scraping_google_review. Thanks.
Jason Brownlee says
Perhaps you can try an executor and compare performance to asyncio.
Sorry, I don’t have the capacity to review code.
pythonwood says
a good post even when i know async io much more today.
Jason Brownlee says
Thanks!
William says
Something worth considering. ThreadPoolExecutor has a max_workers “feature”. so for instance hitting api endpoints or opening db connections can be controlled as to how many are running at the same time.
Im still trying to find a way to do the same with asyncio. at the moment it pretty much floods and then waits. (ie 100 fishermen throw their lines in at the same time vs 5 at a time; if the dam is rather small it might cause issues)
thanks for all the hard work with these tutorials! its muchly appreciated!
Jason Brownlee says
One approach might be to execute tasks concurrently although require each coroutine/task to acquire a semaphore. The semaphore can then limit the total number of concurrent tasks.
This tutorial may help:
https://superfastpython.com/asyncio-semaphore/
William says
o right.. i should of commented ages ago! 2min after posting i find a solution to the flooding. (wanted to reply to my post above but it didnt want me to talk to myself…)
https://stackoverflow.com/questions/48483348/how-to-limit-concurrency-with-python-asyncio
async def gather_with_concurrency(n, *coros):
    semaphore = asyncio.Semaphore(n)

    async def sem_coro(coro):
        async with semaphore:
            return await coro

    return await asyncio.gather(*(sem_coro(c) for c in coros))

await gather_with_concurrency(100, *my_coroutines)
Jason Brownlee says
Nice!