Last Updated on October 29, 2022
Python provides two pools of thread-based workers via the multiprocessing.pool.ThreadPool class and the concurrent.futures.ThreadPoolExecutor class.
In this tutorial, you will discover the similarities and differences between the ThreadPool and ThreadPoolExecutor. This will help you decide which to use in your Python projects for thread-based concurrency.
Let’s get started.
What is ThreadPool
The multiprocessing.pool.ThreadPool class in Python provides a pool of reusable threads for executing ad hoc tasks.
A thread pool object which controls a pool of worker threads to which jobs can be submitted.
— multiprocessing — Process-based parallelism
The ThreadPool class extends the Pool class. The Pool class provides a pool of worker processes for process-based concurrency.
Although the ThreadPool class is in the multiprocessing module it offers thread-based concurrency.
Recall that a thread is a thread of execution. Each thread belongs to a process and can share memory (state and data) with other threads in the same process. In Python, like many modern programming languages, threads are created and managed by the underlying operating system, so-called system threads or native threads.
We can create a thread pool by instantiating the ThreadPool class and specifying the number of threads via the “processes” argument; for example:
```python
from multiprocessing.pool import ThreadPool

# create a thread pool with 10 worker threads
pool = ThreadPool(processes=10)
```
We can issue one-off tasks to the ThreadPool using methods such as apply() or we can apply the same function to an iterable of items using methods such as map().
The map() method resembles the built-in map() function: it takes a function and an iterable of items, and calls the function once for each item as a separate task in the thread pool. Unlike the built-in, it accepts only a single iterable, blocks until every task has completed, and returns the results as a list in the order of the input items.
For example:
```python
...
# call a function on each item in a list and handle results
for result in pool.map(task, items):
    # handle the result...
    print(result)
```
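The snippet above leaves out the task and the data. A minimal, self-contained sketch might look like the following; the task() function and the items list are hypothetical stand-ins, not part of the ThreadPool API.

```python
from multiprocessing.pool import ThreadPool


def task(item):
    # hypothetical task: square the input value
    return item * item


# hypothetical input data
items = [1, 2, 3, 4, 5]

# the pool is shut down automatically when the block exits
with ThreadPool(processes=4) as pool:
    # map() blocks until every task is done and returns a list
    # whose order matches the order of the input items
    results = pool.map(task, items)

for result in results:
    print(result)
```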
The ThreadPool class offers several variations on the map() method for issuing tasks, such as imap(), imap_unordered(), and starmap().
We can issue tasks asynchronously to the ThreadPool, which returns an instance of an AsyncResult immediately. One-off tasks can be issued via apply_async(), whereas map_async() offers an asynchronous version of the map() method.
The AsyncResult object provides a handle on the asynchronous task that we can use to query the status of the task, wait for the task to complete, or get the return value from the task, once it is available.
For example:
```python
...
# issue a task to the pool and get an asyncresult immediately
result = pool.apply_async(task)
# get the result once the task is done
value = result.get()
```
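As a rough illustration of the AsyncResult handle, the sketch below queries the status of a task, waits for it, and then retrieves its value. The task() function and its delay argument are assumptions made for the example.

```python
import time
from multiprocessing.pool import ThreadPool


def task(delay):
    # hypothetical task: sleep briefly, then return a value
    time.sleep(delay)
    return "done"


pool = ThreadPool(processes=2)

# apply_async() returns immediately with an AsyncResult handle
async_result = pool.apply_async(task, args=(0.2,))

# ready() reports whether the task has finished, without blocking
# (most likely False here because the task is still sleeping)
ready_at_once = async_result.ready()

# wait() blocks until the task finishes or the timeout elapses
async_result.wait(timeout=5)

# successful() may only be called once the task is ready
succeeded = async_result.successful()

# get() returns the return value, or re-raises the task's exception
value = async_result.get()

# stop accepting tasks and wait for the worker threads to exit
pool.close()
pool.join()

print(ready_at_once, succeeded, value)
```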
Once we are finished with the ThreadPool, it can be shut down by calling the close() method. This prevents any further tasks from being issued, and the worker threads exit once all issued tasks have completed.
For example:
```python
...
# shutdown the thread pool
pool.close()
```
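Because close() does not itself wait for outstanding tasks, it is commonly followed by join(). The sketch below assumes a hypothetical task() function and also shows that a closed pool rejects new tasks.

```python
from multiprocessing.pool import ThreadPool


def task(item):
    # hypothetical task: increment the input value
    return item + 1


pool = ThreadPool(processes=2)

# issue tasks without blocking
async_result = pool.map_async(task, range(5))

# close() stops the pool from accepting new tasks;
# tasks that were already issued still run to completion
pool.close()

# join() blocks until the worker threads have exited;
# it must be called after close() or terminate()
pool.join()

# results remain available after the pool has shut down
values = async_result.get()

# issuing a task to a closed pool raises a ValueError
rejected = False
try:
    pool.apply_async(task, (1,))
except ValueError:
    rejected = True

print(values, rejected)
```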
The life-cycle of creating and shutting down the thread pool can be simplified by using the context manager, which automatically shuts down the ThreadPool by calling its terminate() method when the block exits.
For example:
```python
...
# create a thread pool
with ThreadPool(10) as pool:
    # call a function on each item in a list and handle results
    for result in pool.map(task, items):
        # handle the result...
        print(result)
    # ...
# the pool is shut down automatically here
```
Now that we are familiar with the ThreadPool class, let’s take a look at ThreadPoolExecutor.
What is ThreadPoolExecutor
The ThreadPoolExecutor class provides a thread pool in Python.
We can create a thread pool by instantiating the class and specifying the number of threads via the max_workers argument; for example:
```python
from concurrent.futures import ThreadPoolExecutor

# create a thread pool with at most 10 worker threads
executor = ThreadPoolExecutor(max_workers=10)
```
We can then submit tasks to be executed by the thread pool using the map() and the submit() functions.
The map() function matches the built-in map() function and takes a function name and an iterable of items. The target function will then be called for each item in the iterable as a separate task in the thread pool. An iterable of results will be returned if the target function returns a value.
The call to map() does not block, but each result yielded in the returned iterator will block until the associated task is completed.
For example:
```python
...
# call a function on each item in a list and handle results
for result in executor.map(task, items):
    # handle the result...
    print(result)
```
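A complete version of this pattern could look like the sketch below. The task() function and the items list are hypothetical. The second call uses the built-in pow() to show that Executor.map(), like the built-in map(), accepts more than one iterable.

```python
from concurrent.futures import ThreadPoolExecutor


def task(item):
    # hypothetical task: double the input value
    return item * 2


# hypothetical input data
items = [1, 2, 3]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() issues the tasks and returns an iterator right away;
    # iterating blocks until each result, in input order, is ready
    results = list(executor.map(task, items))

    # several iterables may be passed, one per function argument
    powers = list(executor.map(pow, [2, 3], [3, 2]))

print(results)
print(powers)
```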
We can also issue tasks to the pool via the submit() function that takes the target function name and any arguments and returns a Future object.
The Future object can be used to query the status of the task (e.g. done(), running(), or cancelled()) and can be used to get the result or exception raised by the task once completed. The calls to result() and exception() will block until the task associated with the Future is done.
For example:
```python
...
# submit a task to the pool and get a future immediately
future = executor.submit(task, item)
# get the result once the task is done
result = future.result()
```
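To make the Future handle concrete, here is a small sketch that submits one task and then inspects it. The task() function and its argument are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(text):
    # hypothetical task: sleep briefly, then transform the input
    time.sleep(0.1)
    return text.upper()


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a Future immediately
    future = executor.submit(task, "hello")

    # result() blocks until the task has finished
    result = future.result()

    # status checks never block
    is_done = future.done()
    is_running = future.running()
    was_cancelled = future.cancelled()

print(result, is_done, is_running, was_cancelled)
```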
Once we are finished with the thread pool, it can be shut down by calling the shutdown() function in order to release all of the worker threads and their resources.
For example:
```python
...
# shutdown the thread pool
executor.shutdown()
```
The life-cycle of creating and shutting down the thread pool can be simplified by using the context manager that will automatically call the shutdown() function.
For example:
```python
...
# create a thread pool
with ThreadPoolExecutor(max_workers=10) as executor:
    # call a function on each item in a list and handle results
    for result in executor.map(task, items):
        # handle the result...
        print(result)
    # ...
# shutdown is called automatically
```
Now that we are familiar with the ThreadPool and ThreadPoolExecutor, let’s compare and contrast each.
Comparison of ThreadPool vs ThreadPoolExecutor
We will look first at what the two classes have in common, and then at how they differ.
Similarities Between ThreadPool and ThreadPoolExecutor
The ThreadPool and ThreadPoolExecutor classes are very similar. They are both thread pools that provide a collection of workers for executing ad hoc tasks.
The most important similarities are as follows:
- Both Use Threads
- Both Can Run Ad Hoc Tasks
- Both Support Asynchronous Tasks
- Both Can Wait For All Tasks
- Both Have Process-Based Equivalents
Let’s take a closer look at each in turn.
1. Both Use Threads
Both the ThreadPool and ThreadPoolExecutor create and use worker threads.
These are real native or system-level threads. This means they are created and managed by the underlying operating system.
As such, the workers used in each class use thread-based concurrency.
This means tasks issued to each thread pool will execute concurrently. Both pools are well suited to IO-bound tasks, and less suited to CPU-bound tasks, because the Global Interpreter Lock (GIL) in CPython allows only one thread at a time to execute Python bytecode.
It also means that tasks issued to each thread pool can share data directly with other threads in the process because of the shared memory model supported by threads.
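As a sketch of what sharing data directly can look like, the tasks below all update one dictionary that lives in the main thread's memory. The counts dictionary, the lock, and the task() function are hypothetical names. The lock is a precaution for the read-modify-write step and is not something either pool requires.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# state shared by every worker thread in the process
counts = {}
lock = threading.Lock()


def task(word):
    # hypothetical task: count how often each word is seen;
    # the lock makes the read-modify-write step atomic
    with lock:
        counts[word] = counts.get(word, 0) + 1


words = ["a", "b", "a", "c", "a", "b"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # consume the iterator so any exception in a task is raised here
    list(executor.map(task, words))

print(counts)
```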
2. Both Can Run Ad Hoc Tasks
Both the ThreadPool and ThreadPoolExecutor may be used to execute ad hoc tasks defined by custom functions.
The ThreadPool can issue one-off tasks using the apply() and apply_async() functions, and may issue multiple tasks that use the same function with different arguments with the map(), imap(), imap_unordered(), and starmap() functions and their asynchronous equivalents map_async() and starmap_async().
The ThreadPoolExecutor can issue one-off tasks via the submit() function, and may issue multiple tasks that use the same function with different arguments via the map() function.
3. Both Support Asynchronous Tasks
Both the ThreadPool and ThreadPoolExecutor can be used to issue tasks asynchronously.
Recall that issuing tasks asynchronously means that the main thread can issue a task without blocking. The function call will return immediately with some handle on the issued task and allow the main thread to continue on with the program.
The ThreadPool supports issuing tasks asynchronously via the apply_async(), map_async(), and starmap_async() functions that return an AsyncResult object that provides a handle on the issued tasks.
The ThreadPoolExecutor provides the submit() function for issuing tasks asynchronously that returns a Future object that provides a handle on the issued task.
Additionally, both thread pools provide helpful mechanisms for working with asynchronous tasks, such as checking their status, getting their results, and adding callback functions.
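The callback mechanisms differ slightly between the two pools. In the sketch below, which assumes a hypothetical task() function, the ThreadPool callback receives the task's return value, whereas the ThreadPoolExecutor callback receives the Future itself.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def task(value):
    # hypothetical task: scale the input value
    return value * 10


pool_results = []
executor_results = []

# ThreadPool: the callback is passed the return value of the task
with ThreadPool(processes=2) as pool:
    async_result = pool.apply_async(task, (1,), callback=pool_results.append)
    # wait inside the block, because leaving it terminates the pool
    async_result.wait()


def on_done(future):
    # ThreadPoolExecutor: the callback is passed the Future object
    executor_results.append(future.result())


with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(task, 2)
    future.add_done_callback(on_done)

print(pool_results, executor_results)
```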
4. Both Can Wait For All Tasks
Both the ThreadPool and ThreadPoolExecutor provide the ability to wait for tasks that were issued asynchronously.
The ThreadPool provides a wait() function on the AsyncResult object returned as a handle on asynchronous tasks. It also allows the pool to be closed and then joined, and the call to join() will not return until all issued tasks have been completed.
The ThreadPoolExecutor provides the wait() module function that can take a collection of Future objects on which to wait. It also allows the thread pool to be shut down, which can be configured to block until all tasks in the pool have been completed.
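The two approaches to waiting can be sketched side by side as follows. The task() function is a hypothetical placeholder, and the timeout passed to AsyncResult.wait() is only a safety margin for the example.

```python
import time
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait
from multiprocessing.pool import ThreadPool


def task(value):
    # hypothetical task: sleep briefly, then echo the input
    time.sleep(0.05)
    return value


# ThreadPool: wait on the AsyncResult that covers the whole batch
with ThreadPool(processes=4) as pool:
    async_result = pool.map_async(task, range(4))
    async_result.wait(timeout=5)
    pool_values = async_result.get()

# ThreadPoolExecutor: wait on a collection of Future objects
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(task, i) for i in range(4)]
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    executor_values = sorted(f.result() for f in done)

print(pool_values, executor_values, len(not_done))
```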
5. Both Have Process-Based Equivalents
Both the ThreadPool and ThreadPoolExecutor thread pools have process-based equivalents.
The ThreadPool has the multiprocessing.pool.Pool that provides the same API, except that it uses process-based concurrency instead of thread-based concurrency.
Similarly, the ThreadPoolExecutor has the concurrent.futures.ProcessPoolExecutor that provides the same API as the ThreadPoolExecutor (e.g. extends the same Executor base class) except that it is implemented using process-based concurrency.
This is helpful because a program written against either thread pool can switch to process-based concurrency with very little change to the code.
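One way to picture how small the change can be is to pass the pool class in as a parameter. The run_tasks() helper and the task() function below are hypothetical. Only the thread-based variant is executed here, since the process-based variant generally needs to be started from under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing.pool import Pool, ThreadPool


def task(value):
    # hypothetical task; a module-level function can also be
    # pickled and sent to a worker process
    return value * value


def run_tasks(executor_cls, items):
    # hypothetical helper: the only thing that varies is the class
    with executor_cls(max_workers=2) as executor:
        return list(executor.map(task, items))


# thread-based concurrency
thread_results = run_tasks(ThreadPoolExecutor, [1, 2, 3])

# swapping in ProcessPoolExecutor is a one-argument change, e.g.
# run_tasks(ProcessPoolExecutor, [1, 2, 3]) under a __main__ guard

# both executors extend the same Executor base class
shares_base = issubclass(ThreadPoolExecutor, Executor) and issubclass(
    ProcessPoolExecutor, Executor
)

# ThreadPool extends Pool, which is why their APIs match
pool_related = issubclass(ThreadPool, Pool)

print(thread_results, shares_base, pool_related)
```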
Differences Between ThreadPool and ThreadPoolExecutor
The ThreadPool and ThreadPoolExecutor are also subtly different.
The differences lie mainly in the APIs that the two classes expose.
The main differences are as follows:
- Ability to Cancel Tasks
- Operations on Groups of Tasks
- Asynchronous Map Functions
- Ability to Access Exception
Let’s take a closer look at each in turn.
1. Ability to Cancel Tasks
Tasks issued to the ThreadPoolExecutor can be canceled, whereas tasks issued to the ThreadPool cannot.
The ThreadPoolExecutor provides the ability to cancel tasks that have been issued to the thread pool but have not yet started executing.
This is provided via the cancel() function on the Future object returned from issuing a task via submit().
The ThreadPool does not provide this capability.
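The sketch below illustrates the rule that only tasks that have not yet started can be canceled. It uses a single worker so that the second task is guaranteed to be waiting in the queue. The blocking_task() function and the two events are assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()


def blocking_task():
    # hypothetical task: signal that it is running, then wait
    started.set()
    release.wait(timeout=5)
    return "finished"


# one worker, so the second task has to wait in the queue
executor = ThreadPoolExecutor(max_workers=1)
first = executor.submit(blocking_task)
second = executor.submit(blocking_task)

# make sure the first task is actually running
started.wait(timeout=5)

# a task that has not started can be canceled
cancelled_second = second.cancel()

# a task that is already running cannot be canceled
cancelled_first = first.cancel()

# let the running task finish and shut the pool down
release.set()
executor.shutdown(wait=True)
first_result = first.result()

print(cancelled_second, cancelled_first, first_result)
```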
2. Operations on Groups of Tasks
The ThreadPoolExecutor provides tools to work with groups of asynchronous tasks, whereas the ThreadPool does not.
The concurrent.futures module provides the wait() and as_completed() module functions. These functions are designed to work with collections of Future objects returned when issuing tasks asynchronously to the thread pool via the submit() function.
They allow the caller to wait for an event on a collection of heterogeneous tasks in the thread pool, such as for all tasks to complete, for the first task to complete, or for the first task to fail.
They also allow the caller to handle the results from a collection of heterogeneous tasks in the order that the tasks are completed, rather than the order the tasks were issued.
The ThreadPool does not provide this capability.
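A short sketch of both module functions follows. The slow_task() function, the task names, and the delays are hypothetical and were chosen only so that the tasks finish at clearly different times.

```python
import time
from concurrent.futures import (
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    as_completed,
    wait,
)


def slow_task(name, delay):
    # hypothetical task: sleep for the given time, return its name
    time.sleep(delay)
    return name


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(slow_task, "slow", 0.6),
        executor.submit(slow_task, "fast", 0.1),
        executor.submit(slow_task, "medium", 0.3),
    ]

    # block until at least one of the tasks has completed
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    first_finished = [f.result() for f in done]

    # handle results in completion order, not submission order
    completion_order = [f.result() for f in as_completed(futures)]

print(first_finished)
print(completion_order)
```

With these delays the task named "fast" would normally be reported first, although the exact timing depends on the machine.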
3. Asynchronous Map Functions
The ThreadPool focuses on map()-based concurrency and offers several variants, whereas the ThreadPoolExecutor offers only one.
The ThreadPoolExecutor does provide a parallel version of the built-in map() function which will apply the same function to an iterable of arguments. Each function call is issued as a separate task to the thread pool.
The ThreadPool provides three versions of the built-in map() function for applying the same function to an iterable of arguments in parallel as tasks in the thread pool.
They are: map(), a lazier version of map() called imap(), and starmap(), a version of map() that takes multiple arguments for each function call.
It also provides imap_unordered(), a version of imap() that yields results in the order that tasks complete rather than the order in which they were issued.
Finally, it has asynchronous versions of the map() function called map_async() and of the starmap() function called starmap_async().
In all, the ThreadPool provides 6 parallel versions of the built-in map() function.
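The six variants can be seen together in the sketch below. The square() and add() functions and the input data are hypothetical. The results of imap_unordered() are sorted because their order is not guaranteed.

```python
from multiprocessing.pool import ThreadPool


def square(value):
    # hypothetical single-argument task
    return value * value


def add(left, right):
    # hypothetical two-argument task, for the starmap() variants
    return left + right


items = [1, 2, 3, 4]
pairs = [(1, 2), (3, 4), (5, 6)]

with ThreadPool(processes=4) as pool:
    # blocks and returns a list in input order
    mapped = pool.map(square, items)

    # returns an iterator that yields results in input order
    lazy = list(pool.imap(square, items))

    # returns an iterator that yields results as tasks complete
    unordered = sorted(pool.imap_unordered(square, items))

    # unpacks each tuple into multiple arguments
    starred = pool.starmap(add, pairs)

    # asynchronous versions return an AsyncResult immediately
    mapped_async = pool.map_async(square, items).get()
    starred_async = pool.starmap_async(add, pairs).get()

print(mapped, lazy, unordered, starred, mapped_async, starred_async)
```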
4. Ability to Access Exception
The ThreadPoolExecutor provides a way to access an exception raised in an asynchronous task directly, whereas the ThreadPool does not.
Both thread pools provide the ability to check if a task was successful or not, and will re-raise an exception when getting the task result if an exception was raised and not handled in the task.
Nevertheless, only the ThreadPoolExecutor provides the ability to directly get an exception raised in a task.
A task issued into the ThreadPoolExecutor asynchronously via the submit() function will return a Future object. The exception() function on the Future object allows the caller to check if an exception was raised in the task and if so, to access it directly.
The ThreadPool does not provide this ability.
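The difference can be sketched as follows, assuming a hypothetical failing_task() function. With the ThreadPoolExecutor the exception is returned as an object, whereas with the ThreadPool it has to be caught when get() re-raises it.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def failing_task():
    # hypothetical task that always fails
    raise ValueError("bad input")


# ThreadPoolExecutor: exception() returns the exception object
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(failing_task)
    executor_error = future.exception()

# ThreadPool: the exception is only seen when get() re-raises it
pool_error = None
with ThreadPool(processes=1) as pool:
    async_result = pool.apply_async(failing_task)
    async_result.wait()

    # successful() reports failure but does not expose the exception
    pool_task_ok = async_result.successful()

    try:
        async_result.get()
    except ValueError as exc:
        pool_error = exc

print(repr(executor_error), pool_task_ok, repr(pool_error))
```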
Summary of Differences
It may help to summarize the differences between ThreadPool and ThreadPoolExecutor.
ThreadPool
- Does not provide the ability to cancel tasks, whereas the ThreadPoolExecutor does.
- Does not provide the ability to work with collections of heterogeneous tasks, whereas the ThreadPoolExecutor does.
- Provides the ability to forcefully terminate all tasks, whereas the ThreadPoolExecutor does not.
- Focuses on parallel versions of the map() function, whereas the ThreadPoolExecutor does not.
- Does not provide the ability to access an exception raised in a task, whereas the ThreadPoolExecutor does.
ThreadPoolExecutor
- Provides the ability to cancel tasks, whereas the ThreadPool does not.
- Provides the ability to work with collections of heterogeneous tasks, whereas the ThreadPool does not.
- Does not provide the ability to forcefully terminate all tasks, whereas the ThreadPool does.
- Does not provide multiple parallel versions of the map() function, whereas the ThreadPool does.
- Provides the ability to access an exception raised in a task, whereas the ThreadPool does not.
The figure below provides a helpful side-by-side comparison of the key differences between ThreadPool and ThreadPoolExecutor.
Takeaways
You now know the difference between ThreadPool and ThreadPoolExecutor and when to use each.