The ThreadPool class is fine for smaller programs and scripts. For production programs expected to live for years, consider the ThreadPoolExecutor as an alternative.
In this tutorial you will discover whether you should use the ThreadPool class in your Python project or not.
Let’s get started.
What is the ThreadPool Class?
The multiprocessing.pool.ThreadPool class provides a pool of worker threads.
It extends the multiprocessing.pool.Pool class and provides the same API as the Pool class, but uses thread-based concurrency instead of process-based concurrency. This is why the ThreadPool lives in the multiprocessing module.
"In particular, the Pool function provided by multiprocessing.dummy returns an instance of ThreadPool, which is a subclass of Pool that supports all the same method calls but uses a pool of worker threads rather than worker processes." (multiprocessing: Process-based parallelism, Python documentation)
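The relationship described in the quote can be checked directly. The following is a minimal sketch, assuming a standard CPython install; the pool size of 2 is arbitrary.

```python
from multiprocessing.dummy import Pool as DummyPool
from multiprocessing.pool import Pool, ThreadPool

# ThreadPool is a subclass of the process-based Pool class
is_subclass = issubclass(ThreadPool, Pool)

# multiprocessing.dummy.Pool() is a function that returns a ThreadPool instance
with DummyPool(2) as pool:
    is_thread_pool = isinstance(pool, ThreadPool)

print(is_subclass, is_thread_pool)
```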
The ThreadPool can be used to execute one-off and multiple tasks using worker threads.
Methods like apply() and map() will execute functions in the thread pool synchronously, blocking and returning once the task or tasks are complete.
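As a rough illustration of the blocking calls, the sketch below uses a hypothetical task() function that squares its argument; the function name and the pool size of 4 are assumptions made for the example.

```python
from multiprocessing.pool import ThreadPool


def task(value):
    # hypothetical task used for illustration: square the argument
    return value * value


with ThreadPool(4) as pool:
    # apply() blocks until the single task has finished
    single = pool.apply(task, args=(3,))
    # map() blocks until every task has finished and returns a list in input order
    many = pool.map(task, range(5))

print(single)
print(many)
```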
Methods like apply_async() and map_async() execute functions in the thread pool asynchronously. They return immediately with an AsyncResult object that provides a handle on the issued tasks, and allows the program to continue on and query the status or get results from the tasks at a later time.
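The asynchronous calls might look like the following sketch. The task() function and its short sleep are hypothetical stand-ins for real blocking work, such as a network request.

```python
from multiprocessing.pool import ThreadPool
from time import sleep


def task(value):
    # hypothetical task: simulate a short blocking call, then square the argument
    sleep(0.1)
    return value * value


with ThreadPool(4) as pool:
    # both calls return immediately with an AsyncResult handle
    async_single = pool.apply_async(task, args=(3,))
    async_many = pool.map_async(task, range(5))
    # the caller is free to do other work here
    async_many.wait()               # block until the mapped tasks are done
    finished = async_many.ready()   # query the status of the issued tasks
    single = async_single.get()     # retrieve the result of the one-off task
    many = async_many.get()         # retrieve the list of mapped results

print(finished, single, many)
```

Results are retrieved inside the with block because leaving the block terminates the pool.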
Now that we know what the ThreadPool class is, should we use it in our Python programs?
Should We Use the ThreadPool Class?
Given that the Python standard library provides the ThreadPool class, should we use it in our projects?
We can look at this question from two perspectives:
- The suggestions in the API documentation.
- A broader view of the class in the context of the API.
Let’s take a closer look at each in turn.
Suggestions in API Documentation
The API documentation, as of Python 3.10, suggests that users should generally prefer an alternative to the ThreadPool class.
It makes three arguments:
- The class is an extension of Pool and was not developed to use threads from the ground up.
- It is a wrapper for the Pool class and because of this some features and implementation details do not make sense.
- It returns AsyncResult objects for async tasks, which are not compatible with asyncio or third-party libraries.
Generally, the API documentation suggests that developers should consider the concurrent.futures.ThreadPoolExecutor class instead.
"Users should generally prefer to use concurrent.futures.ThreadPoolExecutor, which has a simpler interface that was designed around threads from the start, and which returns concurrent.futures.Future instances that are compatible with many other libraries, including asyncio." (multiprocessing: Process-based parallelism, Python documentation)
The rationale is that the ThreadPool returns AsyncResult objects that are not understood by asyncio or third-party libraries, whereas the concurrent.futures.ThreadPoolExecutor returns Future objects, which are compatible with asyncio.
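One way to see this compatibility is to await a Future from a ThreadPoolExecutor inside a coroutine using asyncio.wrap_future(). This is only a sketch under assumptions: the task() and main() functions are hypothetical, and other approaches (such as loop.run_in_executor()) would work as well.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def task(value):
    # hypothetical blocking task run in a worker thread
    return value * 2


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() returns a concurrent.futures.Future
        future = executor.submit(task, 21)
        # wrap_future() adapts it into an awaitable asyncio future
        return await asyncio.wrap_future(future)


result = asyncio.run(main())
print(result)
```

There is no equivalent standard-library adapter for the AsyncResult objects returned by the ThreadPool.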
It also hints that a class developed around threads from the ground up, instead of ported to threads, would be better in a general sense.
It also notes that the ThreadPool extends the Pool class and provides the same API. As such, some inherited operations do not make sense for a pool backed by threads.
"A ThreadPool shares the same interface as Pool, which is designed around a pool of processes and predates the introduction of the concurrent.futures module. As such, it inherits some operations that don’t make sense for a pool backed by threads, and it has its own type for representing the status of asynchronous jobs, AsyncResult, that is not understood by any other libraries." (multiprocessing: Process-based parallelism, Python documentation)
Broader View of the API
Let’s consider the broader view of the Python concurrency API.
- The ThreadPool class is part of the Python standard library, and therefore can and should be used.
- There are no current plans to remove or deprecate the class.
- The ThreadPool class has the same API as the Pool class, providing a drop-in replacement.
For these reasons alone, the ThreadPool can and should be used as part of your Python programs, when appropriate.
There are also some good reasons against using the class.
- It is not well documented and is only mentioned briefly in the multiprocessing module API docs.
- Given it’s not widely known, it is relatively unused.
- Its limited use suggests it may not be as well exercised as the rest of the API and may harbor subtle bugs.
For example, if we look at the unit tests for the multiprocessing module in test/_test_multiprocessing.py, it does not appear that the ThreadPool has any unit tests, at least at the time of writing with Python 3.10.
That being said, the Pool class is unit tested and the ThreadPool merely extends this class.
It comes down to a judgment call whether you should use ThreadPool or an alternative in your project.
For production code expected to live for years, I’d recommend the ThreadPoolExecutor. For smaller programs and scripts, the ThreadPool class is fine.
Next, let’s take a closer look at the ThreadPoolExecutor as an alternative to the ThreadPool class.
Alternative to ThreadPool Class
A modern alternative to the ThreadPool class is the ThreadPoolExecutor.
This alternative is directly suggested in the API documentation for the ThreadPool class.
"Users should generally prefer to use concurrent.futures.ThreadPoolExecutor, which has a simpler interface that was designed around threads from the start …" (multiprocessing: Process-based parallelism, Python documentation)
Like the ThreadPool.apply_async() method, one-off tasks can be issued to the ThreadPoolExecutor using the submit() method.
from concurrent.futures import ThreadPoolExecutor
# create the thread pool
with ThreadPoolExecutor() as pool:
    # issue a one-off async task
    future = pool.submit(task)
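A self-contained version of this pattern could look like the sketch below. The task() function and its return value are hypothetical, chosen only so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def task():
    # hypothetical one-off task
    return "done"


with ThreadPoolExecutor() as pool:
    # submit() returns a Future immediately
    future = pool.submit(task)
    # result() blocks until the task has finished
    result = future.result()

print(result)
```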
Similarly, like the ThreadPool.map() method, multiple tasks can be executed with the ThreadPoolExecutor via the map() method.
from concurrent.futures import ThreadPoolExecutor
# create the thread pool
with ThreadPoolExecutor() as pool:
    # issue many tasks and handle results in input order
    for result in pool.map(task, items):
        print(result)
Unlike the map() method in the ThreadPool class, the ThreadPoolExecutor.map() method accepts multiple iterables, like the built-in map() function, so a target function with several arguments can be called without the need for the ThreadPool.starmap() method.
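The difference can be sketched side by side. The add() function and the two input lists are hypothetical, introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def add(a, b):
    # hypothetical two-argument task
    return a + b


xs = [1, 2, 3]
ys = [10, 20, 30]

# ThreadPool: arguments must be packed into tuples and passed to starmap()
with ThreadPool(2) as pool:
    starmap_results = pool.starmap(add, zip(xs, ys))

# ThreadPoolExecutor: map() takes one iterable per argument
with ThreadPoolExecutor(max_workers=2) as executor:
    map_results = list(executor.map(add, xs, ys))

print(starmap_results, map_results)
```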
Unlike the ThreadPool that returns an AsyncResult object for asynchronous tasks, the ThreadPoolExecutor returns a Future object as a handle on each task issued to the pool.
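The two handle types expose similar operations under different names. The following sketch compares them using a hypothetical task() function; the 5-second timeouts are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def task(value):
    # hypothetical task: increment the argument
    return value + 1


# ThreadPool: apply_async() returns an AsyncResult
with ThreadPool(1) as pool:
    async_result = pool.apply_async(task, args=(1,))
    thread_pool_value = async_result.get(timeout=5)  # block for the result
    thread_pool_ok = async_result.successful()       # True if no exception was raised

# ThreadPoolExecutor: submit() returns a Future
callback_values = []
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(task, 1)
    # a Future supports completion callbacks attached after submission
    future.add_done_callback(lambda f: callback_values.append(f.result()))
    executor_value = future.result(timeout=5)        # block for the result

print(thread_pool_value, executor_value, callback_values)
```

With the ThreadPool, callbacks are instead passed as arguments to apply_async() at the time the task is issued.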
You now know whether you should use the ThreadPool class in your project.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.