A common justification for using the ThreadPool class over the ThreadPoolExecutor class in Python is that it is faster.
We can benchmark the performance of the ThreadPoolExecutor versus the ThreadPool on common use cases of issuing one-off asynchronous tasks and batches of tasks. Comparing the benchmark results, we find very little performance difference between the two classes, and perhaps a slight improvement in performance when using the ThreadPoolExecutor.
The results suggest little justification for choosing the ThreadPool over the ThreadPoolExecutor for performance reasons in common use cases.
In this tutorial, you will discover how to compare the performance of the ThreadPoolExecutor and ThreadPool class in Python.
Let’s get started.
Is ThreadPoolExecutor Slower Than ThreadPool?
Python provides two thread pool classes for executing and managing general tasks in a consistent way: the ThreadPool and the ThreadPoolExecutor.
The ThreadPool is a thread-based port of the multiprocessing.pool.Pool class, which provides a pool of processes. Tasks can be issued with blocking and non-blocking asynchronous methods, either one-off, e.g. apply_async(), or in batches, e.g. map().
For example:
```python
...
# create a thread pool
with ThreadPool(10) as tp:
    # issue one task asynchronously
    result = tp.apply_async(task)
    # wait for the task to complete
    result.wait()
```
You can learn more about how to use the ThreadPool in the tutorial:
The concurrent.futures.ThreadPoolExecutor is a modern thread pool class, added to the Python standard library much later than the Pool and ThreadPool classes. It is intended to provide a modern thread pool using the executor interface with futures to represent asynchronously executed tasks.
Tasks can be issued either one-off via the submit() method or in groups via the map() method.
For example:
```python
...
# create the thread pool
with ThreadPoolExecutor(10) as tp:
    # issue one-off tasks
    future = tp.submit(task)
    # wait for all tasks to complete
```
You can learn more about how to use the ThreadPoolExecutor in the tutorial:
It is generally thought that the ThreadPool class is faster than the ThreadPoolExecutor. Specifically, it is thought that the ThreadPool is small and efficient, whereas the ThreadPoolExecutor has more checks and balances and more internal classes, making it slower.
This opinion is often used to defend the adoption of the ThreadPool class over the ThreadPoolExecutor class, often with no benchmark numbers to support it.
How can we check if the newer ThreadPoolExecutor is slower or as fast as the older ThreadPool class?
How to Compare ThreadPoolExecutor and ThreadPool
We can compare the general performance of the ThreadPoolExecutor class to the ThreadPool using a consistent test harness.
The performance of the thread pools can be compared directly in two use cases:
- Benchmark performance of issuing and waiting on one-off asynchronous tasks.
- Benchmark performance of issuing tasks in batch.
Benchmark One-Off Async Tasks
Both thread pools support the ability to issue one-off asynchronous tasks. The ThreadPool provides the apply_async() method that takes a function and any arguments and returns an AsyncResult object. The ThreadPoolExecutor provides the submit() method that takes a function and any arguments and returns a Future object.
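The two call styles can be contrasted directly. The short sketch below is illustrative only; the task() function is a stand-in that doubles its argument.

```python
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

# stand-in task for illustration
def task(value):
    return value * 2

# ThreadPool: apply_async() returns an AsyncResult
with ThreadPool(4) as tp:
    async_result = tp.apply_async(task, args=(21,))
    # block until the task completes and get the return value
    print(async_result.get())

# ThreadPoolExecutor: submit() returns a Future
with ThreadPoolExecutor(4) as tpe:
    future = tpe.submit(task, 21)
    # block until the task completes and get the return value
    print(future.result())
```

Both calls print 42; the difference is only in the handle returned (AsyncResult versus Future) and its method names (get() versus result()).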
We can define a benchmark function that creates a thread pool with a fixed number of workers (e.g. 100), then issues a fixed number of one-off async tasks (e.g. 1,000) and waits for them to complete.
For example:
```python
# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue one-off tasks
        _ = [tp.submit(task, i) for i in range(1000)]
        # wait for tasks to complete
```
The task executed in both cases will be the same, e.g. a function that takes one argument and sleeps for a moment to simulate effort. The sleep ensures the experiment takes a non-trivial time to execute and that some tasks are blocked from immediate execution, allowing them to be managed internally within the thread pool.
For example:
```python
# task executed in the thread pool
def task(value):
    sleep(1)
```
Benchmark Batched Tasks
Both thread pools support the ability to issue batches of tasks. The ThreadPool and ThreadPoolExecutor classes both provide a map() method that takes a function name and an iterable of arguments to provide to each task and returns an iterable of task results.
We can again define a benchmark function that creates a thread pool with a fixed number of workers (e.g. 100), then issues a fixed number of batched tasks (e.g. 1,000) and iterates the returned results.
For example:
```python
# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass
```
The task function must take an argument and return a value. As above, the task will sleep for a moment to simulate effort, to ensure that some tasks are forced to wait before execution.
For example:
```python
# task executed in the thread pool
def task(value):
    sleep(1)
    return value
```
In both cases, the map() method takes a “chunksize” argument. This argument controls the number of tasks that are grouped together and issued to each worker in batch. This argument can be set consistently across each thread pool (e.g. an even division of tasks across workers).
You can learn more about how to configure the chunksize argument in the tutorial:
Note, the map() method in the ThreadPoolExecutor does take a “chunksize” argument, but it has no effect in the ThreadPoolExecutor; it is only used by the ProcessPoolExecutor.
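An even division of tasks across workers can be computed as follows; even_chunksize() is a hypothetical helper for illustration, not part of either API:

```python
from math import ceil

# hypothetical helper: one roughly equal chunk of tasks per worker
def even_chunksize(n_tasks, n_workers):
    # ceil-division so any remainder tasks are not left chunkless
    return max(1, ceil(n_tasks / n_workers))

# 1,000 tasks across 100 workers gives chunks of 10 tasks
print(even_chunksize(1000, 100))  # 10
```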
Benchmarking the Use Cases
Each of the above use cases can be defined as a function that performs the activity and can be timed, e.g. use_threadpool().
We can then call the function multiple times, timing the duration of each function call and then reporting the average over all trials.
For example:
```python
# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
```
This general benchmark code can be the same across all classes and use cases.
This provides a general framework that you can adapt for your own use cases, to see if the choice of thread pool makes a performance difference.
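One way to make the framework reusable is to wrap the timing loop in a helper that accepts the use-case function as an argument; benchmark() is an illustrative name for this sketch, not part of the standard library:

```python
from time import sleep
from time import time

# time repeated calls of fn and report the average duration
def benchmark(fn, n_trials=3):
    durations = list()
    for trial in range(n_trials):
        start_time = time()
        fn()
        duration = time() - start_time
        durations.append(duration)
        print(f'Trial {trial} took {duration} seconds')
    average = sum(durations) / n_trials
    print(f'Benchmark: {average:.3f} seconds')
    return average

# demo with a trivial task; in the benchmarks it would be called
# as benchmark(use_threadpool)
average = benchmark(lambda: sleep(0.01), n_trials=2)
```

The same helper can then time either thread pool class without duplicating the loop.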
Now that we have some ideas on how to benchmark the two classes, let’s explore some worked examples.
ThreadPoolExecutor vs ThreadPool One-Off Async Tasks
In this section, we will compare the general performance of the ThreadPoolExecutor and ThreadPool classes when issuing one-off asynchronous tasks.
ThreadPool One-Off Async Tasks
The ThreadPool can issue one-off asynchronous tasks via the apply_async() method, which returns an AsyncResult object.
You can learn more about the apply_async() method in the tutorial:
Once issued, we can wait for all tasks to complete by closing and joining the thread pool.
You can learn more about joining the thread pool in the tutorial:
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# benchmark threadpool with one-off async tasks
from multiprocessing.pool import ThreadPool
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPool(100) as tp:
        # issue one-off tasks
        _ = [tp.apply_async(task, args=(i,)) for i in range(1000)]
        # close the pool
        tp.close()
        # wait for tasks to complete
        tp.join()

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
```
Running the example benchmarks the performance of the ThreadPool class with 100 workers issuing 1,000 one-off tasks asynchronously.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.040 seconds.
```
Trial 0 took 10.046485900878906 seconds
Trial 1 took 10.041562795639038 seconds
Trial 2 took 10.03199815750122 seconds
Benchmark: 10.040 seconds
```
ThreadPoolExecutor One-Off Async Tasks
The ThreadPoolExecutor can issue one-off asynchronous tasks via the submit() method which returns a Future object.
You can learn more about the submit() method in the tutorial:
Once issued, we can wait for all tasks to be completed by exiting the context manager block, which automatically closes the thread pool and blocks until all tasks are done.
You can learn more about shutting down the ThreadPoolExecutor in the tutorial:
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# benchmark threadpoolexecutor with one-off async tasks
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue one-off tasks
        _ = [tp.submit(task, i) for i in range(1000)]
        # wait for tasks to complete

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
```
Running the example benchmarks the performance of the ThreadPoolExecutor class with 100 workers issuing 1,000 one-off tasks asynchronously.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.035 seconds.
```
Trial 0 took 10.036842107772827 seconds
Trial 1 took 10.036179065704346 seconds
Trial 2 took 10.032454013824463 seconds
Benchmark: 10.035 seconds
```
Comparison of Results
We can see very little difference in the average time taken to complete this use case between the ThreadPool and ThreadPoolExecutor.
Specifically, the ThreadPool took about 10.040 seconds to issue and execute 1,000 asynchronous tasks, whereas the ThreadPoolExecutor took about 10.035 seconds.
The difference was 0.005 seconds or 5 milliseconds, in favor of the ThreadPoolExecutor.
This means that there is no strong justification to choose the ThreadPool over the ThreadPoolExecutor for performance reasons when using the class to execute and wait for many asynchronous tasks.
If anything, the results may suggest a bias toward choosing the ThreadPoolExecutor over the ThreadPool class for performance reasons.
ThreadPoolExecutor vs ThreadPool Batch Tasks
In this section, we will compare the general performance of the ThreadPoolExecutor and ThreadPool classes when issuing batches of tasks and iterating their results.
ThreadPool Batch Tasks
The ThreadPool can issue batches of tasks via the map() method, which takes a function and an iterable of arguments and returns an iterable of task results that can be traversed.
You can learn more about the map() method in the tutorial:
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# benchmark threadpool with batch tasks
from multiprocessing.pool import ThreadPool
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)
    return value

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPool(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass
        # close the pool
        tp.close()
        # wait for tasks to complete
        tp.join()

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
```
Running the example benchmarks the performance of the ThreadPool class with 100 workers issuing a batch of 1,000 tasks, chunked into groups of 10.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.039 seconds.
```
Trial 0 took 10.04787564277649 seconds
Trial 1 took 10.036479949951172 seconds
Trial 2 took 10.031333923339844 seconds
Benchmark: 10.039 seconds
```
ThreadPoolExecutor Batch Tasks
The ThreadPoolExecutor can issue batches of tasks via the map() method, which takes a function and an iterable of arguments and returns an iterable of task results that can be traversed.
You can learn more about the map() method in the tutorial:
Note, the map() method in the ThreadPoolExecutor does take a “chunksize” argument, but it has no effect in the ThreadPoolExecutor; it is only used by the ProcessPoolExecutor.
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# benchmark threadpoolexecutor with batch tasks
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)
    return value

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
```
Running the example benchmarks the performance of the ThreadPoolExecutor class with 100 workers issuing a batch of 1,000 tasks.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.039 seconds.
```
Trial 0 took 10.039137125015259 seconds
Trial 1 took 10.043791055679321 seconds
Trial 2 took 10.035258293151855 seconds
Benchmark: 10.039 seconds
```
Comparison of Results
We can see no difference in the average time taken to complete this use case between the ThreadPool and ThreadPoolExecutor.
Specifically, both the ThreadPool and ThreadPoolExecutor classes took about 10.039 seconds to issue, execute, and traverse the results of 1,000 tasks in batch.
This means that there is no strong justification to choose the ThreadPool over the ThreadPoolExecutor for performance reasons when using the class to execute batches of tasks.
Recommendations
Generally, there is little justification for choosing the ThreadPool class over the ThreadPoolExecutor class for performance reasons in the two use cases described above.
I suspect that this generalizes to other similar use cases, but to confirm, I would encourage you to update the benchmark code for your specific use cases and collect hard data (please share below if you do).
The ThreadPool and ThreadPoolExecutor classes are not identical; they offer different nuanced capabilities. You can learn more about these differences in the tutorial:
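As one example of a nuanced difference (an illustrative sketch, not an exhaustive comparison): the ThreadPool's apply_async() method accepts a callback argument directly, whereas the ThreadPoolExecutor attaches callbacks to the returned Future via add_done_callback().

```python
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

# stand-in task for illustration
def task(value):
    return value + 1

results = list()

# ThreadPool: the callback receives the task's return value
with ThreadPool(2) as tp:
    tp.apply_async(task, args=(1,), callback=results.append)
    # wait for the task and its callback to complete
    tp.close()
    tp.join()

# ThreadPoolExecutor: the callback receives the Future itself
with ThreadPoolExecutor(2) as tpe:
    future = tpe.submit(task, 2)
    future.add_done_callback(lambda f: results.append(f.result()))

print(results)  # [2, 3]
```

Both approaches achieve the same end, but the callback signatures differ, which matters when porting code between the two classes.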
In practice, unless you need a specific capability of the ThreadPool class, I would encourage you to use the ThreadPoolExecutor class. Reasons:
- The ThreadPoolExecutor may be slightly faster in some use cases (e.g. one-off async tasks).
- The ThreadPoolExecutor uses a modern design pattern and interface that may be simpler and more convenient (e.g. reduced complexity).
- I suspect that at some point in the future the Pool and ThreadPool classes will be deprecated (i.e. the ThreadPoolExecutor is more future-proof).
Further Reading
This section provides additional resources that you may find helpful.
Books
- ThreadPoolExecutor Jump-Start, Jason Brownlee, (my book!)
- Concurrent Futures API Interview Questions
- ThreadPoolExecutor Class API Cheat Sheet
I also recommend specific chapters from the following books:
- Effective Python, Brett Slatkin, 2019.
- See Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See Chapter 14: Threads and Processes
Guides
- Python ThreadPoolExecutor: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
- Python Threading: The Complete Guide
- Python ThreadPool: The Complete Guide
APIs
References
Takeaways
You now know how to compare the performance of the ThreadPoolExecutor and ThreadPool classes in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.