ThreadPoolExecutor Performance vs ThreadPool
A common justification for using the ThreadPool class over the ThreadPoolExecutor class in Python is that it is faster.
We can benchmark the performance of the ThreadPoolExecutor versus the ThreadPool on common use cases of issuing one-off asynchronous tasks and batches of tasks. Comparing the benchmark results, we find very little performance difference between the two classes, and perhaps a slight improvement in performance when using the ThreadPoolExecutor.
The results suggest little justification for choosing the ThreadPool over the ThreadPoolExecutor for performance reasons in common use cases.
In this tutorial, you will discover how to compare the performance of the ThreadPoolExecutor and ThreadPool class in Python.
Let's get started.
Is ThreadPoolExecutor Slower Than ThreadPool?
Python provides two thread pool classes for executing and managing general tasks in a consistent way: the ThreadPool and the ThreadPoolExecutor.
The ThreadPool is a thread-based port of the multiprocessing.pool.Pool class, which provides a pool of processes; the ThreadPool provides a pool of threads instead. Tasks can be issued with blocking and non-blocking asynchronous methods, either one-off, e.g. apply_async(), or in batches, e.g. map().
For example:
...
# create a thread pool
with ThreadPool(10) as tp:
    # issue one task asynchronously
    result = tp.apply_async(task)
    # wait for the task to complete
    result.wait()
You can learn more about how to use the ThreadPool in the tutorial:
The concurrent.futures.ThreadPoolExecutor is a modern thread pool class, added to the Python standard library much later than the Pool and ThreadPool classes. It is intended to provide a modern thread pool using the executor interface with futures to represent asynchronously executed tasks.
Tasks can be issued either one-off via the submit() method or in groups via the map() method.
For example:
...
# create the thread pool
with ThreadPoolExecutor(10) as tp:
    # issue one-off tasks
    future = tp.submit(task)
    # wait for all tasks to complete
You can learn more about how to use the ThreadPoolExecutor in the tutorial:
It is generally thought that the ThreadPool class is faster than the ThreadPoolExecutor. Specifically, it is thought that the ThreadPool is small and efficient, whereas the ThreadPoolExecutor has more checks and balances and more internal classes, making it slower.
This opinion is often used to defend the adoption of the ThreadPool class over the ThreadPoolExecutor class, often with no benchmark numbers to support it.
How can we check if the newer ThreadPoolExecutor is slower or as fast as the older ThreadPool class?
How to Compare ThreadPoolExecutor and ThreadPool
We can compare the general performance of the ThreadPoolExecutor class to the ThreadPool using a consistent test harness.
The performance of the thread pools can be compared directly in two use cases:
- Benchmark performance of issuing and waiting on one-off asynchronous tasks.
- Benchmark performance of issuing tasks in batch.
Benchmark One-Off Async Tasks
Both thread pools support the ability to issue one-off asynchronous tasks. The ThreadPool provides the apply_async() method that takes a function name and any arguments and returns an AsyncResult object. The ThreadPoolExecutor provides the submit() method that takes a function name and any arguments and returns a Future object.
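As a minimal sketch of the two APIs side by side (the pool sizes and the task values here are illustrative, not from the benchmarks), issuing a single one-off task looks like this:

```python
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

# trivial task that doubles its argument
def task(value):
    return value * 2

# issue a one-off task with ThreadPool
with ThreadPool(4) as tp:
    async_result = tp.apply_async(task, args=(21,))  # returns an AsyncResult
    print(async_result.get())  # blocks until the result is ready, prints 42

# issue a one-off task with ThreadPoolExecutor
with ThreadPoolExecutor(4) as tpe:
    future = tpe.submit(task, 21)  # returns a Future
    print(future.result())  # blocks until the result is ready, prints 42
```

The two calls are nearly interchangeable for this use case; the main difference is the argument-passing convention (args=() tuple versus positional arguments) and the type of the returned handle.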
We can define a benchmark function that creates a thread pool with a fixed number of workers (e.g. 100), then issues a fixed number of one-off async tasks (e.g. 1,000) and waits for them to complete.
For example:
# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue one-off tasks
        _ = [tp.submit(task, i) for i in range(1000)]
        # wait for tasks to complete
The task executed in both cases will be the same, e.g. a function that takes one argument and sleeps for a moment to simulate effort. The sleep ensures the experiment takes a non-trivial time to execute and that some tasks are blocked from immediate execution, allowing them to be managed internally within the thread pool.
For example:
# task executed in the thread pool
def task(value):
    sleep(1)
Benchmark Batched Tasks
Both thread pools support the ability to issue batches of tasks. The ThreadPool and ThreadPoolExecutor classes both provide a map() method that takes a function name and an iterable of arguments to provide to each task and returns an iterable of task results.
We can again define a benchmark function that creates a thread pool with a fixed number of workers (e.g. 100), then issues a fixed number of batched tasks (e.g. 1,000) and iterates the returned results.
For example:
# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass
The task function must take an argument and return a value. As above, the task will sleep for a moment to simulate effort, to ensure that some tasks are forced to wait before execution.
For example:
# task executed in the thread pool
def task(value):
    sleep(1)
    return value
In both cases, the map() method takes a "chunksize" argument. This argument controls the number of tasks that are grouped together and issued to each worker in batch. This argument can be set consistently across each thread pool (e.g. an even division of tasks across workers).
You can learn more about how to configure the chunksize argument in the tutorial:
Note, the map() method on the ThreadPoolExecutor does take a "chunksize" argument, but it has no effect when using thread-based workers; it is only used by the ProcessPoolExecutor.
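An even division of tasks across workers can be computed directly. A sketch, assuming the 1,000 tasks and 100 workers used in the benchmarks below:

```python
from math import ceil

n_tasks = 1000
n_workers = 100

# one chunk of work per worker, rounding up to cover uneven divisions
chunksize = ceil(n_tasks / n_workers)
print(chunksize)  # 10
```

This is how the chunksize of 10 used in the batch benchmarks can be derived.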
Benchmarking the Use Cases
Each of the above use cases can be defined as a function that performs the activity, e.g. use_threadpool().
We can then call the function multiple times, timing the duration of each function call and then reporting the average over all trials.
For example:
# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
This general benchmark code can be the same across all classes and use cases.
This provides a general framework that you can adapt for your own use cases, to see if the choice of thread pool makes a performance difference.
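One way to adapt the harness for your own use cases is to factor the timing loop into a helper that accepts the use-case function as an argument. A sketch; this benchmark() helper is not part of the worked examples below, which inline the loop instead:

```python
from time import time

# time a use-case function over repeated trials and report the average
def benchmark(use_case, n_trials=3):
    results = list()
    for trial in range(n_trials):
        # record start time, perform the activity, calculate duration
        start_time = time()
        use_case()
        duration = time() - start_time
        results.append(duration)
        print(f'Trial {trial} took {duration} seconds')
    # report average result
    average = sum(results) / n_trials
    print(f'Benchmark: {average:.3f} seconds')
    return average

# example: benchmark a trivial CPU-bound activity
benchmark(lambda: sum(range(1000)))
```

Any callable that takes no arguments can then be benchmarked by passing it in, keeping the timing code in one place.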
Now that we have some ideas on how to benchmark the two classes, let's explore some worked examples.
ThreadPoolExecutor vs ThreadPool One-Off Async Tasks
In this section, we will compare the general performance of the ThreadPoolExecutor and ThreadPool classes when issuing one-off asynchronous tasks.
ThreadPool One-Off Async Tasks
The ThreadPool can issue one-off asynchronous tasks via the apply_async() method which returns an AsyncResult object.
You can learn more about the apply_async() method in the tutorial:
Once issued, we can wait for all tasks to complete by closing and joining the thread pool.
You can learn more about joining the thread pool in the tutorial:
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpool with one-off async tasks
from multiprocessing.pool import ThreadPool
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPool(100) as tp:
        # issue one-off tasks
        _ = [tp.apply_async(task, args=(i,)) for i in range(1000)]
        # close the pool
        tp.close()
        # wait for tasks to complete
        tp.join()

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPool class with 100 workers issuing 1,000 one-off tasks asynchronously.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.040 seconds.
Trial 0 took 10.046485900878906 seconds
Trial 1 took 10.041562795639038 seconds
Trial 2 took 10.03199815750122 seconds
Benchmark: 10.040 seconds
ThreadPoolExecutor One-Off Async Tasks
The ThreadPoolExecutor can issue one-off asynchronous tasks via the submit() method which returns a Future object.
You can learn more about the submit() method in the tutorial:
Once issued, we can wait for all tasks to complete by exiting the context manager block, which automatically shuts down the thread pool and blocks until all tasks are done.
You can learn more about shutting down the ThreadPoolExecutor in the tutorial:
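If you prefer not to rely on the context manager exit to block, the futures can also be waited on explicitly via the wait() module function. A sketch, using a shorter sleep than the benchmarks so it runs quickly:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from time import sleep

# short task to keep the sketch fast
def task(value):
    sleep(0.01)

# create the thread pool
with ThreadPoolExecutor(100) as tp:
    # issue one-off tasks, collecting the Future objects
    futures = [tp.submit(task, i) for i in range(1000)]
    # block explicitly until all issued tasks are done
    wait(futures)
```

Collecting the Future objects this way also makes it possible to inspect individual task results or exceptions afterward, rather than simply discarding them as the benchmark does.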
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpoolexecutor with one-off async tasks
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue one-off tasks
        _ = [tp.submit(task, i) for i in range(1000)]
        # wait for tasks to complete

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPoolExecutor class with 100 workers issuing 1,000 one-off tasks asynchronously.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.035 seconds.
Trial 0 took 10.036842107772827 seconds
Trial 1 took 10.036179065704346 seconds
Trial 2 took 10.032454013824463 seconds
Benchmark: 10.035 seconds
Comparison of Results
We can see very little difference in the average time taken to complete this use case between the ThreadPool and ThreadPoolExecutor.
Specifically, the ThreadPool took about 10.040 seconds to issue and execute 1,000 asynchronous tasks, whereas the ThreadPoolExecutor took about 10.035 seconds.
The difference was 0.005 seconds or 5 milliseconds, in favor of the ThreadPoolExecutor.
This means that there is no strong justification to choose the ThreadPool over the ThreadPoolExecutor for performance reasons when using the class to execute and wait for many asynchronous tasks.
If anything, the results may suggest a bias toward choosing the ThreadPoolExecutor over the ThreadPool class for performance reasons.
ThreadPoolExecutor vs ThreadPool Batch Tasks
In this section, we will compare the general performance of the ThreadPoolExecutor and ThreadPool classes when issuing batches of tasks and iterating their results.
ThreadPool Batch Tasks
The ThreadPool can issue batches of tasks via the map() method which takes a function name and iterable of arguments and returns an iterable of task results that can be traversed.
You can learn more about the map() method in the tutorial:
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpool with batch tasks
from multiprocessing.pool import ThreadPool
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)
    return value

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPool(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass
        # close the pool
        tp.close()
        # wait for tasks to complete
        tp.join()

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPool class with 100 workers issuing a batch of 1,000 tasks, chunked into groups of 10.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.039 seconds.
Trial 0 took 10.04787564277649 seconds
Trial 1 took 10.036479949951172 seconds
Trial 2 took 10.031333923339844 seconds
Benchmark: 10.039 seconds
ThreadPoolExecutor Batch Tasks
The ThreadPoolExecutor can issue batches of tasks via the map() method which takes a function name and iterable of arguments and returns an iterable of task results that can be traversed.
You can learn more about the map() method in the tutorial:
Note, the map() method on the ThreadPoolExecutor does take a "chunksize" argument, but it has no effect when using thread-based workers; it is only used by the ProcessPoolExecutor.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpoolexecutor with batch tasks
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)
    return value

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPoolExecutor class with 100 workers issuing a batch of 1,000 tasks.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.039 seconds.
Trial 0 took 10.039137125015259 seconds
Trial 1 took 10.043791055679321 seconds
Trial 2 took 10.035258293151855 seconds
Benchmark: 10.039 seconds
Comparison of Results
We can see no difference in the average time taken to complete this use case between the ThreadPool and ThreadPoolExecutor.
Specifically, both the ThreadPool and ThreadPoolExecutor classes took about 10.039 seconds to issue, execute, and traverse the results of 1,000 tasks in batch.
This means that there is no strong justification to choose the ThreadPool over the ThreadPoolExecutor for performance reasons when using the class to execute batches of tasks.
Recommendations
Generally, choosing the ThreadPool class over the ThreadPoolExecutor class cannot be justified for performance reasons in the two use cases described above.
I suspect that this generalizes to other similar use cases, but to confirm, I would encourage you to update the benchmark code for your specific use cases and collect hard data (please share below if you do).
The ThreadPool and ThreadPoolExecutor classes are not identical, they offer different nuanced capabilities. You can learn more about these differences in the tutorial:
In practice, unless you need a specific capability of the ThreadPool class, I would encourage you to use the ThreadPoolExecutor class. Reasons:
- The ThreadPoolExecutor may be slightly faster in some use cases (e.g. one-off async tasks).
- The ThreadPoolExecutor uses a modern design pattern and interface that may be simpler and more convenient (e.g. reduced complexity).
- I suspect that the Pool and ThreadPool classes will be deprecated at some point in the future (i.e. the ThreadPoolExecutor is more future-proof).
Takeaways
You now know how to compare the performance of the ThreadPoolExecutor and ThreadPool classes in Python.
If you enjoyed this tutorial, you will love my book: Python ThreadPoolExecutor Jump-Start. It covers everything you need to master the topic with hands-on examples and clear explanations.