ThreadPoolExecutor Performance vs ThreadPool
A common justification for using the ThreadPool class over the ThreadPoolExecutor class in Python is that it is faster.
We can benchmark the performance of the ThreadPoolExecutor versus the ThreadPool on common use cases of issuing one-off asynchronous tasks and batches of tasks. Comparing the benchmark results, we find very little performance difference between the two classes, and perhaps a slight improvement in performance when using the ThreadPoolExecutor.
The results suggest little justification for choosing the ThreadPool over the ThreadPoolExecutor for performance reasons in common use cases.
In this tutorial, you will discover how to compare the performance of the ThreadPoolExecutor and ThreadPool class in Python.
Let's get started.
Is ThreadPoolExecutor Slower Than ThreadPool?
Python provides two thread pool classes for executing and managing general tasks in a consistent way: the ThreadPool and the ThreadPoolExecutor.
The ThreadPool is a thread-based port of the multiprocessing.pool.Pool class, which provides a pool of processes; the ThreadPool provides a pool of threads instead. Tasks can be issued with blocking and non-blocking asynchronous methods, either one-off, e.g. apply_async(), or in batches, e.g. map().
For example:
...
# create a thread pool
with ThreadPool(10) as tp:
    # issue one task asynchronously
    result = tp.apply_async(task)
    # wait for the task to complete
    result.wait()
You can learn more about how to use the ThreadPool in the tutorial:
The concurrent.futures.ThreadPoolExecutor is a modern thread pool class, added to the Python standard library much later than the Pool and ThreadPool classes. It is intended to provide a modern thread pool using the executor interface with futures to represent asynchronously executed tasks.
Tasks can be issued either one-off via the submit() method or in groups via the map() method.
For example:
...
# create the thread pool
with ThreadPoolExecutor(10) as tp:
    # issue one-off tasks
    future = tp.submit(task)
    # wait for all tasks to complete
You can learn more about how to use the ThreadPoolExecutor in the tutorial:
It is generally thought that the ThreadPool class is faster than the ThreadPoolExecutor. Specifically, it is thought that the ThreadPool is small and efficient, whereas the ThreadPoolExecutor has more checks and balances and more internal classes, making it slower.
This opinion is often used to defend the adoption of the ThreadPool class over the ThreadPoolExecutor class, often with no benchmark numbers to support it.
How can we check if the newer ThreadPoolExecutor is slower or as fast as the older ThreadPool class?
How to Compare ThreadPoolExecutor and ThreadPool
We can compare the general performance of the ThreadPoolExecutor class to the ThreadPool using a consistent test harness.
The performance of the thread pools can be compared directly in two use cases:
- Benchmark performance of issuing and waiting on one-off asynchronous tasks.
- Benchmark performance of issuing tasks in batch.
Benchmark One-Off Async Tasks
Both thread pools support the ability to issue one-off asynchronous tasks. The ThreadPool provides the apply_async() method that takes a function name and any arguments and returns an AsyncResult object. The ThreadPoolExecutor provides the submit() method that takes a function name and any arguments and returns a Future object.
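As a minimal sketch of the two APIs side by side (the pool sizes and the task values here are illustrative, not from the benchmarks), issuing a single one-off task looks like this:

```python
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

# trivial task that doubles its argument
def task(value):
    return value * 2

# issue a one-off task with ThreadPool
with ThreadPool(4) as tp:
    async_result = tp.apply_async(task, args=(21,))  # returns an AsyncResult
    print(async_result.get())  # blocks until the result is ready, prints 42

# issue a one-off task with ThreadPoolExecutor
with ThreadPoolExecutor(4) as tpe:
    future = tpe.submit(task, 21)  # returns a Future
    print(future.result())  # blocks until the result is ready, prints 42
```

The two calls are nearly interchangeable for this use case; the main difference is the argument-passing convention (args=() tuple versus positional arguments) and the type of the returned handle.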
We can define a benchmark function that creates a thread pool with a fixed number of workers (e.g. 100), then issues a fixed number of one-off async tasks (e.g. 1,000) and waits for them to complete.
For example:
# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue one-off tasks
        _ = [tp.submit(task, i) for i in range(1000)]
        # wait for tasks to complete
The task executed in both cases will be the same, e.g. a function that takes one argument and sleeps for a moment to simulate effort. The sleep ensures the experiment takes a non-trivial time to execute and that some tasks are blocked from immediate execution, allowing them to be managed internally within the thread pool.
For example:
# task executed in the thread pool
def task(value):
    sleep(1)
Benchmark Batched Tasks
Both thread pools support the ability to issue batches of tasks. The ThreadPool and ThreadPoolExecutor classes both provide a map() method that takes a function name and an iterable of arguments to provide to each task and returns an iterable of task results.
We can again define a benchmark function that creates a thread pool with a fixed number of workers (e.g. 100), then issues a fixed number of batched tasks (e.g. 1,000) and iterates the returned results.
For example:
# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass
The task function must take an argument and return a value. As above, the task will sleep for a moment to simulate effort, to ensure that some tasks are forced to wait before execution.
For example:
# task executed in the thread pool
def task(value):
    sleep(1)
    return value
In both cases, the map() method takes a "chunksize" argument. This argument controls the number of tasks that are grouped together and issued to each worker in batch. This argument can be set consistently across each thread pool (e.g. an even division of tasks across workers).
You can learn more about how to configure the chunksize argument in the tutorial:
Note, the map() method on the ThreadPoolExecutor does take a "chunksize" argument, but it has no effect when using thread-based workers; it is only used by the ProcessPoolExecutor.
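An even division of tasks across workers can be computed directly. A sketch, assuming the 1,000 tasks and 100 workers used in the benchmarks below:

```python
from math import ceil

n_tasks = 1000
n_workers = 100

# one chunk of work per worker, rounding up to cover uneven divisions
chunksize = ceil(n_tasks / n_workers)
print(chunksize)  # 10
```

This is how the chunksize of 10 used in the batch benchmarks can be derived.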
Benchmarking the Use Cases
Each of the above use cases can be defined as a function that performs the activity, e.g. use_threadpool().
We can then call the function multiple times, timing the duration of each function call and then reporting the average over all trials.
For example:
# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
This general benchmark code can be the same across all classes and use cases.
This provides a general framework that you can adapt for your own use cases, to see if the choice of thread pool makes a performance difference.
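One way to adapt the harness for your own use cases is to factor the timing loop into a helper that accepts the use-case function as an argument. A sketch; this benchmark() helper is not part of the worked examples below, which inline the loop instead:

```python
from time import time

# time a use-case function over repeated trials and report the average
def benchmark(use_case, n_trials=3):
    results = list()
    for trial in range(n_trials):
        # record start time, perform the activity, calculate duration
        start_time = time()
        use_case()
        duration = time() - start_time
        results.append(duration)
        print(f'Trial {trial} took {duration} seconds')
    # report average result
    average = sum(results) / n_trials
    print(f'Benchmark: {average:.3f} seconds')
    return average

# example: benchmark a trivial CPU-bound activity
benchmark(lambda: sum(range(1000)))
```

Any callable that takes no arguments can then be benchmarked by passing it in, keeping the timing code in one place.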
Now that we have some ideas on how to benchmark the two classes, let's explore some worked examples.
ThreadPoolExecutor vs ThreadPool One-Off Async Tasks
In this section, we will compare the general performance of the ThreadPoolExecutor and ThreadPool classes when issuing one-off asynchronous tasks.
ThreadPool One-Off Async Tasks
The ThreadPool can issue one-off asynchronous tasks via the apply_async() method which returns an AsyncResult object.
You can learn more about the apply_async() method in the tutorial:
Once issued, we can wait for all tasks to complete by closing and joining the thread pool.
You can learn more about joining the thread pool in the tutorial:
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpool with one-off async tasks
from multiprocessing.pool import ThreadPool
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPool(100) as tp:
        # issue one-off tasks
        _ = [tp.apply_async(task, args=(i,)) for i in range(1000)]
        # close the pool
        tp.close()
        # wait for tasks to complete
        tp.join()

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPool class with 100 workers issuing 1,000 one-off tasks asynchronously.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.040 seconds.
Trial 0 took 10.046485900878906 seconds
Trial 1 took 10.041562795639038 seconds
Trial 2 took 10.03199815750122 seconds
Benchmark: 10.040 seconds
ThreadPoolExecutor One-Off Async Tasks
The ThreadPoolExecutor can issue one-off asynchronous tasks via the submit() method which returns a Future object.
You can learn more about the submit() method in the tutorial:
Once issued, we can wait for all tasks to complete by exiting the context manager block, which automatically shuts down the thread pool and blocks until all tasks are done.
You can learn more about shutting down the ThreadPoolExecutor in the tutorial:
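If you prefer not to rely on the context manager exit to block, the futures can also be waited on explicitly via the wait() module function. A sketch, using a shorter sleep than the benchmarks so it runs quickly:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from time import sleep

# short task to keep the sketch fast
def task(value):
    sleep(0.01)

# create the thread pool
with ThreadPoolExecutor(100) as tp:
    # issue one-off tasks, collecting the Future objects
    futures = [tp.submit(task, i) for i in range(1000)]
    # block explicitly until all issued tasks are done
    wait(futures)
```

Collecting the Future objects this way also makes it possible to inspect individual task results or exceptions afterward, rather than simply discarding them as the benchmark does.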
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpoolexecutor with one-off async tasks
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue one-off tasks
        _ = [tp.submit(task, i) for i in range(1000)]
        # wait for tasks to complete

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPoolExecutor class with 100 workers issuing 1,000 one-off tasks asynchronously.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.035 seconds.
Trial 0 took 10.036842107772827 seconds
Trial 1 took 10.036179065704346 seconds
Trial 2 took 10.032454013824463 seconds
Benchmark: 10.035 seconds
Comparison of Results
We can see very little difference in the average time taken to complete this use case between the ThreadPool and ThreadPoolExecutor.
Specifically, the ThreadPool took about 10.040 seconds to issue and execute 1,000 asynchronous tasks, whereas the ThreadPoolExecutor took about 10.035 seconds.
The difference was 0.005 seconds or 5 milliseconds, in favor of the ThreadPoolExecutor.
This means that there is no strong justification to choose the ThreadPool over the ThreadPoolExecutor for performance reasons when using the class to execute and wait for many asynchronous tasks.
If anything, the results may suggest a bias toward choosing the ThreadPoolExecutor over the ThreadPool class for performance reasons.
ThreadPoolExecutor vs ThreadPool Batch Tasks
In this section, we will compare the general performance of the ThreadPoolExecutor and ThreadPool classes when issuing batches of tasks and iterating their results.
ThreadPool Batch Tasks
The ThreadPool can issue batches of tasks via the map() method which takes a function name and iterable of arguments and returns an iterable of task results that can be traversed.
You can learn more about the map() method in the tutorial:
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpool with batch tasks
from multiprocessing.pool import ThreadPool
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)
    return value

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPool(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass
        # close the pool
        tp.close()
        # wait for tasks to complete
        tp.join()

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPool class with 100 workers issuing a batch of 1,000 tasks, chunked into groups of 10.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.039 seconds.
Trial 0 took 10.04787564277649 seconds
Trial 1 took 10.036479949951172 seconds
Trial 2 took 10.031333923339844 seconds
Benchmark: 10.039 seconds
ThreadPoolExecutor Batch Tasks
The ThreadPoolExecutor can issue batches of tasks via the map() method which takes a function name and iterable of arguments and returns an iterable of task results that can be traversed.
You can learn more about the map() method in the tutorial:
Note, the map() method on the ThreadPoolExecutor does take a "chunksize" argument, but it has no effect when using thread-based workers; it is only used by the ProcessPoolExecutor.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# benchmark threadpoolexecutor with batch tasks
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from time import time

# task executed in the thread pool
def task(value):
    sleep(1)
    return value

# issue tasks with thread pool
def use_threadpool():
    # create the thread pool
    with ThreadPoolExecutor(100) as tp:
        # issue batch of tasks
        for result in tp.map(task, range(1000), chunksize=10):
            pass

# benchmark the thread pool
n_trials = 3
results = list()
for trial in range(n_trials):
    # record start time
    start_time = time()
    # perform task
    use_threadpool()
    # calculate duration
    duration = time() - start_time
    results.append(duration)
    print(f'Trial {trial} took {duration} seconds')
# report average result
average = sum(results) / n_trials
print(f'Benchmark: {average:.3f} seconds')
Running the example benchmarks the performance of the ThreadPoolExecutor class with 100 workers issuing a batch of 1,000 tasks.
This use case is then performed 3 times and the average time taken is reported, in this case about 10.039 seconds.
Trial 0 took 10.039137125015259 seconds
Trial 1 took 10.043791055679321 seconds
Trial 2 took 10.035258293151855 seconds
Benchmark: 10.039 seconds
Comparison of Results
We can see no difference in the average time taken to complete this use case between the ThreadPool and ThreadPoolExecutor.
Specifically, both the ThreadPool and ThreadPoolExecutor classes took about 10.039 seconds to issue, execute, and traverse the results of 1,000 tasks in batch.
This means that there is no strong justification to choose the ThreadPool over the ThreadPoolExecutor for performance reasons when using the class to execute batches of tasks.
Recommendations
Generally, choosing the ThreadPool class over the ThreadPoolExecutor class cannot be justified for performance reasons in the two use cases described above.
I suspect that this generalizes to other similar use cases, but to confirm, I would encourage you to update the benchmark code for your specific use cases and collect hard data (please share below if you do).
The ThreadPool and ThreadPoolExecutor classes are not identical, they offer different nuanced capabilities. You can learn more about these differences in the tutorial:
In practice, unless you need a specific capability of the ThreadPool class, I would encourage you to use the ThreadPoolExecutor class. Reasons:
- The ThreadPoolExecutor may be slightly faster in some use cases (e.g. one-off async tasks).
- The ThreadPoolExecutor uses a modern design pattern and interface that may be simpler and more convenient (e.g. reduced complexity).
- I suspect that the Pool and ThreadPool classes will be deprecated at some point in the future (i.e. the ThreadPoolExecutor is more future-proof).
Takeaways
You now know how to compare the performance of the ThreadPoolExecutor and ThreadPool classes in Python.
If you enjoyed this tutorial, you will love my book: Python ThreadPoolExecutor Jump-Start. It covers everything you need to master the topic with hands-on examples and clear explanations.