Why Is the ThreadPool Slower in Python?

Last Updated on October 29, 2022

You can make your program slower by using the ThreadPool in Python.

In this tutorial, you will discover the anti-pattern for using the ThreadPool and how to avoid it on your projects.

Let’s get started.


ThreadPool Can Be Slower Than a For Loop

The multiprocessing.pool.ThreadPool in Python provides a pool of reusable threads for executing ad hoc tasks.

A thread pool object which controls a pool of worker threads to which jobs can be submitted.
— multiprocessing — Process-based parallelism

The ThreadPool class extends the Pool class. The Pool class provides a pool of worker processes for process-based concurrency.

Although the ThreadPool class is in the multiprocessing module, it offers thread-based concurrency and is best suited to IO-bound tasks, such as reading from or writing to sockets or files.

A ThreadPool can be configured when it is created, which will prepare the new threads.

We can issue one-off tasks to the ThreadPool using methods such as apply() or we can apply the same function to an iterable of items using methods such as map().

Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the methods such as apply_async() and map_async().
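As a rough illustration of the four methods mentioned above, here is a minimal sketch. The double() task and the demo() helper are hypothetical names used only for this illustration; the pool methods are the standard ThreadPool API.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task used only for illustration
def double(value):
    return value * 2

def demo():
    with ThreadPool(2) as pool:
        # synchronous one-off task: blocks until the result is ready
        single = pool.apply(double, args=(21,))
        # synchronous map: blocks and returns a list in input order
        mapped = pool.map(double, range(5))
        # asynchronous versions return an AsyncResult immediately
        async_single = pool.apply_async(double, args=(21,))
        async_mapped = pool.map_async(double, range(5))
        # get() blocks until the result is available
        return single, mapped, async_single.get(), async_mapped.get()

# protect the entry point
if __name__ == '__main__':
    print(demo())
```

The results are collected inside the with block because leaving the block shuts the pool down.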

The ThreadPool is designed to speed up your program by executing tasks concurrently.

Nevertheless, in some use cases, using the ThreadPool can make your program slower, sometimes dramatically slower than performing the same task in a for loop.

How can the ThreadPool make your program slower?


ThreadPool Can Be Slower for CPU-Bound Tasks

Using the ThreadPool for a CPU-bound task can be slower than not using it.

This is because Python threads are constrained by the Global Interpreter Lock, or GIL.

The GIL is a lock (a mutex) in the reference Python interpreter (CPython) that ensures that only one thread can execute Python bytecode at a time within a Python process.

This means that although we may have multiple threads in the thread pool, only one thread can execute Python code at a time.

This is fine when the tasks executed by the thread pool are blocking, such as IO-bound tasks that might read from a file or internet connection.
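To see why blocking tasks are a good fit, here is a minimal sketch that assumes time.sleep() is a fair stand-in for a blocking read (a sleeping thread releases the GIL). The names blocking_task(), run_sequential() and run_threaded() are hypothetical and are not part of the examples in this tutorial.

```python
import time
from multiprocessing.pool import ThreadPool

# stand-in for a blocking IO call; a sleeping thread releases the GIL
def blocking_task(value):
    time.sleep(0.2)
    return value

# run the tasks one after another and time them
def run_sequential(n):
    start = time.perf_counter()
    results = [blocking_task(i) for i in range(n)]
    return results, time.perf_counter() - start

# run the tasks in a thread pool with one worker per task and time them
def run_threaded(n):
    start = time.perf_counter()
    with ThreadPool(n) as pool:
        results = pool.map(blocking_task, range(n))
    return results, time.perf_counter() - start

# protect the entry point
if __name__ == '__main__':
    _, seq_time = run_sequential(4)
    _, thr_time = run_threaded(4)
    print(f'sequential: {seq_time:.2f}s, ThreadPool: {thr_time:.2f}s')
```

Because the waits overlap, the threaded version should finish in roughly the time of a single task rather than the sum of all of them.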

This is a problem when the tasks executed by the thread pool are CPU-bound, meaning that the speed of their execution is determined by the speed of the CPU. These tasks do not block and therefore run as fast as possible. Because of the GIL, the threads executing these tasks will run one at a time and step on each other via context switching.

Context switching is an operating system mechanism that allows more than one thread of execution to share one CPU, i.e. it changes the “context” for the CPU that executes instructions. In a context switch, the operating system stores the state of the thread that is executing so that it can be resumed later, then restores the state of another thread and allows it to run.

This is a problem with CPU-bound tasks because context switching is a relatively expensive operation. Having many threads running at the same time with the same priority on the same type of task will likely force the operating system to context switch between them often, introducing unnecessary computational overhead.

The result is the overall task will likely be slower when executing it with the ThreadPool compared to executing it directly in a for loop.

Given that executing CPU-bound tasks with the ThreadPool will likely result in the same or worse performance, we might refer to this usage as an anti-pattern. That is, a pattern of usage that can be easily identified and should be avoided, e.g. a bad solution to the problem of concurrency in Python.

Using the ThreadPool for CPU-bound tasks is an anti-pattern.

The multiprocessing.pool.Pool should probably be used instead for CPU-bound tasks. This is because it uses processes instead of threads, and as such, it is not constrained by the GIL.

Now that we know why using a ThreadPool can be slower than a for loop in some cases, let’s look at a worked example.


Example of ThreadPool Being Slower Than a For Loop

Let’s look at an example where using a ThreadPool can be slower than a for loop.

CPU-Bound Tasks in a For Loop

First, let’s define a simple CPU-bound task to execute many times.

In this case, we can square a number. That is, given a numeric input, return the squared value.

# perform some math operation
def operation(value):
    return value**2

Next, let’s perform this operation many times, such as ten million (10,000,000) times, and report a message when we are done. That is, we will square the numbers from zero to 9,999,999.

We can use a list comprehension, which is a pythonic for-loop.

...
# perform a math operation many times
values = [operation(i) for i in range(10000000)]
print('done')

This could just as easily be written as a for loop directly; for example:

...
# perform a math operation many times
values = list()
for i in range(10000000):
    values.append(operation(i))
print('done')

Or using the built-in map() function, which might also be considered pythonic; for example:

...
# perform a math operation many times
values = list(map(operation, range(10000000)))
print('done')

We’ll stick with the list comprehension. The complete example is listed below.

# SuperFastPython.com
# example of performing a simple math task many times in a for loop

# perform some math operation
def operation(value):
    return value**2

# protect the entry point
if __name__ == '__main__':
    # perform a math operation many times
    values = [operation(i) for i in range(10000000)]
    print('done')

The code runs fast, completing in about 3.2 seconds on my system.
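If you want to measure the run time yourself, one option is to wrap the loop with time.perf_counter(). This is only a sketch: the timed_loop() helper is a hypothetical name, and the demo uses one million items to keep the run short (raise it to 10000000 to match the example above).

```python
import time

# perform some math operation
def operation(value):
    return value**2

# run the list comprehension and report how long it took
def timed_loop(n):
    start = time.perf_counter()
    values = [operation(i) for i in range(n)]
    duration = time.perf_counter() - start
    return values, duration

# protect the entry point
if __name__ == '__main__':
    # smaller than the tutorial's 10,000,000 to keep the demo quick
    values, duration = timed_loop(1000000)
    print(f'done in {duration:.3f} seconds')
```

The time you see will depend on your hardware and Python version.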

How long does it take to run on your system?
Let me know in the comments below.

Next, let’s make the task concurrent using the ThreadPool.

CPU-Bound Tasks in ThreadPool

We can update the code from the previous example to use the ThreadPool.

This would be an anti-pattern, as described previously. Therefore, we would expect this example to run as fast as or slower than the for loop version because of the overhead of context switching.

First, we can create a thread pool with some number of threads, in this case, 4. We can then use the map() method on the ThreadPool to submit the tasks into the thread pool. Each task will be to square a number, with numbers from 0 to 9,999,999 sent into the pool for execution.

...
# perform a math operation many times
with ThreadPool(4) as pool:
    results = pool.map(operation, range(10000000))

The complete example is listed below.

# SuperFastPython.com
# example of how using the thread pool can be slower, an anti-pattern
from multiprocessing.pool import ThreadPool

# perform some math operation
def operation(value):
    return value**2

# protect the entry point
if __name__ == '__main__':
    # perform a math operation many times
    with ThreadPool(4) as pool:
        results = pool.map(operation, range(10000000))
    print('done')

Running the example squares all of the numbers as before.

On my system, it takes about 3.3 seconds to complete compared to the 3.2 seconds taken with the for loop.

This is about 100 milliseconds slower, which is roughly the same speed and offers no benefit.
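To compare the two approaches on your own machine, a side-by-side timing might look like the sketch below. The helpers time_call(), with_loop() and with_threadpool() are hypothetical names, and the demo size is reduced to one million items to keep the run short.

```python
import time
from multiprocessing.pool import ThreadPool

# perform some math operation
def operation(value):
    return value**2

# call a function and return its result and duration in seconds
def time_call(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# the for loop (list comprehension) version
def with_loop(n):
    return [operation(i) for i in range(n)]

# the ThreadPool version
def with_threadpool(n, workers=4):
    with ThreadPool(workers) as pool:
        return pool.map(operation, range(n))

# protect the entry point
if __name__ == '__main__':
    n = 1000000
    loop_values, loop_time = time_call(with_loop, n)
    pool_values, pool_time = time_call(with_threadpool, n)
    print(f'loop: {loop_time:.3f}s, ThreadPool: {pool_time:.3f}s')
```

Both versions produce the same list, so any difference in time is overhead rather than extra work.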


Why Is the ThreadPool Not a Lot Slower?

The reason is that, by default, the map() method uses a “chunksize” argument value other than 1.

Recall that the “chunksize” argument for the map() method controls the mapping of issued tasks to internal tasks transmitted to worker threads for execution. It allows issued function calls to be grouped into batches called chunks for execution, which offers a large computational benefit.
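To get a feel for the default, the sketch below mirrors the heuristic used in current CPython source when “chunksize” is not given: roughly four chunks per worker. This is an implementation detail rather than a documented guarantee, so treat it as an assumption that may change between versions. The default_chunksize() function is a hypothetical helper, not part of the ThreadPool API.

```python
# approximate the default chunksize chosen by map() in CPython
# (implementation detail, assumed here: about 4 chunks per worker)
def default_chunksize(n_items, n_workers):
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * 4)
    # round up so that every item lands in a chunk
    if extra:
        chunksize += 1
    return chunksize

# protect the entry point
if __name__ == '__main__':
    # the example in this tutorial: 10,000,000 items and 4 worker threads
    print(default_chunksize(10000000, 4))
```

Under this assumption, the ten million calls in the example are grouped into 16 chunks of 625,000 calls each, so the pool handles only a handful of internal tasks.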

If “chunksize” is set to 1, meaning one task per function call, the example runs a lot slower.

For example:

# SuperFastPython.com
# example of how using the thread pool can be slower, an anti-pattern
from multiprocessing.pool import ThreadPool

# perform some math operation
def operation(value):
    return value**2

# protect the entry point
if __name__ == '__main__':
    # perform a math operation many times
    with ThreadPool(4) as pool:
        results = pool.map(operation, range(10000000), chunksize=1)
    print('done')

Running the example squares all of the numbers as before, but it is dramatically slower.

On my system, it takes about 44.3 seconds to complete, compared to 3.2 seconds for the for-loop version.

That is 41.1 seconds longer or 13.8 times slower.

Again, the reason for the slowdown is that the task is CPU-bound and the ThreadPool uses threads that are subject to the GIL, meaning only one thread can execute at a time. With a “chunksize” of 1, each of the ten million function calls is also issued to the pool as its own internal task, so the per-task overhead and the context switching between threads add a large amount of overhead to the overall task.
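One way to explore the effect of “chunksize” is to time the same map() call with a few different values. The sketch below is an assumption-laden illustration: time_map() is a hypothetical helper, the candidate values are arbitrary, and the item count is reduced to 200,000 so that the chunksize=1 case finishes quickly.

```python
import time
from multiprocessing.pool import ThreadPool

# perform some math operation
def operation(value):
    return value**2

# time a single map() call with a given chunksize (None means the default)
def time_map(n, chunksize, workers=4):
    with ThreadPool(workers) as pool:
        start = time.perf_counter()
        results = pool.map(operation, range(n), chunksize=chunksize)
        duration = time.perf_counter() - start
    return results, duration

# protect the entry point
if __name__ == '__main__':
    n = 200000
    for chunksize in (1, 100, 10000, None):
        _, duration = time_map(n, chunksize)
        print(f'chunksize={chunksize}: {duration:.3f} seconds')
```

The results are identical in every case; only the amount of bookkeeping per function call changes.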


Common Questions

This section answers some commonly asked questions about this example.

What If We Use More Threads in the ThreadPool?

Using more threads will not improve the performance for the same reason that using 4 threads does not speed up the execution of the tasks.

The GIL ensures only one thread executes instructions at a time and the operating system will context switch between the threads, adding significant overhead to the overall task.
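You can check this on your own system by timing the same workload with different numbers of worker threads. This is a minimal sketch: time_workers() is a hypothetical helper, the worker counts are arbitrary, and the item count is reduced to 500,000 to keep the run short.

```python
import time
from multiprocessing.pool import ThreadPool

# perform some math operation
def operation(value):
    return value**2

# time the map() call for a pool with the given number of worker threads
def time_workers(n, workers):
    start = time.perf_counter()
    with ThreadPool(workers) as pool:
        results = pool.map(operation, range(n))
    return results, time.perf_counter() - start

# protect the entry point
if __name__ == '__main__':
    n = 500000
    for workers in (1, 2, 4, 8):
        _, duration = time_workers(n, workers)
        print(f'{workers} threads: {duration:.3f} seconds')
```

For a CPU-bound task like this one, the times are not expected to drop as the number of threads grows.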

What If We Use One Thread in the ThreadPool?

The ThreadPool will still be slower than the for loop even if the thread pool had one thread.

The reason is all of the additional overhead in the thread pool for packaging up each task using internal classes and the additional function calls as the task bounces around inside the thread pool for execution.

This is the reason why even using a multiprocessing Pool for such a simple task would likely not provide a speed up, at least for this example.

Will the Worker Threads in the ThreadPool Run on Different Cores?

Probably not.

The operating system determines what code will run on each CPU core in your system.

Because only one thread can execute at a time in this example, it is very likely that a single CPU core would be used.

Will Using a Multiprocessing Pool Speed Up This Example?

Maybe.

The operation is very simple and we are performing it millions of times.

If we push these tasks into a multiprocessing Pool using map(), we will first have to tune the chunksize argument that defines the mapping of tasks we submit to internal tasks serialized and transmitted to the worker processes in the pool.

Then executing the tasks in the process pool will add overhead, firstly for the additional wrapping of the tasks using internal objects and the additional function calls that need to be made, and secondly for the inter-process communication needed to share the tasks and their results between processes.

The example below demonstrates the same example using the multiprocessing Pool.

# SuperFastPython.com
# example of performing a simple math task concurrently
from multiprocessing.pool import Pool

# perform some math operation
def operation(value):
    return value**2

# protect the entry point
if __name__ == '__main__':
    # perform a math operation many times
    with Pool(4) as pool:
        results = pool.map(operation, range(10000000))
    print('done')

Running the example does offer a speed-up.

It completes in about 1.7 seconds, compared to 3.2 seconds without the Pool and 3.3 seconds with the ThreadPool.

That is 1.5 seconds faster than no Pool, or about 1.9 times faster.

Further benefits could be achieved by tuning the “chunksize” argument for the map() method.


Takeaways

You now know how ThreadPool can make your programs slower and how to avoid it.

Do you have any questions about how Python thread pools can be slower?
Ask your questions in the comments below and I will do my best to answer.

Photo by Zherui Zhang on Unsplash

About Jason Brownlee
