ThreadPoolExecutor Best Practices in Python

Last Updated on September 12, 2022

It is important to follow best practices when using the ThreadPoolExecutor in Python.

Best practices allow you to side-step the most common errors and bugs when using threads for asynchronous tasks in your programs.

In this tutorial, you will discover the best practices when using Python thread pools.

Let’s get started.

ThreadPoolExecutor Best Practices

The ThreadPoolExecutor is a flexible and powerful thread pool for executing ad hoc tasks in an asynchronous manner.

Once you know how the ThreadPoolExecutor works, it is important to review some best practices to consider when bringing thread pools into your Python programs.

To keep things simple, there are five best practices when using the ThreadPoolExecutor; they are:

Use the Context Manager
Use map() for Asynchronous For-Loops
Use submit() with as_completed()
Use Independent Functions as Tasks
Use for IO-Bound Tasks (probably)

Let’s get started with the first practice, which is to use the context manager.

Use the Context Manager

Create thread pools with the context manager, and dispatch all tasks to the pool and process all results inside its block.

For example:

...
# create a thread pool via the context manager
with ThreadPoolExecutor(10) as executor:
    # ...

Remember to configure your thread pool when creating it in the context manager, specifically by setting the number of threads to use in the pool.

Using the context manager avoids the situation where you have explicitly instantiated the thread pool and forget to shut it down manually by calling shutdown().
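The automatic shutdown can be seen in a small, self-contained sketch. The task() function and the pool size of 4 are assumptions made for illustration; they are not part of the original tutorial. After the with-block exits, the pool refuses new tasks, which shows that shutdown() was called for us.

```python
from concurrent.futures import ThreadPoolExecutor


def task(value):
    # hypothetical task: stands in for real work such as a download
    return value * 2


def run_with_context_manager():
    # the pool is created on entry and shut down automatically on exit
    with ThreadPoolExecutor(max_workers=4) as executor:
        future = executor.submit(task, 21)
        result = future.result()
    # the with-block has exited, so the pool no longer accepts tasks
    try:
        executor.submit(task, 1)
        accepted = True
    except RuntimeError:
        accepted = False
    return result, accepted


result, accepted = run_with_context_manager()
print(result, accepted)
```

The context manager calls shutdown() even if the block raises an exception, which is easy to forget when managing the pool by hand.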

It is also less code and better grouped than managing instantiation and shutdown manually; for example:

...
# create a thread pool manually
executor = ThreadPoolExecutor(10)
# ...
executor.shutdown()

Do not use the context manager when you need to dispatch tasks and get results over a broader context (e.g. multiple functions) and/or when you need more control over the shutdown of the pool.
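When the pool has to span several functions, one option is to pair manual creation with try/finally so the pool is still shut down on every path. This is only a sketch: the task(), dispatch(), collect() and main() names are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def task(value):
    # hypothetical task used only for illustration
    return value + 1


def dispatch(executor, items):
    # one function submits the tasks...
    return [executor.submit(task, item) for item in items]


def collect(futures):
    # ...and another gathers the results later
    return [future.result() for future in futures]


def main():
    executor = ThreadPoolExecutor(max_workers=4)
    try:
        futures = dispatch(executor, [1, 2, 3])
        return collect(futures)
    finally:
        # always release the worker threads, even if a task raised
        executor.shutdown(wait=True)


print(main())
```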

Use map() for Asynchronous For-Loops

If you have a for-loop that applies a function to each item in a list, then use the map() function to dispatch the tasks asynchronously.

For example, you may have a for-loop over a list that calls myfunc() for each item:

...
# apply a function to each item in an iterable
for item in mylist:
    result = myfunc(item)
    # do something...

Or, you may already be using the built-in map function:

...
# apply a function to each item in an iterable
for result in map(myfunc, mylist):
    # do something...

Both of these cases can be made asynchronous using the map() function on the thread pool.

...
# apply a function to each item in an iterable asynchronously
for result in executor.map(myfunc, mylist):
    # do something...

You can learn more about how to use the map() function here:

How to Use map() with the ThreadPoolExecutor in Python

Do not use the map() function if your target task function has side effects.

Do not use the map() function if your target task function has no arguments. If it takes more than one argument, pass one iterable per argument, or use submit() instead.

Do not use the map() function if you need control over exception handling for each task, or if you would like to get results from tasks in the order that tasks are completed.
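Two properties of the pool's map() are worth seeing in a runnable sketch: results are returned in the order of the inputs regardless of which task finishes first, and, like the built-in map(), it accepts one iterable per function argument. The slow_square() and add() functions are hypothetical and exist only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(value):
    # hypothetical task: larger inputs finish sooner to show ordering
    time.sleep(0.05 * (3 - value))
    return value * value


def add(a, b):
    # hypothetical two-argument task
    return a + b


with ThreadPoolExecutor(max_workers=3) as executor:
    # results come back in the order of the inputs, not completion order
    squares = list(executor.map(slow_square, [0, 1, 2]))
    # one iterable per argument, as with the built-in map()
    sums = list(executor.map(add, [1, 2, 3], [10, 20, 30]))

print(squares)
print(sums)
```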

Use submit() with as_completed()

If you would like to process results in the order that tasks are completed, rather than the order that tasks are submitted, then use submit() and as_completed().

The submit() function is a method on the thread pool that pushes a task into the pool for execution and returns immediately with a Future object for the task. The as_completed() function is a module-level function that takes an iterable of Future objects, like a list, and yields each Future object as its task completes.

For example:

...
# submit all tasks and get future objects
futures = [executor.submit(myfunc, item) for item in mylist]
# process results from tasks in order of task completion
for future in as_completed(futures):
    # get the result
    result = future.result()
    # do something...
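A fuller, runnable variant of the snippet above might look like the following. It assumes a hypothetical myfunc() that fails for negative input, and it keeps a dictionary from each Future back to its input so that a failure can be reported per task. The run() helper is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def myfunc(item):
    # hypothetical task: fails for negative input
    if item < 0:
        raise ValueError(f"negative input: {item}")
    return item * 10


def run(mylist):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # map each future back to the item it was created for
        futures = {executor.submit(myfunc, item): item for item in mylist}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except ValueError as exc:
                # result() re-raises the exception raised inside the task
                errors[item] = str(exc)
    return results, errors


results, errors = run([1, 2, -3])
print(results, errors)
```

Handling the exception around each result() call is what gives per-task control; one failed task does not stop the others from being processed.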

Do not use the submit() and as_completed() combination if you need to process the results in the order that the tasks were submitted to the thread pool.

Do not use the submit() and as_completed() combination if you need results from all tasks to continue; you may be better off using the wait() module function.
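For the case where every result is needed before continuing, a minimal sketch with wait() could look like this. The myfunc() task and the inputs are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED


def myfunc(item):
    # hypothetical task used only for illustration
    return item * 2


with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(myfunc, item) for item in range(5)]
    # block until every task has finished
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    # iterate the original list to keep submission order
    results = [future.result() for future in futures]

print(len(done), len(not_done), results)
```

wait() returns two sets of Future objects, so the original list is used here to read the results in a predictable order.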

Do not use the submit() and as_completed() combination for a simple asynchronous for-loop; you may be better off using map().

Use Independent Functions as Tasks

Use the ThreadPoolExecutor if your tasks are independent.

This means that each task is not dependent on other tasks that could execute at the same time. It also may mean tasks that are not dependent on any data other than data provided via function arguments to the task.

The ThreadPoolExecutor is ideal for tasks that do not change any data, e.g. have no side effects, so-called pure functions.
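As a rough illustration of an independent task, the sketch below uses a hypothetical count_words() function that reads only its argument and returns a new value. The sample documents are made up. Any combining of results happens in the calling thread, so the tasks never touch shared state.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    # depends only on its argument and returns a new value
    return len(text.split())


documents = ["a b c", "d e", "f"]  # hypothetical in-memory data

with ThreadPoolExecutor(max_workers=3) as executor:
    counts = list(executor.map(count_words, documents))

# aggregate in the calling thread, so no lock is needed
total = sum(counts)
print(counts, total)
```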

Thread pools can be organized into data flows and pipelines for linear dependence between tasks, perhaps with one thread pool per task type.
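One way such a pipeline might be arranged is with one pool per stage, handing each result from the first stage to the second as soon as it is ready. This is a sketch under assumptions: load(), process() and run_pipeline() are hypothetical names, and the pool sizes are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load(name):
    # hypothetical first stage, standing in for reading a resource
    return f"data-for-{name}"


def process(data):
    # hypothetical second stage, applied to each loaded result
    return data.upper()


def run_pipeline(names):
    with ThreadPoolExecutor(4) as loaders, ThreadPoolExecutor(4) as processors:
        load_futures = [loaders.submit(load, name) for name in names]
        # hand each loaded result to the second pool as soon as it is ready
        process_futures = [
            processors.submit(process, future.result())
            for future in as_completed(load_futures)
        ]
        # sort because completion order is not guaranteed
        return sorted(future.result() for future in process_futures)


print(run_pipeline(["a", "b"]))
```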

The thread pool is not designed for tasks that require coordination; you should consider using the Thread class and coordination patterns like the Barrier and Semaphore.

Thread pools are not designed for tasks that require synchronization; you should consider using the Thread class and locking patterns like Lock and RLock.
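For comparison, a task that must synchronize on shared state could be sketched with the Thread class and a Lock, as suggested above. The Counter class and run() helper are hypothetical and serve only to show the locking pattern.

```python
from threading import Thread, Lock


class Counter:
    # hypothetical shared state protected by a lock
    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self, times):
        for _ in range(times):
            # only one thread may update the value at a time
            with self._lock:
                self.value += 1


def run(num_threads=4, times=1000):
    counter = Counter()
    threads = [
        Thread(target=counter.increment, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    # wait for all threads to finish before reading the total
    for thread in threads:
        thread.join()
    return counter.value


print(run())
```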

Use for IO-Bound Tasks (probably)

Use the ThreadPoolExecutor for IO-bound tasks.

These are tasks that may involve interacting with an external device, such as a peripheral (e.g. a camera or a printer), a storage device (e.g. a hard drive), or another computer (e.g. socket communication).

Threads and thread pools like the ThreadPoolExecutor are probably not appropriate for CPU-bound tasks, like computation on data in memory.

This is because of a design decision within the Python interpreter, which uses a master lock called the Global Interpreter Lock (GIL) that prevents more than one thread from executing Python bytecode at the same time.

This design decision was made within the reference implementation of the Python interpreter (CPython) and may not apply to other interpreters (such as IronPython and Jython).
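The benefit for IO-bound work can be approximated with a small timing sketch. Here time.sleep() stands in for blocking IO, which is an assumption made so the example runs anywhere. The fake_io(), timed(), sequential() and threaded() names are hypothetical, and the exact timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # time.sleep() stands in for blocking IO; it releases the GIL while waiting
    time.sleep(delay)
    return delay


def timed(func):
    # return how long a call to func() takes, in seconds
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


delays = [0.1] * 8  # hypothetical wait time per task


def sequential():
    return [fake_io(d) for d in delays]


def threaded():
    with ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(fake_io, delays))


sequential_time = timed(sequential)
threaded_time = timed(threaded)
print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Because the threads spend their time waiting rather than executing Python bytecode, the waits overlap and the threaded version should finish sooner. A CPU-bound function would generally not see the same gain under the GIL.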

Takeaways

You now know the best practices when using the ThreadPoolExecutor in Python.

Do you have any questions about the best practices?
Ask your question in the comments below and I will do my best to answer.
