Last Updated on September 12, 2022
It is important to follow best practices when using the ThreadPoolExecutor in Python.
Best practices allow you to side-step the most common errors and bugs when using threads for asynchronous tasks in your programs.
In this tutorial, you will discover the best practices when using Python thread pools.
Let’s get started.
ThreadPoolExecutor Best Practices
The ThreadPoolExecutor is a flexible and powerful thread pool for executing ad hoc tasks in an asynchronous manner.
Once you know how the ThreadPoolExecutor works, it is important to review some best practices to consider when bringing thread pools into your Python programs.
To keep things simple, there are five best practices when using the ThreadPoolExecutor; they are:
- Use the Context Manager
- Use map() for Asynchronous For-Loops
- Use submit() with as_completed()
- Use Independent Functions as Tasks
- Use for IO-Bound Tasks (probably)
Let’s get started with the first practice, which is to use the context manager.
Use the Context Manager
Use the context manager when using thread pools, and keep all task dispatching and result processing inside the with block.
For example:
```python
...
# create a thread pool via the context manager
with ThreadPoolExecutor(10) as executor:
    # ...
```
Remember to configure your thread pool when creating it in the context manager, specifically by setting the number of threads to use in the pool.
Using the context manager avoids the situation where you have explicitly instantiated the thread pool and forget to shut it down manually by calling shutdown().
It is also less code and better grouped than managing instantiation and shutdown manually; for example:
```python
...
# create a thread pool manually
executor = ThreadPoolExecutor(10)
# ...
executor.shutdown()
```
Do not use the context manager when you need to dispatch tasks and get results over a broader context (e.g. multiple functions) and/or when you need more control over the shutdown of the pool.
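As a rough sketch of the broader-context case, the pool below is created in one method, used in others, and shut down explicitly. The TaskRunner class and the square() task are hypothetical names invented for this illustration, not part of the ThreadPoolExecutor API.

```python
from concurrent.futures import ThreadPoolExecutor


def square(value):
    # hypothetical task, used only for illustration
    return value * value


class TaskRunner:
    # hypothetical helper that owns a pool for longer than one code block

    def __init__(self, n_threads=4):
        # the pool is created here and outlives this method
        self.executor = ThreadPoolExecutor(max_workers=n_threads)
        self.futures = []

    def start(self, items):
        # dispatch tasks in one method...
        self.futures = [self.executor.submit(square, item) for item in items]

    def results(self):
        # ...and collect the results in another
        return [future.result() for future in self.futures]

    def close(self):
        # explicit shutdown, since no context manager will do it for us
        self.executor.shutdown(wait=True)


runner = TaskRunner()
try:
    runner.start([1, 2, 3])
    print(runner.results())  # → [1, 4, 9]
finally:
    # always release the worker threads, even if a task raised
    runner.close()
```

The try/finally block plays the role that the with statement would otherwise play, so the pool is still shut down if a task fails.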
Use map() for Asynchronous For-Loops
If you have a for-loop that applies a function to each item in a list, then use the map() function to dispatch the tasks asynchronously.
For example, you may have a for-loop over a list that calls myfunc() for each item:
```python
...
# apply a function to each item in an iterable
for item in mylist:
    result = myfunc(item)
    # do something...
```
Or, you may already be using the built-in map function:
```python
...
# apply a function to each item in an iterable
for result in map(myfunc, mylist):
    # do something...
```
Both of these cases can be made asynchronous using the map() function on the thread pool.
```python
...
# apply a function to each item in an iterable asynchronously
for result in executor.map(myfunc, mylist):
    # do something...
```
Do not use the map() function if your target task function has side effects.
Do not use the map() function if your target task function has no arguments. A function with more than one argument can be used, but only by passing one iterable per argument to map().
Do not use the map() function if you need control over exception handling for each task, or if you would like to get results from tasks in the order that tasks are completed.
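The ordering behavior can be seen in a small, self-contained sketch. The slow_double() task and its sleep times are made up for this illustration: later items finish sooner, yet map() still yields results in the order of the input items.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_double(value):
    # hypothetical task: larger values sleep less, so tasks complete out of order
    # (only intended for values 0, 1 and 2)
    time.sleep(0.05 * (3 - value))
    return value * 2


def double_all(items):
    with ThreadPoolExecutor(max_workers=3) as executor:
        # map() yields results in the order of the input items,
        # not in the order that the tasks finish
        return list(executor.map(slow_double, items))


print(double_all([0, 1, 2]))  # → [0, 2, 4]
```

If any call raises an exception, it is re-raised when that result is retrieved from the iterator, which is why map() gives little per-task control over error handling.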
Use submit() with as_completed()
If you would like to process results in the order that tasks are completed, rather than the order that tasks are submitted, then use submit() and as_completed().
The submit() method belongs to the thread pool; it pushes a task into the pool for execution and returns immediately with a Future object for the task. The as_completed() function is a module-level function in concurrent.futures that takes an iterable of Future objects, like a list, and yields each Future object as its task completes.
For example:
```python
...
# submit all tasks and get future objects
futures = [executor.submit(myfunc, item) for item in mylist]
# process results from tasks in order of task completion
for future in as_completed(futures):
    # get the result
    result = future.result()
    # do something...
```
Do not use the submit() and as_completed() combination if you need to process the results in the order that the tasks were submitted to the thread pool.
Do not use the submit() and as_completed() combination if you need results from all tasks to continue; you may be better off using the wait() module function.
Do not use the submit() and as_completed() combination for a simple asynchronous for-loop; you may be better off using map().
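One reason to prefer submit() with as_completed() is per-task exception handling. The sketch below is one possible way to do it, assuming a hypothetical work() task that fails on negative input; the dictionary that maps each Future back to its argument is a common pattern, not something the API requires.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(delay):
    # hypothetical task: sleep for `delay` seconds, failing on a negative value
    if delay < 0:
        raise ValueError(f"invalid delay: {delay}")
    time.sleep(delay)
    return delay


def run_tasks(delays):
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=4) as executor:
        # map each future back to its input so failures can be reported
        future_to_delay = {executor.submit(work, d): d for d in delays}
        # futures are yielded in the order that their tasks finish
        for future in as_completed(future_to_delay):
            try:
                results.append(future.result())
            except ValueError as exc:
                # result() re-raises the exception raised inside the task
                errors.append((future_to_delay[future], str(exc)))
    return results, errors


results, errors = run_tasks([0.2, 0.05, -1])
print(results)  # shorter delays appear first
print(errors)   # the failed task is reported without stopping the others
```

Because each result is retrieved inside its own try block, one failing task does not prevent the remaining results from being processed.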
Use Independent Functions as Tasks
Use the ThreadPoolExecutor if your tasks are independent.
This means that each task is not dependent on other tasks that could execute at the same time. It also may mean tasks that are not dependent on any data other than data provided via function arguments to the task.
The ThreadPoolExecutor is ideal for tasks that do not change any data, e.g. have no side effects, so-called pure functions.
Thread pools can be organized into data flows and pipelines for linear dependence between tasks, perhaps with one thread pool per task type.
The thread pool is not designed for tasks that require coordination; you should consider using the Thread class and coordination patterns like the Barrier and Semaphore.
Thread pools are not designed for tasks that require synchronization; you should consider using the Thread class and locking patterns like Lock and RLock.
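A minimal sketch of an independent task is shown below, assuming a hypothetical count_words() function. Each task reads only its own argument and returns a value, and the results are combined in the calling thread, so no lock is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    # pure function: the result depends only on the argument,
    # and no shared state is modified
    return len(text.split())


def total_words(documents):
    with ThreadPoolExecutor(max_workers=4) as executor:
        # each task is independent; results are combined in the calling thread
        return sum(executor.map(count_words, documents))


print(total_words(["a b c", "d e", ""]))  # → 5
```

Had each task instead added to a shared counter, the tasks would no longer be independent and the update would need to be protected, for example with a Lock.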
Use for IO-Bound Tasks (probably)
Use ThreadPoolExecutor for IO-bound tasks only.
These are tasks that may involve interacting with an external device, such as a peripheral (e.g. a camera or a printer), a storage device (e.g. a hard drive), or another computer (e.g. socket communication).
Threads and thread pools like the ThreadPoolExecutor are probably not appropriate for CPU-bound tasks, like computation on data in memory.
This is because of a design decision within the Python interpreter, which makes use of a master lock called the Global Interpreter Lock (GIL) that prevents more than one thread from executing Python bytecode at the same time.
This design decision was made within the reference implementation of the Python interpreter (CPython) but may not apply to other interpreters (such as IronPython and Jython, which have no GIL; PyPy does have one).
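The benefit for IO-bound work can be sketched with a stand-in for blocking IO. Here time.sleep() plays the role of a network or disk call, since it releases the GIL while waiting; the fake_io() task and the timed() helper are hypothetical names used only for this comparison.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id):
    # stand-in for blocking IO such as a network request;
    # sleep() releases the GIL, so other threads can run while this one waits
    time.sleep(0.1)
    return task_id


def run_sequential(ids):
    # one task after another in the calling thread
    return [fake_io(i) for i in ids]


def run_threaded(ids):
    # the same tasks, overlapped across worker threads
    with ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(fake_io, ids))


def timed(func, ids):
    # return the function's result and the elapsed wall-clock time in seconds
    start = time.perf_counter()
    result = func(ids)
    return result, time.perf_counter() - start


ids = list(range(5))
_, sequential_time = timed(run_sequential, ids)
_, threaded_time = timed(run_threaded, ids)
print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The exact timings depend on the machine, but the threaded version should take roughly as long as one task rather than the sum of all of them. A CPU-bound task in place of fake_io() would likely show little or no speed-up under CPython.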
Further Reading
This section provides additional resources that you may find helpful.
Books
- ThreadPoolExecutor Jump-Start, Jason Brownlee, (my book!)
- Concurrent Futures API Interview Questions
- ThreadPoolExecutor Class API Cheat Sheet
I also recommend specific chapters from the following books:
- Effective Python, Brett Slatkin, 2019.
- See Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See Chapter 14: Threads and Processes
Guides
- Python ThreadPoolExecutor: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
- Python Threading: The Complete Guide
- Python ThreadPool: The Complete Guide
Takeaways
You now know the best practices when using the ThreadPoolExecutor in Python.
Do you have any questions about the best practices?
Ask your question in the comments below and I will do my best to answer.
Photo by Jacek Dylag on Unsplash