ThreadPool Best Practices in Python
It is important to follow best practices when using the ThreadPool in Python.
Best practices allow you to side-step the most common errors and bugs when using threads to execute ad hoc tasks in your programs.
In this tutorial, you will discover the best practices when using ThreadPool in Python.
Let's get started.
ThreadPool Best Practices
The ThreadPool is a flexible and powerful thread pool for executing ad hoc tasks in a synchronous or asynchronous manner.
Once you know how the ThreadPool works, it is important to review some best practices to consider when bringing the ThreadPool into our Python programs.
To keep things simple, there are six best practices when using the ThreadPool:
- Use the Context Manager
- Use map() for Concurrent For-Loops
- Use imap_unordered() For Responsive Code
- Use map_async() to Issue Tasks Asynchronously
- Use Independent Functions as Tasks
- Use for IO-Bound Tasks
Let's get started with the first practice, which is to use the context manager.
Use the Context Manager
Use the context manager when using the ThreadPool to ensure the pool is always closed correctly.
For example:
...
# create a thread pool via the context manager
with ThreadPool(4) as pool:
    # ...
Remember to configure your ThreadPool when creating it in the context manager, specifically by setting the number of thread workers to use in the pool.
Using the context manager avoids the situation where you have explicitly instantiated the ThreadPool and forget to shut it down manually by calling close() or terminate().
It is also less code and better grouped than managing instantiation and shutdown manually, for example:
...
# create a thread pool manually
pool = ThreadPool(4)
# ...
# shut down the pool once all tasks have been issued
pool.close()
Don't use the context manager when you need to dispatch tasks and get results over a broader context (e.g. multiple functions) and/or when you need more control over the shutdown of the pool.
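To make the two approaches concrete, here is a minimal sketch that runs the same work both ways. The task() function and the variable names are hypothetical and chosen only for illustration. Note that exiting the context manager terminates the pool, so results should be collected inside the block.

```python
from multiprocessing.pool import ThreadPool

def task(value):
    # hypothetical task: square the input
    return value * value

# context manager: the pool is shut down automatically on exit
with ThreadPool(4) as pool:
    squares = pool.map(task, range(5))
print(squares)  # [0, 1, 4, 9, 16]

# manual management: we are responsible for the shutdown ourselves
pool = ThreadPool(4)
try:
    manual_results = pool.map(task, [10, 20])
finally:
    # stop accepting new tasks, then wait for the workers to exit
    pool.close()
    pool.join()
print(manual_results)  # [100, 400]
```

The try-finally block in the manual version is one way to make sure close() is still called if a task raises an exception.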
Use map() for Concurrent For-Loops
If you have a for-loop that applies a function to each item in a list or iterable, then use the map() function to dispatch all tasks and handle results once all tasks are completed.
For example, you may have a for-loop over a list that calls task() for each item:
...
# apply a function to each item in an iterable
for item in mylist:
    result = task(item)
    # do something...
Or, you may already be using the built-in map() function:
...
# apply a function to each item in an iterable
for result in map(task, mylist):
    # do something...
Both of these cases can be made concurrent using the map() function on the ThreadPool.
...
# apply a function to each item in an iterable concurrently
for result in pool.map(task, mylist):
    # do something...
Consider avoiding the map() function if your target task function has side effects.
Do not use the map() function if your target task function has no arguments or more than one argument. If you have multiple arguments, you can use the starmap() function instead.
Do not use the map() function if you need control over exception handling for each task, or if you would like to get results from tasks in the order that tasks are completed.
Do not use the map() function if you have many tasks (e.g. hundreds or thousands), as all tasks will be dispatched at once. Instead, consider the lazier imap() function.
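The following sketch puts map() next to the two alternatives mentioned above, starmap() and imap(). The task() and add() functions are hypothetical examples, not part of the ThreadPool API.

```python
from multiprocessing.pool import ThreadPool

def task(item):
    # hypothetical single-argument task
    return item * 2

def add(a, b):
    # hypothetical two-argument task, suited to starmap()
    return a + b

items = [1, 2, 3, 4]

with ThreadPool(4) as pool:
    # map(): issue all tasks at once, results in submission order
    mapped = pool.map(task, items)
    # starmap(): each tuple is unpacked into the task's arguments
    summed = pool.starmap(add, [(1, 2), (3, 4)])
    # imap(): issue tasks lazily, results still in submission order
    lazy = list(pool.imap(task, items))

print(mapped)  # [2, 4, 6, 8]
print(summed)  # [3, 7]
print(lazy)    # [2, 4, 6, 8]
```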
Use imap_unordered() For Responsive Code
If you would like to handle results in the order that tasks are completed, rather than the order that tasks are submitted, then use the imap_unordered() function.
Unlike the map() function, the imap_unordered() function will iterate the provided iterable one item at a time and issue tasks to the ThreadPool.
Unlike the imap() function, the imap_unordered() function will yield return values in the order that tasks are completed, not the order that tasks were issued to the ThreadPool.
This allows the caller to handle results from issued tasks as they become available, making the program more responsive.
For example:
...
# apply a function to each item in the iterable concurrently
for result in pool.imap_unordered(task, items):
    # ...
Do not use the imap_unordered() function if you need to handle the results in the order that the tasks were submitted to the ThreadPool. Instead, use the map() function.
Do not use the imap_unordered() function if you need results from all tasks before continuing on in the program. Instead, you may be better off using map_async() and the AsyncResult.wait() function.
Do not use the imap_unordered() function for a simple concurrent for-loop. Instead, you may be better off using map().
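The difference in result ordering can be seen in a small sketch. It assumes a hypothetical task() that sleeps for a given number of seconds to simulate work of varying length, then returns that number.

```python
import time
from multiprocessing.pool import ThreadPool

def task(delay):
    # hypothetical task: block for a while, then report the delay
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPool(3) as pool:
    # results arrive in completion order, so shorter tasks tend to come first
    completed = list(pool.imap_unordered(task, delays))
    # imap() always yields results in submission order
    ordered = list(pool.imap(task, delays))

print(completed)
print(ordered)  # [0.3, 0.1, 0.2]
```

With one worker per task, the first list will typically come back sorted from the shortest delay to the longest, although the exact order depends on thread scheduling.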
Use map_async() to Issue Tasks Asynchronously
If you need to issue many tasks asynchronously (e.g. fire-and-forget), use the map_async() function.
The map_async() function does not block while the function is applied to each item in the iterable. Instead, it returns an AsyncResult object from which the results may be accessed.
Because map_async() does not block, it allows the caller to continue and retrieve the result when needed.
The caller can choose to call the wait() function on the returned AsyncResult object in order to wait for all of the issued tasks to complete, or call the get() function to wait for the tasks to complete and access a list of return values.
For example:
...
# apply the function
result = pool.map_async(task, items)
# wait for all tasks to complete
result.wait()
Do not use the map_async() function if you want to issue the tasks and then handle the results once all tasks are complete. You would be better off using the map() function.
Do not use the map_async() function if you want to issue tasks one-by-one in a lazy manner in order to conserve memory. Instead, use the imap() function.
Do not use the map_async() function if you wish to issue tasks that take multiple arguments. Instead, use the starmap_async() function.
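As a complete illustration, the sketch below issues tasks with map_async(), waits on the returned AsyncResult, and then retrieves the return values. The task() function is a hypothetical placeholder.

```python
from multiprocessing.pool import ThreadPool

def task(item):
    # hypothetical task: increment the input
    return item + 1

with ThreadPool(4) as pool:
    # issue all tasks; this call returns immediately
    async_result = pool.map_async(task, [1, 2, 3])
    # the caller is free to do other work here
    # block until every issued task has finished
    async_result.wait()
    # True if all tasks completed without raising an exception
    ok = async_result.successful()
    # the return values, in submission order
    values = async_result.get()

print(ok)      # True
print(values)  # [2, 3, 4]
```

Calling get() on its own would also block until the tasks finish, so the explicit wait() is only needed when you want to wait without collecting the results.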
Use Independent Functions as Tasks
Use the ThreadPool if your tasks are independent.
This means that each task is not dependent on other tasks that could execute at the same time. It may also mean that tasks do not depend on any data other than the data provided to the task via function arguments.
The ThreadPool is ideal for tasks that do not change any data, e.g. have no side effects, so-called pure functions.
The ThreadPool can be organized into data flows and pipelines for linear dependence between tasks, perhaps with one ThreadPool per task type.
The ThreadPool is not designed for tasks that require coordination. For those, you should consider using the threading.Thread class and coordination patterns like the Barrier and Semaphore.
The ThreadPool is not designed for tasks that require synchronization. For those, you should consider using the threading.Thread class and locking patterns like Lock and RLock.
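A minimal sketch of an independent task is shown below. The count_words() function and the sample documents are hypothetical. Each task reads only its own argument and returns a new value, and the caller combines the results after all tasks are done, so no locking is needed.

```python
from multiprocessing.pool import ThreadPool

def count_words(text):
    # depends only on its argument and changes no shared state
    return len(text.split())

documents = ["a b c", "d e", "f"]

with ThreadPool(3) as pool:
    counts = pool.map(count_words, documents)

# aggregation happens in the caller, not inside the tasks
total = sum(counts)
print(counts)  # [3, 2, 1]
print(total)   # 6
```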
Use for IO-Bound Tasks
Use the ThreadPool for IO-bound tasks only.
These are tasks that may involve interacting with an external device, such as a peripheral (e.g. a camera or a printer), a storage device (e.g. a hard drive), or another computer (e.g. socket communication).
Threads and thread pools like the ThreadPool are probably not appropriate for CPU-bound tasks, like computation on data in memory.
This is because of a design decision within the Python interpreter, which makes use of a master lock called the Global Interpreter Lock (GIL) that prevents more than one thread from executing Python bytecode at the same time.
This design decision was made within the reference implementation of the Python interpreter (CPython) but may not impact other interpreters (such as IronPython and Jython).
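The benefit for IO-bound work can be sketched with a simulated blocking call. The fake_download() function and the example.com URLs are hypothetical, and time.sleep() stands in for real IO such as a network request; like most blocking IO calls, it releases the GIL while waiting.

```python
import time
from multiprocessing.pool import ThreadPool

def fake_download(url):
    # time.sleep() stands in for blocking IO, e.g. a network request
    time.sleep(0.2)
    return f"data from {url}"

urls = [f"https://example.com/{i}" for i in range(5)]

start = time.perf_counter()
with ThreadPool(5) as pool:
    pages = pool.map(fake_download, urls)
elapsed = time.perf_counter() - start

# run one after another, the five tasks would take about one second
print(f"{len(pages)} tasks completed in {elapsed:.2f} seconds")
```

Because the waits overlap, the total time should be close to that of a single task rather than the sum of all five. A CPU-bound task would not be expected to show the same speedup under the GIL.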
Takeaways
You now know the best practices when using the ThreadPool in Python.