Last Updated on September 12, 2022
You must consider thread safety when using the ThreadPoolExecutor.
In this tutorial, you will discover how to handle thread-safety when using the ThreadPoolExecutor class in Python.
Let’s get started.
ThreadPoolExecutor Thread-Safety
The ThreadPoolExecutor provides a flexible way to execute ad hoc tasks using a pool of worker threads.
You can submit tasks to the thread pool by calling the submit() method and passing in the function you wish to execute on another thread, followed by any arguments for that function.
Calling the submit() method will return a Future object that allows you to check on the status of the task and get the result from the task once it completes.
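As a minimal sketch of this workflow (the square() task function is a hypothetical stand-in for your own task), submitting a task and reading its result from the Future might look like this:

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical task function, used only for illustration
def square(x):
    return x * x

# create a small pool and submit one task with one argument
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 7)
    # result() blocks until the task has finished
    result = future.result()
    # done() reports whether the task has completed
    finished = future.done()

print(result, finished)
```

Note that the function object itself is passed to submit(), not a call to the function, and its arguments follow as separate parameters.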
Although the ThreadPoolExecutor uses threads internally, you do not need to work with threads directly in order to execute tasks and get results.
Nevertheless, when accessing resources or critical sections, thread-safety may be a concern.
Using the ThreadPoolExecutor Requires Thread-Safety
Using the ThreadPoolExecutor does require some consideration of thread safety.
What Is Thread Safety?
Thread safety means that code behaves correctly when it is executed by multiple threads at the same time.
Specifically, it refers to protecting critical sections of code from being executed by multiple threads concurrently.
A critical section is code that reads or modifies shared state or resources that would become corrupt, or behave unexpectedly, if used by multiple threads of execution at the same time.
The shared state could be as simple as a counter, or it could be a file or network connection that is read from and written to, or a more complex application-specific case.
How to Make Critical Sections Thread-Safe?
Access to critical sections of code can be made thread safe by using synchronization primitives.
Perhaps the simplest example is the lock, also called mutual exclusion lock or mutex.
A lock must be acquired before a critical section can be executed, after which the lock is then released.
Only one thread at a time can obtain the lock. Any threads that attempt to acquire the lock while another thread possesses it must wait for the lock to become available. This waiting is managed automatically within the lock internally.
For example:
```python
from threading import Lock

# example of protecting a critical section with a lock
lock = Lock()
# ...
# acquire the lock
lock.acquire()
# critical section...
# release the lock
lock.release()
```
This shows that execution of a critical section can be made mutually exclusive among threads.
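To see the mutual exclusion in action, here is a small sketch (the try_lock() helper and the results list are hypothetical names for illustration). A second thread makes a non-blocking attempt to acquire the lock, first while the main thread holds it and then after it has been released:

```python
from threading import Lock, Thread

lock = Lock()
results = []

# hypothetical helper: try to take the lock without waiting
def try_lock():
    # returns False right away if another thread holds the lock
    acquired = lock.acquire(blocking=False)
    results.append(acquired)
    if acquired:
        lock.release()

# the main thread holds the lock while another thread tries to take it
with lock:
    thread = Thread(target=try_lock)
    thread.start()
    thread.join()

# the lock is free again, so a second attempt succeeds
thread = Thread(target=try_lock)
thread.start()
thread.join()

print(results)  # [False, True]
```

With the default blocking behavior, the first attempt would simply wait until the main thread released the lock.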
What About the GIL?
Python threads, including those in the ThreadPoolExecutor, are subject to the global interpreter lock (GIL).
The GIL is a mutual exclusion lock in the reference Python interpreter (i.e. CPython, the version of Python you download from python.org). It uses synchronization to ensure that only one thread of execution can execute Python bytecode instructions at a time within a Python process.
This means that although we may have multiple threads in a ThreadPoolExecutor or multiple instances of the Thread class, only instructions from one thread can execute at a time in a Python process.
Even though instructions from only one thread of execution are performed at a time within a Python interpreter, we may still encounter thread safety concerns and therefore must protect critical sections of code.
The reason is that a given line of code may be composed of multiple instructions in the Python interpreter.
As such, a thread executing a line of code composed of multiple instructions may be context switched in the middle of the line, possibly leaving the data in an inconsistent state.
You may recall that a context switch involves the operating system halting the execution and storing the state of one thread, and restoring the state and continuing the execution of another thread. This is performed automatically by the operating system and is central to how modern operating systems achieve multitasking, so-called preemptive multitasking.
The most common example of this type of thread safety concern is the incrementing of an integer, written as:
```python
...
# increment an integer
counter += 1
```
This is at least three instructions, though possibly more. The first reads the current value of the counter variable, the second adds one to that value, and the third stores the result back in the variable.
If a context switch occurs after the value is read but before the result is stored, another thread may update the counter in the meantime, and that update is then overwritten and lost. This is called a race condition, a specific type of thread-safety concern.
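You can inspect the instructions behind the increment yourself with the standard dis module. This is a small sketch, assuming the counter is a global variable updated inside a hypothetical increment() function; the exact instruction names vary between Python versions, so none are listed here as expected output:

```python
import dis

counter = 0

# hypothetical function wrapping the increment so it can be disassembled
def increment():
    global counter
    counter += 1

# collect the names of the bytecode instructions for the function
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]
print(opnames)
```

On CPython the list should include separate instructions that load the global variable and store it again, with the addition in between, which is the window in which another thread can run.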
Adding a lock to operations that involve the counter would make the operations “atomic” to concurrent thread execution, and therefore thread safe.
For example:
```python
...
# acquire the lock
lock.acquire()
# increment an integer
counter += 1
# release the lock
lock.release()
```
ThreadPoolExecutor Thread-Safety
We can see that although the worker threads in the ThreadPoolExecutor are subject to the GIL, we must still be concerned with thread-safety.
The two main examples of where thread-safety may be a concern are:
- Arguments passed to target task functions.
- Access to global state.
Thread-Safe Arguments
Any data that you pass into your target task function must be treated with thread-safety in mind.
This may apply to data such as a list or dictionary that may be passed to all target task functions, each of which in turn may concurrently add or remove items from the collection.
This may also apply to resources such as open connections to a file, database, or remote server that may be read from or written to concurrently by multiple target task functions.
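As a sketch of what this can look like, the example below passes the same dictionary and the same lock to every task. The count_words() function and the sample batches are hypothetical and used only for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

# hypothetical task: every task receives the same dict and the same lock
def count_words(words, counts, lock):
    for word in words:
        with lock:
            # read-modify-write on shared data is the critical section
            counts[word] = counts.get(word, 0) + 1

counts = {}
lock = Lock()
batches = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]

with ThreadPoolExecutor(3) as executor:
    futures = [executor.submit(count_words, batch, counts, lock) for batch in batches]
    for future in futures:
        # re-raise any exception that occurred inside a task
        future.result()

print(counts)
```

Passing the lock alongside the data keeps the protection explicit: any function that receives the shared dictionary also receives the lock that guards it.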
Thread-Safe Global State
Using non-local variables (e.g. global variables) within target task functions executed by a ThreadPoolExecutor is probably a bad idea to begin with.
This is because it breaks encapsulation, making code harder to read and test. Instead, it may be a better design to pass in required data via the function arguments.
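One way to follow this advice is to have each task work only on its own arguments and local variables, return a value, and let the calling thread combine the results. The sketch below assumes a hypothetical total_change() task function:

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical task: uses only arguments and local variables
def total_change(amount, repeats):
    change = 0
    for _ in range(repeats):
        change += amount
    return change

with ThreadPoolExecutor(2) as executor:
    futures = [
        executor.submit(total_change, 100, 1000),
        executor.submit(total_change, -100, 1000),
    ]
    # only the calling thread touches the combined total
    balance = sum(future.result() for future in futures)

print(f'Balance: {balance}')  # Balance: 0
```

Because no state is shared between the tasks, there is no critical section and no lock is needed.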
Nevertheless, if your target task function makes use of global state, then this may need to be treated with thread-safety in mind.
This might be something application specific, such as a variable or an object, or a shared connection. A common example might be a handle for logging, such as each task logging to standard out or to a file.
Additionally, if you register a done callback on the Future objects via the add_done_callback() method, the callback function is another place that may use global variables.
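Done callbacks are typically run by the worker thread that completed the task, so a callback that updates shared state needs the same care as a task function. A minimal sketch, assuming hypothetical work() and add_to_total() functions:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

total = 0
total_lock = Lock()

# hypothetical task function
def work(value):
    return value * 2

# hypothetical done callback that updates global state
def add_to_total(future):
    global total
    # callbacks may run in different worker threads, so protect the update
    with total_lock:
        total += future.result()

with ThreadPoolExecutor(4) as executor:
    for value in range(10):
        future = executor.submit(work, value)
        future.add_done_callback(add_to_total)

print(total)  # 90
```

If a task has already finished when the callback is registered, the callback runs immediately in the thread that registered it, which is another reason not to assume which thread executes it.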
Now that we are familiar with thread-safety concerns when using the ThreadPoolExecutor, let’s look at a worked example.
Example of ThreadPoolExecutor Thread-Safety
Let’s explore an example of using the ThreadPoolExecutor that highlights the need for thread safety.
Example of Thread Unsafe ThreadPoolExecutor
Perhaps the most common example of the need for thread safety when using the ThreadPoolExecutor is in multiple threads accessing the same global state.
We can demonstrate this with an integer in global state that is updated concurrently by two worker threads.
First, let’s define an integer that holds a balance.
```python
...
# some global state
balance = 0
```
Next, we can define a target task function that adds values to this variable.
```python
# target task function that adds to the balance
def add_to_balance():
    global balance
    # add to the balance
    for _ in range(1000000):
        balance += 100
```
We can then define a separate task function that subtracts values from this variable.
```python
# target task function that subtracts from the balance
def subtract_to_balance():
    global balance
    # subtract from the balance
    for _ in range(1000000):
        balance -= 100
```
Finally, we can create a thread pool with two worker threads and submit one task for each target function.
```python
...
# create the thread pool
with ThreadPoolExecutor(2) as executor:
    # task for adding to the balance
    _ = executor.submit(add_to_balance)
    # task for subtracting from the balance
    _ = executor.submit(subtract_to_balance)
    # wait for all tasks to complete...
```
After both threads are completed, we can report the balance.
We would expect the value to be zero as we have the same number of additions as subtractions.
```python
...
# report the final balance
print(f'Balance: {balance}')
```
Tying this together, the complete example of thread unsafe usage of the ThreadPoolExecutor is listed below.
```python
# SuperFastPython.com
# example of thread unsafe access to global state with the thread pool
from concurrent.futures import ThreadPoolExecutor

# target task function that adds to the balance
def add_to_balance():
    global balance
    # add to the balance
    for _ in range(1000000):
        balance += 100

# target task function that subtracts from the balance
def subtract_to_balance():
    global balance
    # subtract from the balance
    for _ in range(1000000):
        balance -= 100

# some global state
balance = 0
# create the thread pool
with ThreadPoolExecutor(2) as executor:
    # task for adding to the balance
    _ = executor.submit(add_to_balance)
    # task for subtracting from the balance
    _ = executor.submit(subtract_to_balance)
    # wait for all tasks to complete...
# report the final balance
print(f'Balance: {balance}')
```
Running the example creates the thread pool, submits the single addition task and the single subtraction task, then reports the final balance.
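In this case, the balance is not the zero value we expected. Instead, a race condition occurred between the concurrent updates to the balance variable.
Note: the result depends on how the threads are scheduled, so you will likely get a different value of the balance variable each time you run the code. Depending on your Python version, the race may not trigger on every run, but the code is still not thread safe.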
```
Balance: 43751600
```
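The lost update behind this kind of result can be made reproducible. The following is a minimal sketch, not part of the example above: the names slow_add, fast_subtract, read_done, and write_done are hypothetical, and threading.Event objects are used only to force the interleaving that a badly timed context switch would otherwise produce by chance.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Event

balance = 0
# events used only to force a specific interleaving of the two tasks
read_done = Event()
write_done = Event()

# hypothetical task: reads the balance, is "interrupted", then writes
def slow_add():
    global balance
    current = balance        # read the balance (0)
    read_done.set()          # let the other task run now
    write_done.wait()        # wait until the other task has written
    balance = current + 100  # write back a result based on a stale value

# hypothetical task: updates the balance while slow_add is paused
def fast_subtract():
    global balance
    read_done.wait()
    balance -= 100           # the balance is now -100
    write_done.set()

with ThreadPoolExecutor(2) as executor:
    executor.submit(slow_add)
    executor.submit(fast_subtract)

print(f'Balance: {balance}')  # Balance: 100
```

One addition and one subtraction should leave a balance of zero, yet the subtraction is overwritten because slow_add stores a result computed from the value it read earlier.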
Next, let’s look at how we might update the example to be thread safe.
Example of Thread Safe ThreadPoolExecutor
The thread unsafe example from the previous section can be updated to be thread safe with relatively little change.
The first step is to create a threading.Lock instance.
```python
...
# create the lock
lock = Lock()
```
We can then pass this lock to both target task functions as an argument and protect the critical section, the update to the global state, by acquiring the lock before it and releasing the lock after it.
For example:
```python
...
# acquire the lock
lock.acquire()
# update the balance
balance += 100
# release the lock
lock.release()
```
A simpler and more robust approach is to use the lock as a context manager, which releases the lock automatically even if the critical section raises an exception; for example:
```python
...
# update the balance
with lock:
    balance += 100
```
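The robustness of the context manager can be checked with a short sketch. The risky_update() function below is hypothetical and deliberately raises an exception while the lock is held:

```python
from threading import Lock

lock = Lock()

# hypothetical function that fails inside the critical section
def risky_update():
    with lock:
        raise ValueError('something went wrong inside the critical section')

try:
    risky_update()
except ValueError:
    pass

# the context manager released the lock despite the exception
released = not lock.locked()
print(released)  # True
```

With explicit acquire() and release() calls, the same failure would skip the release and leave the lock held unless the calls were wrapped in a try-finally block.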
Tying this together, the complete example of the thread safe version of updating global state from the ThreadPoolExecutor is listed below.
```python
# SuperFastPython.com
# example of thread safe access to global state with the thread pool
from threading import Lock
from concurrent.futures import ThreadPoolExecutor

# target task function that adds to the balance
def add_to_balance(lock):
    global balance
    # add to the balance
    for _ in range(1000000):
        with lock:
            balance += 100

# target task function that subtracts from the balance
def subtract_to_balance(lock):
    global balance
    # subtract from the balance
    for _ in range(1000000):
        with lock:
            balance -= 100

# create the lock
lock = Lock()
# some global state
balance = 0
# create the thread pool
with ThreadPoolExecutor(2) as executor:
    # task for adding to the balance
    _ = executor.submit(add_to_balance, lock)
    # task for subtracting from the balance
    _ = executor.submit(subtract_to_balance, lock)
    # wait for all tasks to complete...
# report the final balance
print(f'Balance: {balance}')
```
Running the example creates the thread pool and executes the two tasks as before.
This time, the final balance will always be zero, given that all code that changes the balance is protected by the lock, ensuring mutually exclusive execution.
```
Balance: 0
```
Takeaways
You now know how to handle thread safety when working with the ThreadPoolExecutor.