Last Updated on September 12, 2022
You can make a Python dictionary thread-safe by using a mutual exclusion (mutex) lock via the threading.Lock class.
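For example, the general pattern looks something like this (a minimal sketch; the dict, lock, and key names are placeholders for your own):

from threading import Lock

# a shared dict and a lock to protect it
shared_dict = dict()
lock = Lock()

# perform operations on the dict while holding the lock
with lock:
    shared_dict['key'] = 'value'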
In this tutorial you will discover how to develop a thread-safe dictionary in Python.
Let’s get started.
Need a Thread-Safe Dictionary
A thread is a thread of execution in a computer program.
Every Python program has at least one thread of execution called the main thread. Both processes and threads are created and managed by the underlying operating system.
Sometimes we may need to create additional threads in our program in order to execute code concurrently.
Python provides the ability to create and manage new threads via the threading module and the threading.Thread class.
You can learn more about Python threads in the guide:
In concurrent programming we may need to share a dict data structure between threads.
Multiple threads may need to add data to the same dict, other threads may wish to remove items or check the length of the dict.
Is the dict thread-safe in Python and if not, how can we make it thread-safe?
Most Dictionary Operations Are Atomic
Many common operations on a dict are atomic, meaning that they are thread-safe.
Recall, a dict is a mapping of keys to values. A dict can be created via the dict() constructor, for example:
...
# create a new dict
d = dict()
We can also create a new dictionary by specifying a mapping of keys to values in-line, for example:
...
# create a new dict
d = {'a': 1, 'b': 2, 'c': 3}
Atomic means that the operation either occurs or does not occur with no in between inconsistent state.
Operations such as adding, removing, and reading a value on a dict are atomic.
In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are.
— What kinds of global value mutation are thread-safe?
Specifically:
- Adding a key and value mapping.
- Replacing a value for a key.
- Adding a dict to a dict via update().
- Getting a list of keys via keys().
That is, operations on the dictionary that involve a single step are atomic, for the most part.
The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access.
— global interpreter lock, Python Glossary.
You can learn about atomic operations in Python here:
Because atomic operations either occur or do not occur, it means that they are thread-safe.
Specifically, using any of the above operations on a dict shared between multiple threads will not result in a race condition, corruption of the dict or corruption of the data within the dict.
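To make this concrete, here is a small sketch of the kinds of single-step operations listed above (the dict and key names are illustrative only):

# a dict shared between threads (illustrative)
counts = dict()
# each of the following is a single-step operation on the dict
counts['a'] = 1            # add a key and value mapping
counts['a'] = 2            # replace the value for a key
counts.update({'b': 3})    # add a dict to a dict via update()
keys = list(counts.keys()) # get a list of keys via keys()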
Next, let’s consider why relying on atomic dict operations might be fragile.
Atomic Dictionary Operations Are Fragile
Common operations on a dict are atomic and therefore thread safe as we saw in the previous section.
This is only true at the time of writing because of a few specific considerations, such as:
- The precise details of how these dict operations are compiled to Python virtual machine bytecode.
- The use of the reference Python interpreter (CPython).
- The use of a Global Interpreter Lock (GIL) within the reference Python interpreter.
This means that relying on the thread-safety of these operations could be fragile in future versions of Python or when executing your Python program with alternate Python interpreters.
This is not a minor concern.
There are frequent development efforts to improve the Python interpreter and even attempts to remove the GIL. These will likely change the specifics of the Python VM, bytecode compiling and thread-safety of built-in data structures.
It is also becoming more common to run Python code using third-party interpreters, mostly to achieve better performance. Alternate interpreters may or may not implement the same rules for atomic operations on dictionaries.
Therefore, we may desire a thread-safe Python dict that is future-proof to changes to Python interpreters and the GIL.
When in doubt, use a mutex!
— What kinds of global value mutation are thread-safe?
Next, let’s look at some operations on the dict that are not thread-safe.
Dictionary Race Conditions Are Possible
Operations performed on a dict are atomic, as we have seen above.
These include adding and removing items, and getting views on the dict for iterating, like lists of keys and lists of values.
Nevertheless, you may still get race conditions when using a dict.
This is true, even though:
- Operations on the dict are thread-safe.
- The GIL prevents more than one thread from updating state in the Python interpreter at a time.
A main source of race conditions when working with a dict is in performing operations that involve two or more steps.
A classic example is a task where you first get a key from the dict, then use the key in some operation on the dict, like getting or removing the entry.
- Thread A: Get a key from dict.
- Thread A: Use key on dict.
A context switch is possible between these two operations, allowing another thread to remove the entry from the dict in between. Once your thread resumes, the key no longer exists in the dict and you get a KeyError or a similar failure.
For example:
- Thread A: Get a key from dict.
- <context switch>
- Thread B: Get same key from dict.
- Thread B: Remove key from dict.
- <context switch>
- Thread A: Use key on dict.
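In code, such a two-step operation might look like the following sketch (the names are illustrative); neither line is unsafe on its own, but together they are not atomic:

# step 1: get a key from the shared dict
key = next(iter(shared_dict))
# <a context switch may occur here and another thread may remove the key>
# step 2: use the key, which may now raise a KeyError
value = shared_dict[key]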
But what if you check that the key exists before using it?
For example:
- Thread A: Get a key from dict.
- Thread A: Check that the key exists in dict.
- Thread A: Use key on dict.
Nope, you still have a race condition. This time between checking that the key exists and using the key.
For example:
- Thread A: Get a key from dict.
- Thread A: Check that the key exists in dict.
- <context switch>
- Thread B: Get same key from dict.
- Thread B: Remove key from dict.
- <context switch>
- Thread A: Use key on dict.
Similar race conditions can happen if you have two-step operations that get and use values from the dict instead of keys.
Any operations on the dict that involve two steps probably should be treated as critical sections and be protected from race conditions.
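For example, the two-step pattern above could be protected by holding a mutex lock across both steps, which is a minimal sketch of the approach developed later in this tutorial (assuming a threading.Lock named lock shared by all threads):

...
# treat the check and the removal as one critical section
with lock:
    if key in shared_dict:
        value = shared_dict.pop(key)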
Next, let’s look at some examples.
Example Adding to a Dict is Thread-Safe
Adding items to a dict from multiple threads is thread-safe.
We can explore this with a worked example.
In this example we will create 1,000 threads, each of which will concurrently add 1,000 unique items to a shared dictionary. The result will be a dictionary with one million items as all discrete add operations are performed without a race condition.
If adding items to the dict was not thread-safe, then the internal array for holding keys and values may become inconsistent and we would expect fewer than the intended number of items added.
First, we can define a function to be executed by worker threads to add items to the shared dictionary.
The function will take the shared dictionary as well as a start integer value and the number of values to add as arguments. It will then iterate from the start value for the specified number of values and add these integers as keys and values to the shared dictionary.
The add_items() function below implements this.
# add a range of values to the dictionary
def add_items(shared_dict, start_value, num_values):
    # enumerate block of values
    for i in range(start_value, start_value+num_values):
        # add to the dict
        shared_dict[i] = i
Next, in the main thread we can create a shared dictionary instance and report that it is empty.
...
# create the shared dictionary
shared_dict = dict()
print(f'Dict has {len(shared_dict)} items')
We can then configure 1,000 threads, each calling our add_items() function with the shared dictionary, a unique start value, and 1,000 as the number of values to add.
Having 1,000 threads each adding 1,000 unique key-value pairs to the shared dictionary will result in 1,000,000 items in the dict after all threads are finished.
...
# configure threads
threads = list()
for i in range(0, 1000000, 1000):
    thread = Thread(target=add_items, args=(shared_dict, i, 1000))
    threads.append(thread)
Next, we can start all threads and then wait for all threads to complete.
...
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
Finally, the main thread will report the number of items in the shared dictionary.
...
print(f'Dict has {len(shared_dict)} items')
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of thread-safe adding items to a shared dict
from threading import Thread

# add a range of values to the dictionary
def add_items(shared_dict, start_value, num_values):
    # enumerate block of values
    for i in range(start_value, start_value+num_values):
        # add to the dict
        shared_dict[i] = i

# create the shared dictionary
shared_dict = dict()
print(f'Dict has {len(shared_dict)} items')
# configure threads
threads = list()
for i in range(0, 1000000, 1000):
    thread = Thread(target=add_items, args=(shared_dict, i, 1000))
    threads.append(thread)
print(f'Created {len(threads)} threads')
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
print(f'Dict has {len(shared_dict)} items')
Running the example first creates the shared dictionary.
It then configures 1,000 threads, each instructed to call the add_items() function and add 1,000 unique values to the shared dictionary at the same time.
The threads are started and the main thread waits for all threads to terminate.
Finally, the main thread reports that as expected all one million unique items were added to the dictionary concurrently without incident.
Importantly, this occurs every time the program is run, meaning it is consistent and there is no race condition.
Dict has 0 items
Created 1000 threads
Dict has 1000000 items
Next, let’s look at updating a dict concurrently from multiple threads.
Example Updating a Dict is Thread-Safe
We can update the previous example so that each thread updates the dictionary.
Updating a dictionary means adding one dictionary of key-value pairs to another dictionary, and having any overlapping keys replaced if needed. This can be achieved by the update() function.
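For example, a quick demonstration of update() adding new pairs and replacing the value for an overlapping key:

...
d = {'a': 1, 'b': 2}
# add 'c' and replace the value for the overlapping key 'b'
d.update({'b': 20, 'c': 3})
print(d)  # {'a': 1, 'b': 20, 'c': 3}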
The update() function on a dictionary is thread-safe, and we can explore this with a worked example.
This can be achieved by updating our add_items() function to first create a new dictionary of key-value pairs, then adding this to the shared dictionary.
# add a range of values to the dictionary
def add_items(shared_dict, start_value, num_values):
    # create dict
    new_dict = {i:i for i in range(start_value, start_value+num_values)}
    # add new dict to shared dict
    shared_dict.update(new_dict)
And that’s it.
Now all 1,000 threads will attempt to update the shared dictionary concurrently.
If the update operation were not thread-safe, then this operation may result in corruption of the underlying arrays within the dictionary and would likely result in fewer than the expected 1,000,000 items in the final structure.
Tying this together, the complete example of concurrently updating a dictionary is listed below.
# SuperFastPython.com
# example of thread-safe updating a shared dictionary
from threading import Thread

# add a range of values to the dictionary
def add_items(shared_dict, start_value, num_values):
    # create dict
    new_dict = {i:i for i in range(start_value, start_value+num_values)}
    # add new dict to shared dict
    shared_dict.update(new_dict)

# create the shared dictionary
shared_dict = dict()
print(f'Dict has {len(shared_dict)} items')
# configure threads
threads = list()
for i in range(0, 1000000, 1000):
    thread = Thread(target=add_items, args=(shared_dict, i, 1000))
    threads.append(thread)
print(f'Created {len(threads)} threads')
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
print(f'Dict has {len(shared_dict)} items')
Running the example first creates the shared dictionary, then creates and configures 1,000 threads to update the shared dictionary.
The main thread then waits for the new threads to terminate.
Each thread creates a new dictionary with 1,000 items, then adds this dictionary to the shared dictionary.
Finally, the threads finish and the main thread correctly reports that the shared dictionary contains 1,000,000 unique key-value pairs, as expected.
Dict has 0 items
Created 1000 threads
Dict has 1000000 items
Next, let’s look at an example of an operation using a dictionary that is not thread-safe.
Example Removing Items Can Be Thread-Unsafe
We can explore how getting and then using a key on a dictionary is not thread safe.
In this example we will first populate a dictionary with one million items. We will then create 1,000 threads, each of which will attempt to remove 1,000 items from the dict. Each thread will iterate over the keys and remove items. This fails with an error because the operation is not thread-safe.
Firstly, we can define a function to be executed by worker threads.
The function will take the shared dictionary and the number of items to remove as arguments, with the number of items defaulting to 1,000. The function will then iterate over keys in the dictionary and remove items by calling pop(). Once the specified number of items has been removed, the loop will break and the function will return.
The remove_items() function below implements this.
# remove some items from the shared dictionary
def remove_items(shared_dict, limit=1000):
    counter = 0
    for key in list(shared_dict.keys()):
        shared_dict.pop(key)
        counter += 1
        if counter >= limit:
            break
Note, we are making a list from the view on the keys from the shared dictionary. This results in each thread having its own copy of the keys present in the dictionary.
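For example, a keys() view reflects later changes to the dictionary, whereas wrapping it in list() takes an independent snapshot at that moment:

...
d = {'a': 1, 'b': 2}
view = d.keys()            # a dynamic view of the keys
snapshot = list(d.keys())  # an independent copy of the keys
del d['a']
print(list(view))  # ['b'], the view reflects the removal
print(snapshot)    # ['a', 'b'], the snapshot is unchanged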
Next, in the main thread we can first create a shared dictionary populated with one million items.
...
# create a dictionary with 1 million items
shared_dict = {i:i for i in range(1000000)}
print(f'Dict has {len(shared_dict)} items')
We can then create and configure 1,000 threads to call the remove_items() function and pass in the shared dictionary as an argument.
...
# configure threads
threads = [Thread(target=remove_items, args=(shared_dict,)) for _ in range(1000)]
The threads can then be started and the main thread can wait for the threads to terminate.
Recall that a thread will terminate when it exits the called function normally, or if an error or exception is raised and not handled. We are expecting the latter to occur in this case.
...
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
Finally, the main thread will report the number of values in the dict, which ideally would be zero after 1,000 threads remove 1,000 items each.
...
print(f'Dict has {len(shared_dict)} items')
In this case we expect most threads to fail with a KeyError due to the race condition. Admittedly, the race condition is contrived, but illustrative of a broader set of race conditions that will occur when operating on dictionaries from multiple threads.
Tying this together, the complete example of how removing items from the dictionary can be thread-unsafe is listed below.
# SuperFastPython.com
# example of a race condition with a dict
from threading import Thread

# remove some items from the shared dictionary
def remove_items(shared_dict, limit=1000):
    counter = 0
    for key in list(shared_dict.keys()):
        shared_dict.pop(key)
        counter += 1
        if counter >= limit:
            break

# create a dictionary with 1 million items
shared_dict = {i:i for i in range(1000000)}
print(f'Dict has {len(shared_dict)} items')
# configure threads
threads = [Thread(target=remove_items, args=(shared_dict,)) for _ in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
print(f'Dict has {len(shared_dict)} items')
Running the example first populates the shared dictionary.
A total of 1,000 threads are then configured and started. The main thread then blocks until all threads terminate.
Each thread gets a snapshot list of keys in the dictionary, then enumerates them trying to remove 1,000 items.
Nearly all threads fail with a KeyError, caused by a race condition.
Specifically, a thread is context switched after getting the list of keys but before removing a key on a given iteration of the loop. Another thread then removes the same key. When the first thread resumes, the key no longer exists, and calling pop() raises a KeyError that causes the thread to terminate.
A truncated example of the output is listed below showing the failure case.
Dict has 1000000 items
Exception in thread Thread-1:
Traceback (most recent call last):
...
Exception in thread Exception in thread Thread-3:
Traceback (most recent call last):
...
Exception in thread Exception in thread Thread-5:
Traceback (most recent call last):
...
Thread-4:
Traceback (most recent call last):
...
Exception in thread Exception in thread Exception in thread Thread-8:
....
KeyError: 177364
shared_dict.pop(key)
KeyError: 181364
Dict has 27636 items
What if we check if the key exists before removing it from the dictionary?
A reasonable question.
This too results in a race condition.
For example, we can update the remove_items() function so that each iteration it checks whether the key exists in the dict before attempting to remove it.
...
if key in shared_dict:
    shared_dict.pop(key)
    counter += 1
if counter >= limit:
    break
The updated version of the function is listed below.
# remove some items from the shared dictionary
def remove_items(shared_dict, limit=1000):
    counter = 0
    for key in list(shared_dict.keys()):
        if key in shared_dict:
            shared_dict.pop(key)
            counter += 1
        if counter >= limit:
            break
Tying this together, the complete example of checking for the presence of the key prior to removing it from the dict is listed below.
# SuperFastPython.com
# example of a race condition with a dict
from threading import Thread

# remove some items from the shared dictionary
def remove_items(shared_dict, limit=1000):
    counter = 0
    for key in list(shared_dict.keys()):
        if key in shared_dict:
            shared_dict.pop(key)
            counter += 1
        if counter >= limit:
            break

# create a dictionary with 1 million items
shared_dict = {i:i for i in range(1000000)}
print(f'Dict has {len(shared_dict)} items')
# configure threads
threads = [Thread(target=remove_items, args=(shared_dict,)) for _ in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
print(f'Dict has {len(shared_dict)} items')
Running the example creates the shared dict as before, then starts and runs the worker threads.
Each worker thread iterates its own copy of the list of keys from the dict. Each key is then checked to see if it exists within the dict before being removed.
This too results in a race condition.
Specifically, threads are context switched after passing the if-condition to see if the key exists, but before the key is removed from the dict.
The result is the same as the previous example. Most threads fail with a KeyError.
A truncated example of the output is listed below showing the failure case.
Dict has 1000000 items
Exception in thread Thread-105:
Traceback (most recent call last):
...
Exception in thread Thread-110:
Traceback (most recent call last):
...
KeyError: 104480
shared_dict.pop(key)
...
KeyError: 104480
shared_dict.pop(key)
KeyError: 109248
Dict has 752 items
Example Removing Items From a Dict in a Thread-Safe Manner
We can fix the race condition in the previous section by using a mutual exclusion (mutex) lock.
Specifically, we can create a threading.Lock instance and share it between the worker threads.
Prior to each thread getting a list of keys to remove, it must acquire the lock. This ensures that only one thread is able to operate upon the dict at a time.
If you are new to mutex locks, you can learn more about them in this tutorial:
For example, we can update the remove_items() function to receive a lock as an argument and then acquire the lock prior to operating on the dict.
# remove some items from the shared dictionary
def remove_items(shared_dict, lock, limit=1000):
    counter = 0
    # acquire the lock
    with lock:
        for key in list(shared_dict.keys()):
            shared_dict.pop(key)
            counter += 1
            if counter >= limit:
                break
In the main thread we can first create the lock instance.
...
# create the shared lock
lock = Lock()
We can then share it with each worker thread.
...
# configure threads
threads = [Thread(target=remove_items, args=(shared_dict, lock)) for _ in range(1000)]
Tying this together, the complete example of the thread-safe way of iterating and removing items from the dictionary is listed below.
# SuperFastPython.com
# example of thread-safe removing items from the dictionary
from threading import Thread
from threading import Lock

# remove some items from the shared dictionary
def remove_items(shared_dict, lock, limit=1000):
    counter = 0
    # acquire the lock
    with lock:
        for key in list(shared_dict.keys()):
            shared_dict.pop(key)
            counter += 1
            if counter >= limit:
                break

# create the shared lock
lock = Lock()
# create a dictionary with 1 million items
shared_dict = {i:i for i in range(1000000)}
print(f'Dict has {len(shared_dict)} items')
# configure threads
threads = [Thread(target=remove_items, args=(shared_dict, lock)) for _ in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
print(f'Dict has {len(shared_dict)} items')
Running the example first creates the shared lock and shared dictionary.
The worker threads are configured and started. Each thread first acquires the lock before iterating and removing 1,000 keys from the dictionary.
If the lock is already acquired, other threads are blocked and must wait until it is available.
The result is that all one million key-value pairs are removed from the dictionary in a thread-safe manner.
Importantly, the same result is achieved every time the code is run.
Dict has 1000000 items
Dict has 0 items
An alternate approach is to acquire the lock each iteration of the loop when removing keys.
Once the lock is acquired, the thread can check if the key exists and then remove it, otherwise skip the key.
For example, the updated version of the remove_items() function with these changes is listed below.
# remove some items from the shared dictionary
def remove_items(shared_dict, lock, limit=1000):
    counter = 0
    for key in list(shared_dict.keys()):
        # acquire the lock
        with lock:
            if key in shared_dict:
                shared_dict.pop(key)
                counter += 1
        if counter >= limit:
            break
This achieves the same outcome, but may allow other threads to make progress while a given thread is context switched by the operating system, as the lock is only held for a brief period on each iteration, rather than for all 1,000 iterations as above.
Tying this together, the complete example of the alternate approach to remove batches of items from the dict in a thread-safe manner is listed below.
# SuperFastPython.com
# example of thread-safe removing items from the dictionary
from threading import Thread
from threading import Lock

# remove some items from the shared dictionary
def remove_items(shared_dict, lock, limit=1000):
    counter = 0
    for key in list(shared_dict.keys()):
        # acquire the lock
        with lock:
            if key in shared_dict:
                shared_dict.pop(key)
                counter += 1
        if counter >= limit:
            break

# create the shared lock
lock = Lock()
# create a dictionary with 1 million items
shared_dict = {i:i for i in range(1000000)}
print(f'Dict has {len(shared_dict)} items')
# configure threads
threads = [Thread(target=remove_items, args=(shared_dict, lock)) for _ in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
print(f'Dict has {len(shared_dict)} items')
Running the example creates the shared lock and dictionary as before.
Threads are configured and started. Each thread then iterates a list of keys and removes 1,000 items.
The lock ensures that the dictionary is not changed between checking if the key is present and removing it from the dict.
The result is that all items are removed from the dictionary successfully. Importantly, this result is achieved every time the code is run.
Acquiring and releasing the lock so often in each thread is computationally expensive. As such, this version of the code is dramatically slower than the previous version.
For example, I killed the process after many minutes, whereas the previous version finished after a few seconds.
Dict has 1000000 items
Dict has 0 items
Further Reading
This section provides additional resources that you may find helpful.
Python Threading Books
- Python Threading Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- Threading Module API Cheat Sheet
I also recommend specific chapters in the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Threading: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python ThreadPool: The Complete Guide
Takeaways
You now know how to use a thread-safe dict in Python.
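As a recap, one way to package the approach is a small wrapper class that pairs a dict with a threading.Lock (a minimal sketch rather than a complete mapping implementation; the class and method names are my own):

from threading import Lock

class ThreadSafeDict:
    def __init__(self):
        self._dict = dict()
        self._lock = Lock()

    def set(self, key, value):
        # add or replace an item while holding the lock
        with self._lock:
            self._dict[key] = value

    def pop(self, key, default=None):
        # remove and return an item while holding the lock
        with self._lock:
            return self._dict.pop(key, default)

    def __len__(self):
        # report the size while holding the lock
        with self._lock:
            return len(self._dict)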
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Harley-Davidson on Unsplash
Thomas says
hi,
Python beginner here.
Thanks for all those posts about Python and threading.
For a script that wants to work on a dict, with each thread working on only one different key/value pair of the dict to update those values, one could think that there is no risk of race conditions. But as you stated, it is never certain because of future changes to Python's rules or the different behavior of other interpreters.
If we want to avoid the use of a lock in order to prioritize execution speed: what do you think of using dynamic variable names instead of a dict to create unique variables, one for each thread, to avoid race conditions with a dict? Each unique variable would point to a unique copy of a value of the dict. (A copy, to avoid pointing to the same object, to prevent conflicts or maybe hidden race conditions if the variables are associated with the same dict.)
Jason Brownlee says
Hi Thomas, good question.
Yes, generally, you can hammer the same dict from multiple threads and everything will be fine because of the GIL.
In practice, it is a good idea to protect the code, e.g. make it future proof, if it is expected to be used for a long time or perhaps when you want to switch from threads to processes for performance.
One solution is to protect the dict with a mutex.
Another is to send changes to the dict to one worker thread that is responsible for making changes.
Yet another is for each thread to maintain a local copy, as you suggest; each thread can then send its local copy to some other thread at the end to merge all changes into a single structure.
Perhaps you can try a few approaches and discover what is the best fit for your specific application.
Let me know how you go.
cheoljoo.lee says
hi.
I appreciate your writing. I can learn a lot from your articles.
I suggest that you could increase the performance if you use a separate lock per key.
I have a question about dictionary updates.
I want to add to this count. Each thread can add to the count variable under any key ('a' or 'b'):
dic = {'a': {'count': 1},
       'b': {'count': 2}
}
In this case, although I got the lock for the key, I cannot get the right result for the count.
Thanks in advance.
Have a good day!
Jason Brownlee says
Thank you.
Perhaps you can use a thread-safe counter for each thread:
https://superfastpython.com/thread-safe-counter-in-python/