Last Updated on September 12, 2022
You can use lock striping to reduce lock contention on a data structure like a dictionary or a list.
In this tutorial you will discover how to use lock striping in Python.
Let’s get started.
Problem of Lock Contention
Lock contention is a problem that may occur when using mutual exclusion (mutex) locks.
Lock contention is typically described in the context of the internals of operating systems, but may also be a problem at a high-level such as application programming with concurrency primitives.
It describes a situation where a lock protects a critical section and multiple threads attempt to acquire the lock at the same time.
This is a problem because only one thread can acquire a lock, and all other threads must block, and wait for the lock to become available. This is an inefficient use of resources, as those threads could be executing and completing tasks in the application.
You can learn more about lock contention in this tutorial:
One approach to reducing contention is called lock striping.
What is lock striping and how can we implement this in Python?
Run loops using all CPUs, download your FREE book to learn how.
What is Lock Striping
Lock striping is a technique for reducing lock contention.
It is used when a lock is used to protect a data structure such as a list or hash map.
The lock protecting the data structure is split into multiple locks, each protecting a subset of the data in the data structure.
Lock splitting can sometimes be extended to partition locking on a variable-sized set of independent objects, in which case it is called lock striping.
— Page 237, Java Concurrency in Practice, 2006.
For example, if a list was protected by a lock, then with lock striping, five locks could be used to protect the data within the list, each responsible for protecting one-fifth of the data from race conditions.
… the implementation of [a hash map with lock striping] uses an array of 16 locks, each of which guards 1/16 of the hash buckets; bucket N is guarded by lock N mod 16.
— Page 237, Java Concurrency in Practice, 2006.
As such, each lock protects a “stripe” of data in the data structure.
In turn, lock contention is reduced to about 1/N, where N is the number of locks used, if use of array indexes the data structure by the program is uniformly distributed.
The downside of lock striping is that if the data structure itself requires a change, such as adding or removing items, then all locks protecting the data in the structure must be acquired before making a change. This can be more costly than using a single lock.
One of the downsides of lock striping is that locking the collection for exclusive access is more difficult and costly than with a single lock.
— Page 237, Java Concurrency in Practice, 2006.
Now that we know what lock striping is, let’s look at how we might implement it in Python.
How to Implement Lock Striping
Lock striping can be implemented using a list of locks.
A list of locks can be created and used to protect a data structure that is accessed using an array index, such as a list. It could be used for a dict, but may require a custom implementation.
Firstly, a list of locks can be created.
The number of locks in the list is arbitrary. The more locks in the list, the less contention there will be for the data being protected, although the more memory required. At a worst case, you could have one lock per item in the data structure that is being protected.
For example, we may choose 5 locks:
1 2 3 |
... # create shared locks locks = [Lock() for _ in range(5)] |
A lock can then be chosen before interacting or updating the data structure.
One approach would be to assign each lock to a subset of the data structure directly.
If we had five locks, then each lock could protect 1/5 (one fifth) of the data in the structure. If the structure was a list with 100 items, then:
- Lock 0 could protect indexes in the list from 0 to 19.
- Lock 1 could protect indexes in the list from 20 to 39.
- Lock 2 could protect indexes in the list from 40 to 59.
- Lock 3 could protect indexes in the list from 60 to 79.
- Lock 4 could protect indexes in the list from 80 to 99.
This would require a few if-statements.
There’s no need for each lock to protect contiguous blocks of array indexes.
A simpler approach that is fast to calculate and less code is to use the modulo of the array index to the number of locks.
For example:
- Lock_Index = Array_Index % Number_of_Locks
Recall, % is the modulo operator in Python.
The % (modulo) operator yields the remainder from the division of the first argument by the second.
— 6.6. Binary arithmetic operations
This will consistently map arbitrary array indexes to the same lock, although it requires that you know the array index you wish to interact with beforehand.
In code, this would look as follows:
1 2 3 4 5 6 7 8 |
... # decide where to operate on the list index = ... # map the array index to a lock index lock_index = index % len(locks) # acquire the lock with locks[lock_index]: # ... |
With a list with a length of 100 and five locks, each lock would still protect about 1/5 (one fifth) of the array indexes, but they would be spread across the array instead of contiguous.
For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Array Index Lock Index 0 0 1 1 2 2 3 3 4 4 5 0 6 1 7 2 8 3 9 4 10 0 ... |
If the data structure itself requires changing such as adding or deleting an item, then all locks must be acquired before the structure can be changed.
For example:
1 2 3 4 5 6 |
... # acquire all locks for lock in locks: lock.acquire() # change the data structure # ... |
Now that we know how to implement lock striping, let’s look at a worked example.
Free Python Threading Course
Download your FREE threading PDF cheat sheet and get BONUS access to my free 7-day crash course on the threading API.
Discover how to use the Python threading module including how to create and start new threads and how to use a mutex locks and semaphores
Example of Lock Contention
Before we develop an example of lock striping, let’s develop an example of a lock protecting a data structure that suffers high contention.
In this example, we will have a list with 20 items protected by a single lock. We will then have 20 threads that will attempt to acquire the lock, choose a random index in the array to modify, block for a moment, update the list at the chosen index and report progress.
The single lock will suffer high contention as all 20 threads will all compete to acquire it in order to modify the list.
First, we can define a function to operate on our list in a new thread.
The function takes a unique integer identifier for each thread, the shared lock and the shared data structure as arguments.
1 2 3 |
# operate on the list def task(identifier, lock, data_list): # ... |
Firstly, it attempts to acquire the shared lock protecting the list.
1 2 3 4 |
... # acquire the lock with lock: # ... |
Next, it randomly chooses an array index at which to operate between 0 and the last item in the list.
1 2 3 |
... # decide where to operate on the list index = randint(0, len(data_list)-1) |
The thread then simulates computational effort by blocking for a random fraction of a second.
1 2 3 |
... # block for a moment sleep(random()) |
Finally, the list is updated at the chosen array index by incrementing the value, then the value is reported along with the chosen index and unique thread identifier.
1 2 3 4 5 |
... # update the list at that point data_list[index] += 1 # report print(f'.thread {identifier} updated index {index} to {data_list[index]}') |
Tying this together, the task() function implementing this is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 |
# operate on the list def task(identifier, lock, data_list): # acquire the lock with lock: # decide where to operate on the list index = randint(0, len(data_list)-1) # block for a moment sleep(random()) # update the list at that point data_list[index] += 1 # report print(f'.thread {identifier} updated index {index} to {data_list[index]}') |
Next, in the main thread, we will create the shared lock and the shared list data structure initialized with a length of 20 and with zero values.
1 2 3 4 5 |
... # create shared lock lock = Lock() # create shared data data_list = [0 for _ in range(20)] |
We will then create 20 threads configured to execute our task() function, each assigned a unique integer identifier. This can be achieved in a list comprehension.
1 2 3 |
... # create threads threads = [Thread(target=task, args=(i, lock, data_list)) for i in range(20)] |
The main thread can then start these new threads, then block until they terminate.
1 2 3 4 5 6 7 |
... # start the threads for thread in threads: thread.start() # wait for threads to terminate for thread in threads: thread.join() |
Finally, the status of the shared list can be reported.
1 2 3 |
... # report the data print(data_list) |
Tying this together, the complete example is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
# SuperFastPython.com # example of lock contention for a list from random import randint from random import random from time import sleep from threading import Thread from threading import Lock # operate on the list def task(identifier, lock, data_list): # acquire the lock with lock: # decide where to operate on the list index = randint(0, len(data_list)-1) # block for a moment sleep(random()) # update the list at that point data_list[index] += 1 # report print(f'.thread {identifier} updated index {index} to {data_list[index]}') # create shared lock lock = Lock() # create shared data data_list = [0 for _ in range(20)] # create threads threads = [Thread(target=task, args=(i, lock, data_list)) for i in range(20)] # start the threads for thread in threads: thread.start() # wait for threads to terminate for thread in threads: thread.join() # report the data print(data_list) |
Running the example, first creates the shared lock and initializes the shared data structure.
Next, the 20 new threads are configured and started and the main thread blocks until the threads terminate.
Each thread attempts to acquire the lock protecting the list. Only one thread can acquire the lock at a time and the rest must wait, resulting in high lock contention.
Once acquired, the thread will randomly choose an array index at which to operate, block for a moment, update the list and report the updated value and other details.
A sample of output from the program is listed below. Your specific results will differ given the use of random numbers.
The example takes about 10 seconds to complete on average, due in large part to the high contention for the single lock.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
.thread 0 updated index 17 to 1 .thread 1 updated index 12 to 1 .thread 2 updated index 12 to 2 .thread 3 updated index 5 to 1 .thread 4 updated index 12 to 3 .thread 5 updated index 2 to 1 .thread 6 updated index 13 to 1 .thread 7 updated index 5 to 2 .thread 8 updated index 13 to 2 .thread 9 updated index 12 to 4 .thread 10 updated index 10 to 1 .thread 11 updated index 10 to 2 .thread 12 updated index 12 to 5 .thread 13 updated index 2 to 2 .thread 14 updated index 11 to 1 .thread 15 updated index 4 to 1 .thread 16 updated index 12 to 6 .thread 17 updated index 13 to 3 .thread 18 updated index 9 to 1 .thread 19 updated index 8 to 1 [0, 0, 2, 0, 1, 2, 0, 0, 1, 1, 2, 1, 6, 3, 0, 0, 0, 1, 0, 0] |
Next, let’s look at how we can reduce lock contention in this example using lock striping.
Overwhelmed by the python concurrency APIs?
Find relief, download my FREE Python Concurrency Mind Maps
Example of Lock Striping
Lock striping can be used to reduce lock contention for items in a list.
This can be achieved by creating an array of locks, each responsible for protecting a subset of the data structure. Each thread must first choose the area of the data structure on which it will operate, this is then consistently mapped to an appropriate lock, the lock is acquired and the thread can complete the task.
This dramatically reduces lock contention as a function of the number of locks chosen to protect the data structure.
We can demonstrate this lock striping approach by updating the example from the previous section.
First, the task() function must be updated to take a list of locks as an argument instead of a single lock.
1 2 3 4 |
... # operate on the list def task(identifier, locks, data_list): # ... |
Next, the thread must choose the area of the array on which it intends to operate, in this case choose a random array index.
1 2 3 |
... # decide where to operate on the list index = randint(0, len(data_list)-1) |
This is then mapped consistently to a lock using the modulo of the number of locks.
1 2 3 |
... # map the array index to a lock index lock_index = index % len(locks) |
The lock can then be acquired and the thread can complete the task as before.
1 2 3 4 |
... # acquire the lock with locks[lock_index]: # ... |
The updated version of the task() function with these changes is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
# operate on the list def task(identifier, locks, data_list): # decide where to operate on the list index = randint(0, len(data_list)-1) # map the array index to a lock index lock_index = index % len(locks) # acquire the lock with locks[lock_index]: # block for a moment sleep(random()) # update the list at that point data_list[index] += 1 # report print(f'.thread {identifier} updated index {index} to {data_list[index]}') |
The main thread can then create a list of locks to protect the data structure.
In this case, we will use five locks to protect the 20 items in the list data structure, each lock protecting 4 array indexes.
1 2 3 |
... # create shared locks locks = [Lock() for _ in range(5)] |
The locks are then passed to the task() function as each thread is created and configured.
1 2 3 |
... # create threads threads = [Thread(target=task, args=(i, locks, data_list)) for i in range(20)] |
Tying this together, the complete example is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
# SuperFastPython.com # example of lock striping to reduce lock contention for a list from random import randint from random import random from time import sleep from threading import Thread from threading import Lock # operate on the list def task(identifier, locks, data_list): # decide where to operate on the list index = randint(0, len(data_list)-1) # map the array index to a lock index lock_index = index % len(locks) # acquire the lock with locks[lock_index]: # block for a moment sleep(random()) # update the list at that point data_list[index] += 1 # report print(f'.thread {identifier} updated index {index} to {data_list[index]}') # create shared locks locks = [Lock() for _ in range(5)] # create shared data data_list = [0 for _ in range(20)] # create threads threads = [Thread(target=task, args=(i, locks, data_list)) for i in range(20)] # start the threads for thread in threads: thread.start() # wait for threads to terminate for thread in threads: thread.join() # report the data print(data_list) |
Running the example, first creates the shared lock and initializes the shared data structure.
Next, the 20 new threads are configured and started and the main thread blocks until the threads terminate.
Each thread chooses an area of the array to work with, maps this to a lock, then attempts to acquire the lock, then completes its task.
Contention is reduced from 20 threads fighting over one lock to 20 threads fighting over five locks. As such, lock contention is reduced by about 1/5 (one fifth) on average.
The example takes about 3.5 seconds to complete on my system on average, down from over ten seconds when all threads were competing for one lock.
A sample of output from the program is listed below. Your specific results will differ given the use of random numbers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
.thread 8 updated index 1 to 1 .thread 14 updated index 1 to 2 .thread 2 updated index 13 to 1 .thread 3 updated index 15 to 1 .thread 0 updated index 7 to 1 .thread 1 updated index 9 to 1 .thread 5 updated index 19 to 1 .thread 11 updated index 7 to 2 .thread 16 updated index 11 to 1 .thread 6 updated index 10 to 1 .thread 17 updated index 12 to 1 .thread 18 updated index 11 to 2 .thread 4 updated index 13 to 2 .thread 19 updated index 16 to 1 .thread 10 updated index 10 to 2 .thread 15 updated index 15 to 2 .thread 12 updated index 14 to 1 .thread 7 updated index 3 to 1 .thread 9 updated index 8 to 1 .thread 13 updated index 18 to 1 [0, 2, 0, 1, 0, 0, 0, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 0, 1, 1] |
Further Reading
This section provides additional resources that you may find helpful.
Python Threading Books
- Python Threading Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- Threading Module API Cheat Sheet
I also recommend specific chapters in the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter: 14: Threads and Processes
Guides
- Python Threading: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python ThreadPool: The Complete Guide
APIs
References
Takeaways
You now know how to use lock striping to reduce lock contention.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Do you have any questions?