Last Updated on September 12, 2022
Lock overhead can make code about 3.19 to 4.40 times slower than the same code without a lock.
In this tutorial you will discover how to calculate lock overhead in Python.
Let’s get started.
Problem of Lock Overhead
A mutual exclusion lock or mutex lock is a synchronization primitive intended to prevent a race condition.
A mutex lock can be used to ensure that only one thread at a time executes a critical section of code, while all other threads trying to execute the same code must wait until the currently executing thread finishes the critical section and releases the lock.
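To make the race condition concrete, here is a minimal sketch (the thread count and iteration count are illustrative, not taken from this tutorial) in which two threads increment a shared variable without a lock. An increment is a read-modify-write operation, so a thread switch in the middle can cause updates to be lost.

```python
# demonstration of a race condition on an unprotected shared variable
from threading import Thread

# shared variable updated by both threads
counter = 0

# task that increments the shared variable many times
def increment():
    global counter
    for _ in range(1000000):
        # read-modify-write, not atomic, so updates can be lost
        counter += 1

# create, start, and wait for two threads
threads = [Thread(target=increment) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
# may report less than 2,000,000, depending on scheduling and interpreter version
print(counter)
```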
Python provides a mutual exclusion lock via the threading.Lock class.
An instance of the lock can be created and then acquired by threads before accessing a critical section, and released after the critical section.
For example:
```python
...
# create a lock
lock = Lock()
# acquire the lock
lock.acquire()
# critical section
# ...
# release the lock
lock.release()
```
Only one thread can hold the lock at any time. Once a thread has acquired the lock, no other thread can acquire it until it is released.

A thread attempting to acquire a lock that is already held will block until the lock becomes available, for example when the thread currently holding it calls release().
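Note that acquire() also supports a non-blocking mode and a timeout if waiting indefinitely is not desired; a minimal sketch (the fallback branch is illustrative):

```python
...
# attempt to acquire the lock without blocking
if lock.acquire(blocking=False):
    try:
        # critical section
        # ...
        pass
    finally:
        # always release an acquired lock
        lock.release()
else:
    # the lock is held by another thread, do something else instead
    pass
```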
We can also use the lock via the context manager. This allows the critical section to be the block of the with statement and ensures the lock is released automatically once the block has completed, even if the block raises an exception.
For example:
```python
...
# create a lock
lock = Lock()
# acquire the lock
with lock:
    # ...
```
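Under the covers, using the lock as a context manager is roughly equivalent to calling acquire() before the block and release() in a try/finally afterwards; a sketch of the equivalence:

```python
...
# roughly what the with statement above does
lock.acquire()
try:
    # critical section
    # ...
    pass
finally:
    # released even if the critical section raises an exception
    lock.release()
```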
A problem with using a mutex lock is that it adds computational overhead.
That is, using a lock can make code slower than not using the lock.
This is referred to as lock overhead, or the computational overhead for protecting a critical section using a mutex lock.
What is the computational overhead for using a lock in Python?
How to Calculate Lock Overhead
The computational overhead for using a lock can be calculated by comparing the same code with and without a mutex lock.
This requires a baseline task with a critical section executed in a loop without a lock. The time to complete this task can be determined and used as a point of reference.
- Baseline: Time taken to execute a baseline task without a lock.
The baseline task can then be updated so that the critical section is protected by a mutex lock each iteration.
- Protected: Time taken to execute the baseline task protected by a lock.
The protected version of the task can then be timed and the result compared to the baseline.
The difference between timings will provide an indication of the overhead for using a mutex lock.
That is, the lock overhead can be calculated as the time taken for the protected version of the task minus the time taken for the baseline version of the task.
- Lock Overhead = Protected – Baseline
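As a worked sketch, the two metrics used throughout this tutorial can be computed directly from the two timings (the values below are the results reported later in this tutorial):

```python
# compute lock overhead from the two timings
baseline = 4.457   # seconds, task without a lock
protected = 14.200 # seconds, task protected by a lock

# absolute overhead in seconds
overhead = protected - baseline
# relative slowdown as a multiple of the baseline
rate = protected / baseline

print(f'Overhead: {overhead:.3f} seconds')
print(f'Rate: {rate:.2f}x slower')
```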
Next, let’s define a baseline task.
Baseline Without Lock
In this section, we will define a baseline program used as a point of reference for calculating the lock overhead.
The task will involve iterating a loop a large number of times (100 million) and incrementing a variable. We will then time the execution of this task three times and report the average time taken.
Firstly, we can define the task function to be executed in a new thread. The task will take the mutex lock as an argument, but will not use it in this case. A local variable is defined and then a loop is performed many times, incrementing the local variable.
The task() function below implements this.
```python
# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        variable += 1
```
Next, we can define a function that will create the mutex lock, then create and configure the thread to execute our task() function. The thread will be started and the function will block until the task is complete.
The entry() function below implements this.
```python
# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()
```
So far, we have defined a function to execute in a new thread named task() and a function that will create and start the thread to execute our task function named entry().
We can execute our entry() function a number of times, then report the average time it takes for the function to execute.
This can be achieved using the timeit module, designed to time how long functions take to execute.
We can use the timeit.timeit() function and specify the function to execute, e.g. entry(), and the number of times to execute it, in this case 3 times.
The call to the entry() function can be specified as a string to be evaluated, in which case we must also specify where the entry() function is defined, e.g. in the main module.
For example:
```python
...
# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
```
The call will return the total time in seconds taken to execute the entry() function three times. We can then divide the result by the number of repeats, e.g. 3, to get the average time in seconds.
```python
...
print(report / 3)
```
Tying this together, the complete baseline example is listed below.
```python
# SuperFastPython.com
# example of mutex lock overhead without the lock
from timeit import timeit
from threading import Thread
from threading import Lock

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        variable += 1

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
print(report / 3)
```
Running the example executes our entry() function three times.
Each execution will create a thread to execute our task() function. The task() function will iterate 100 million times adding to a local variable then the thread will terminate.
In this case, we can see that the entry() function took about 4.457 seconds to execute.
Your results will differ. It will run faster or slower depending on the speed of your computer.
What result did you get? Let me know in the comments below.
```
4.457686631333334
```
Next, let’s look at how much slower the task is when we add a mutex lock.
Overhead of Lock With acquire() and release()
We can calculate the overhead of using a lock via calls to acquire() and release().
Add Lock To Task Via Acquire/Release
The example in the previous section can be updated to make use of the lock within the task() function.
We can pretend that updating the local variable is a critical section and require that the lock is acquired before, and released after, the critical section on each loop iteration.
```python
...
lock.acquire()
variable += 1
lock.release()
```
The updated version of the task() function with this change is listed below.
```python
# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        lock.acquire()
        variable += 1
        lock.release()
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of mutex lock overhead using the lock with acquire and release
from timeit import timeit
from threading import Thread
from threading import Lock

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        lock.acquire()
        variable += 1
        lock.release()

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
print(report / 3)
```
Running the example executes our entry() function three times.
In this case, the task() function that is executed in a new thread each time will acquire the lock then release the lock each iteration, looped 100 million times.
In this case, we can see that the entry() function took about 14.200 seconds to execute when protecting the critical section with a mutex lock.
Your results will differ. It will run faster or slower depending on the speed of your computer.
What result did you get? Let me know in the comments below.
```
14.200713367333334
```
Let’s analyze this result.
Lock With Acquire/Release vs No Lock
As expected, the program is slower when using the lock, i.e. about 14.200 seconds, than when not using the lock, i.e. about 4.457 seconds.
We can now calculate the lock overhead as the time taken with the lock minus the time taken without the lock.
For example:
- Lock Overhead = Task With Lock – Task Without Lock
- Lock Overhead = 14.200 – 4.457
- Lock Overhead = 9.743
That is, this task took 9.743 seconds longer when using the lock. The lock overhead was 9.743 seconds.
We can also calculate the difference as a rate by dividing the time taken with the lock by the time taken without the lock.
For example:
- Rate Difference = Task With Lock / Task Without Lock
- Rate Difference = 14.200 / 4.457
- Rate Difference = 3.19
That is, the program was about 3.19 times or 3.19x slower when using the lock than when not using it.
Next, let’s explore if using the lock via the context manager has a similar effect on the performance of the task.
Overhead of Lock with Context Manager
We can update the baseline task to use the mutex lock each iteration via the context manager.
Add Lock To Task Via Context Manager
This will allow us to calculate the lock overhead for the context manager usage of the lock, which can be compared both to the baseline task and to the manual acquire/release usage of the lock.
This can be achieved by updating the task() function to acquire the lock each iteration via the context manager.
For example:
```python
...
# acquire the lock
with lock:
    variable += 1
```
The updated task() function with this change is listed below.
```python
# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        # acquire the lock
        with lock:
            variable += 1
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of mutex lock overhead using the lock with context manager
from timeit import timeit
from threading import Thread
from threading import Lock

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        # acquire the lock
        with lock:
            variable += 1

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
print(report / 3)
```
Running the example executes our entry() function three times.
In this case, the task() function that is executed in a new thread each time will acquire the lock then release the lock each iteration using the context manager, looped 100 million times.
In this case, we can see that the entry() function took about 19.593 seconds to execute when protecting the critical section with a mutex lock via a context manager.
Your results will differ. It will run faster or slower depending on the speed of your computer.
What result did you get? Let me know in the comments below.
```
19.593936963666668
```
Let’s analyze this result.
Overhead of Lock With Context Manager vs No Lock
Not surprisingly, the program is slower using the lock with a context manager, i.e. about 19.593 seconds, than performing the task without the lock, i.e. about 4.457 seconds.
We can calculate the lock overhead for using the lock with the context manager compared to not using the lock.
For example:
- Lock Overhead = Task With Lock – Task Without Lock
- Lock Overhead = 19.593 – 4.457
- Lock Overhead = 15.136
That is, it is about 15.136 seconds slower to complete the task with a lock using the context manager than to complete the task without the lock at all.
We can also calculate the rate difference of using the lock via the context manager compared to the task not using the lock.
- Rate Difference = Task With Lock / Task Without Lock
- Rate Difference = 19.593 / 4.457
- Rate Difference = 4.40
That is, the program when using the lock via the context manager was about 4.40 times or 4.40x slower than not using the lock at all.
Overhead of Lock With Context Manager vs Lock With Acquire/Release
This is a surprising result: it is slower to use the lock via the context manager, i.e. about 19.593 seconds, than to use the lock by manually calling the acquire() and release() functions, i.e. about 14.200 seconds. This is likely because each iteration of the with statement must also set up and tear down the context, calling __enter__() and __exit__() on the lock, adding work beyond the raw acquire and release calls.
We can calculate the overhead of using the lock via context manager vs using the lock via acquire/release.
- Lock Overhead = Lock With Context Manager – Lock With Acquire/Release
- Lock Overhead = 19.593 – 14.200
- Lock Overhead = 5.393
That is, it was about 5.393 seconds slower in this example to use the context manager than to manually call acquire() and release().
Finally, we can compare the two approaches to using the lock by calculating the rate difference.
- Rate Difference = Lock With Context Manager / Lock With Acquire/Release
- Rate Difference = 19.593 / 14.200
- Rate Difference = 1.38
That is, the program using the lock via the context manager was about 1.38 times or 1.38x slower than the same program using the lock via manual calls to acquire() and release().
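One way to explore where the extra time goes is to disassemble the two styles of lock usage with the dis module and compare the bytecode executed per iteration; a sketch, assuming CPython, with illustrative function names:

```python
# compare the bytecode of the two locking styles
import dis
from threading import Lock

# lock used via the context manager
def task_context_manager(lock):
    variable = 0
    with lock:
        variable += 1

# lock used via manual acquire/release
def task_acquire_release(lock):
    variable = 0
    lock.acquire()
    variable += 1
    lock.release()

# the with statement executes extra instructions to set up and tear down the context
dis.dis(task_context_manager)
dis.dis(task_acquire_release)
```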
Further Reading
This section provides additional resources that you may find helpful.
Python Threading Books
- Python Threading Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- Threading Module API Cheat Sheet
I also recommend specific chapters in the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Threading: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python ThreadPool: The Complete Guide
Takeaways
You now know the computational overhead for using a lock in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by mostafa rezaee on Unsplash