Thread Lock Overhead in Python

April 21, 2022 Python Threading

Using a mutex lock can make a task about 3.19x to 4.40x slower than not using a lock, depending on how the lock is used.

In this tutorial you will discover how to calculate lock overhead in Python.

Let's get started.

Problem of Lock Overhead

A mutual exclusion lock or mutex lock is a synchronization primitive intended to prevent a race condition.

A mutex lock can be used to ensure that only one thread at a time executes a critical section of code, while all other threads trying to execute the same code must wait until the currently executing thread finishes the critical section and releases the lock.

Python provides a mutual exclusion lock via the threading.Lock class.

An instance of the lock can be created and then acquired by threads before accessing a critical section, and released after the critical section.

For example:

...
# create a lock
lock = Lock()
# acquire the lock
lock.acquire()
# critical section
# ...
# release the lock
lock.release()
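In practice, the call to release() is often wrapped in a try/finally block so the lock is released even if the critical section raises an exception. A minimal sketch:

```python
from threading import Lock

# create a lock
lock = Lock()
# acquire the lock
lock.acquire()
try:
    # critical section
    ...
finally:
    # always release the lock, even if the critical section raised
    lock.release()
# confirm the lock was released
print(lock.locked())  # False
```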

Only one thread can hold the lock at any time. If the thread holding the lock does not release it, no other thread can acquire it.

A thread attempting to acquire the lock will block until the lock becomes available, for example when the thread currently holding it releases it.
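For example, the following short program (a sketch with assumed thread and iteration counts) shows the lock serializing access to a shared counter; because every increment happens while holding the lock, the final count is exact:

```python
from threading import Thread, Lock

counter = 0
lock = Lock()

def worker():
    global counter
    for _ in range(100_000):
        # acquire the lock before the critical section
        lock.acquire()
        counter += 1
        # release the lock after the critical section
        lock.release()

# create and start four threads that all update the shared counter
threads = [Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
# wait for all threads to terminate
for t in threads:
    t.join()
# with the lock, every increment is applied exactly once
print(counter)  # 400000
```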

We can also use the lock via the context manager. This will allow the critical section to be a block within the usage of the lock and for the lock to be released automatically once the block has completed.

For example:

...
# create a lock
lock = Lock()
# acquire the lock
with lock:
    # critical section
    ...

A problem with using a mutex lock is that it adds a computational overhead.

That is, using a lock can make code slower than not using the lock.

This is referred to as lock overhead, or the computational overhead for protecting a critical section using a mutex lock.

What is the computational overhead for using a lock in Python?

How to Calculate Lock Overhead

The computational overhead for using a lock can be calculated by comparing the same code with and without a mutex lock.

This requires a baseline task with a critical section executed in a loop without a lock. The time to complete this task can be determined and used as a point of reference.

The baseline task can then be updated so that the critical section is protected by a mutex lock each iteration.

The protected version of the task can then be timed and the result compared to the baseline.

The difference between timings will provide an indication of the overhead for using a mutex lock.

That is, the lock overhead can be calculated as the time taken for the protected version of the task minus the time taken for the baseline version of the task.
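This procedure can be sketched in a few lines. The loop size here (1 million iterations) is an assumption chosen so the sketch runs quickly; the examples below use 100 million:

```python
from threading import Lock
from timeit import timeit

lock = Lock()
N = 1_000_000  # smaller than the examples below, so this sketch runs quickly

# baseline task: increment a local variable in a loop, no lock
def baseline():
    variable = 0
    for _ in range(N):
        variable += 1

# protected task: same loop, but guard each increment with the lock
def protected():
    variable = 0
    for _ in range(N):
        lock.acquire()
        variable += 1
        lock.release()

# average time over 3 runs for each version
t_base = timeit(baseline, number=3) / 3
t_lock = timeit(protected, number=3) / 3
# overhead as a difference and as a rate
print(f'overhead: {t_lock - t_base:.3f} seconds ({t_lock / t_base:.2f}x slower)')
```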

Next, let's define a baseline task.

Baseline Without Lock

In this section, we will define a baseline program used as a point of reference for calculating the lock overhead.

The task will involve iterating a loop a large number of times (100 million) and incrementing a variable. We will then time the execution of this task three times and report the average time taken.

Firstly, we can define the task function to be executed in a new thread. The task will take the mutex lock as an argument, but will not use it in this case. A local variable is defined and then a loop is performed many times, incrementing the local variable.

The task() function below implements this.

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        variable += 1

Next, we can define a function that will create the mutex lock, then create and configure the thread to execute our task() function. The thread will be started and the function will block until the task is complete.

The entry() function below implements this.

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

So far, we have defined a task() function to execute in a new thread, and an entry() function that will create and start the thread to execute our task() function.

We can execute our entry() function a number of times, then report the average time it takes for the function to execute.

This can be achieved using the timeit module, designed to time how long functions take to execute.

We can use the timeit.timeit() function and specify the function to execute, e.g. entry(), and the number of times to execute it, in this case 3 times.

The call to the entry() function can be specified as a string to be evaluated, in which case we must also specify where the entry() function is defined, e.g. in the main module.

For example:

...
# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)

The function will return the total time in seconds taken to execute the entry() function three times. We can then divide the result by the number of repeats, e.g. 3, to get the average time in seconds.

...
print(report / 3)

Tying this together, the complete baseline example is listed below.

# SuperFastPython.com
# example of mutex lock overhead without the lock
from timeit import timeit
from threading import Thread
from threading import Lock

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        variable += 1

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
print(report / 3)

Running the example executes our entry() function three times.

Each execution will create a thread to execute our task() function. The task() function will iterate 100 million times adding to a local variable then the thread will terminate.

In this case, we can see that the entry() function took about 4.457 seconds to execute.

Your results will differ. It will run faster or slower depending on the speed of your computer.

What result did you get? Let me know in the comments below.

4.457686631333334

Next, let's look at how much slower the task is when we add a mutex lock.

Overhead of Lock With acquire() and release()

We can calculate the overhead of using a lock via calls to acquire() and release().

Add Lock To Task Via Acquire/Release

The example in the previous section can be updated to make use of the lock within the task() function.

We can pretend that updating the local variable is a critical section, and require that the lock is acquired prior to the critical section, then released after the critical section each loop iteration.

...
lock.acquire()
variable += 1
lock.release()

The updated version of the task() function with this change is listed below.

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        lock.acquire()
        variable += 1
        lock.release()

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of mutex lock overhead using the lock with acquire and release
from timeit import timeit
from threading import Thread
from threading import Lock

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        lock.acquire()
        variable += 1
        lock.release()

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
print(report / 3)

Running the example executes our entry() function three times.

In this case, the task() function that is executed in a new thread each time will acquire the lock then release the lock each iteration, looped 100 million times.

In this case, we can see that the entry() function took about 14.200 seconds to execute when protecting the critical section with a mutex lock.

Your results will differ. It will run faster or slower depending on the speed of your computer.

What result did you get? Let me know in the comments below.

14.200713367333334

Let's analyze this result.

Lock With Acquire/Release vs No Lock

As expected, the program is slower when using the lock, e.g. 14.200 seconds, than not using the lock, e.g. 4.457 seconds.

We can now calculate the lock overhead as the time taken with the lock minus the time taken without the lock.

For example:

lock overhead = 14.200 - 4.457
lock overhead = 9.743

That is, this task took about 9.743 seconds longer when using the lock. The lock overhead was about 9.743 seconds.

We can also calculate the difference as a rate by dividing the time taken with the lock by the time taken without the lock.

For example:

slowdown = 14.200 / 4.457
slowdown = 3.19

That is, the program was about 3.19 times or 3.19x slower when using the lock than when not using the lock.

Next, let's explore if using the lock via the context manager has a similar effect on the performance of the task.

Overhead of Lock with Context Manager

We can update the baseline task to use the mutex lock each iteration via the context manager.

Add Lock To Task Via Context Manager

This will provide a calculation for the lock overhead for the context manager usage of the lock compared to the manual acquire/release usage of the lock.

This can be achieved by updating the task() function to acquire the lock each iteration via the context manager.

For example:

...
# acquire the lock
with lock:
    variable += 1

The updated task() function with this change is listed below.

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        # acquire the lock
        with lock:
            variable += 1

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of mutex lock overhead using the lock with context manager
from timeit import timeit
from threading import Thread
from threading import Lock

# task to execute in a new thread
def task(lock):
    variable = 0
    for i in range(100000000):
        # acquire the lock
        with lock:
            variable += 1

# create and start thread
def entry():
    # create the shared lock
    lock = Lock()
    # create the new thread
    thread = Thread(target=task, args=(lock,))
    thread.start()
    # wait for the new thread to terminate
    thread.join()

# time the function
report = timeit('entry()', 'from __main__ import entry', number=3)
print(report / 3)

Running the example executes our entry() function three times.

In this case, the task() function that is executed in a new thread each time will acquire the lock then release the lock each iteration using the context manager, looped 100 million times.

In this case, we can see that the entry() function took about 19.593 seconds to execute when protecting the critical section with a mutex lock via a context manager.

Your results will differ. It will run faster or slower depending on the speed of your computer.

What result did you get? Let me know in the comments below.

19.593936963666668

Let's analyze this result.

Overhead of Lock With Context Manager vs No Lock

Not surprisingly, the program is slower when using the lock with a context manager, e.g. 19.593 seconds, than when performing the task without the lock, e.g. 4.457 seconds.

We can calculate the lock overhead for using the lock with the context manager compared to not using the lock.

For example:

lock overhead = 19.593 - 4.457
lock overhead = 15.136

That is, it is about 15.136 seconds slower to complete the task with a lock using the context manager than to complete the task without the lock at all.

We can also calculate the rate difference by dividing the time taken with the lock via the context manager by the time taken without the lock.

For example:

slowdown = 19.593 / 4.457
slowdown = 4.40

That is, the program using the lock via the context manager was about 4.40 times or 4.40x slower than not using the lock at all.

Overhead Lock With Context Manager vs Lock With Acquire/Release

It is perhaps a surprising result: it is slower to use the lock with a context manager, e.g. 19.593 seconds, than to use the lock by manually calling the acquire() and release() functions, e.g. 14.200 seconds.

We can calculate the overhead of using the lock via the context manager vs using the lock via acquire/release.

For example:

overhead = 19.593 - 14.200
overhead = 5.393

That is, it was about 5.393 seconds slower in this example to use the context manager than to manually call acquire() and release().

Finally, we can compare the two approaches to using the lock by calculating the rate difference.

For example:

slowdown = 19.593 / 14.200
slowdown = 1.38

That is, the program using the lock via the context manager was about 1.38 times or 1.38x slower than the same program using the lock via manual calls to acquire() and release().
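The comparisons above can be reproduced directly from the average times reported in the earlier sections:

```python
# average times reported in the sections above (seconds)
no_lock = 4.457686631333334
acquire_release = 14.200713367333334
context_manager = 19.593936963666668

# overhead and slowdown of each locked version vs the baseline
for name, t in (('acquire/release', acquire_release),
                ('context manager', context_manager)):
    print(f'{name}: +{t - no_lock:.3f} s, {t / no_lock:.2f}x slower than no lock')

# context manager vs manual acquire/release
print(f'context manager vs acquire/release: '
      f'+{context_manager - acquire_release:.3f} s, '
      f'{context_manager / acquire_release:.2f}x slower')
```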

Takeaways

You now know how to calculate the computational overhead of using a mutex lock in Python.



If you enjoyed this tutorial, you will love my book: Python Threading Jump-Start. It covers everything you need to master the topic with hands-on examples and clear explanations.