Last Updated on October 17, 2023
You can calculate Python benchmark metrics like the difference and the speedup in order to compare the performance improvements of concurrent versions of a program to the sequential version.
In this tutorial, you will discover how to calculate metrics to compare the benchmark performance of sequential versus concurrent programs in Python.
Let’s get started.
Compare Sequential vs Concurrent Benchmarks
It is good practice to develop a sequential version of a program before adding concurrency.
This is to ensure the testability and correctness of the code before adding complexity.
Often, our goal in adding concurrency to a program is to improve its performance, specifically by reducing its runtime.
This requires that first the sequential version of the program is benchmarked. The time taken to run the sequential version of the program provides a baseline of comparison that all concurrent versions of the program must improve upon, e.g. run in a shorter time.
Note, this only applies in those cases when we are adding concurrency for performance reasons. This is not always the case, such as when our program is required to perform multiple tasks concurrently.
The next step is to benchmark concurrent versions of the program.
Finally, the performance of the sequential and concurrent versions can be compared. But how?
There are many ways we could compare the two benchmark results.
What are the best ways to compare the performance of sequential and concurrent programs?
How to Calculate Metrics for Comparing Sequential vs Concurrent Benchmarks
There are two metrics that we can use to directly compare the benchmark results of sequential vs concurrent programs. They are:
- Difference
- Speedup (factor)
Note, calculating speedup percentages is a little tricky to communicate correctly, so it is left out here.
Let’s take a closer look at each in turn.
How to Calculate Benchmark Difference
The “difference” metric refers to the benchmark of the concurrent program subtracted from the benchmark of the sequential version of the program.
The result indicates how much faster (or slower) the concurrent version of the program is compared to the sequential version.
It is calculated as follows:
- difference = sequential_duration - concurrent_duration
The result should be above zero, indicating an improvement in the performance of the concurrent version over the sequential version of the program.
Once calculated, we can then report the result, for example:
- The concurrent version is [difference] seconds faster than the sequential version.
What if the Difference is Zero?
If the difference is zero, or close to zero, then we can say that the benchmark performance of the programs is equal or nearly equal.
That is, there is no execution-time performance benefit from the concurrent version.
What if the Difference is Negative?
The difference result can be negative.
This means that the concurrent version takes longer to execute than the sequential version.
In this case, the concurrent version has a worse performance benchmark than the sequential version performance benchmark.
When reporting the result, we can remove the negative sign from the result and change “faster” to “slower” in the summary.
For example:
- The concurrent version is [difference] seconds slower than the sequential version.
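The difference metric and the faster/slower reporting described above can be sketched as a small Python helper. This is a minimal illustration under my own naming and message templates, not a standard API:

```python
# report the difference between two benchmark durations (in seconds)
# note: the function name and message templates are illustrative only
def report_difference(sequential_duration, concurrent_duration):
    # positive means the concurrent version is faster
    difference = sequential_duration - concurrent_duration
    if abs(difference) < 1e-9:
        return 'The versions have equal performance.'
    # drop the sign and choose the right word for the summary
    word = 'faster' if difference > 0 else 'slower'
    return f'The concurrent version is {abs(difference):.3f} seconds {word} than the sequential version.'

# a faster concurrent version, then a slower one
print(report_difference(20.0, 1.0))
print(report_difference(10.0, 12.5))
```

Using a small tolerance for the zero case avoids treating tiny timing noise as a meaningful difference.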
How to Calculate Benchmark Speedup
We can use the benchmark scores for the sequential and concurrent programs to calculate the speedup metric.
That is, what is the speedup of the concurrent version of the program compared to the sequential version of the program?
In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources.
— Speedup, Wikipedia.
The speedup, which I call the "speedup factor", is calculated as the performance benchmark of the sequential version of the program divided by the performance benchmark of the concurrent version of the program.
It is calculated as follows:
- speedup_factor = sequential_duration / concurrent_duration
The result should be above one, indicating an improvement in the performance of the concurrent version over the sequential version of the program.
The result is not a time in seconds; it is a unitless factor. When the benchmark measures the time taken by a task or program, this ratio is sometimes referred to as the "speedup in latency".
Once calculated, we can report the speedup for the concurrent version as follows:
- The concurrent version has a [speedup_factor]x speedup over the sequential version.
Or perhaps:
- The concurrent version is [speedup_factor] times faster than the sequential version.
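As a quick sketch, the speedup factor can be computed and reported like this (the function name is mine, chosen for illustration):

```python
# speedup factor: sequential duration divided by concurrent duration
# note: the function name is illustrative only
def speedup_factor(sequential_duration, concurrent_duration):
    # a result above 1.0 means the concurrent version is faster
    return sequential_duration / concurrent_duration

# example: a 20 second sequential run vs a 1 second concurrent run
factor = speedup_factor(20.0, 1.0)
print(f'The concurrent version has a {factor:.3f}x speedup over the sequential version.')
```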
What if the Speedup is 1?
If the speedup is one or close to one, then it suggests that the performance of the sequential and concurrent versions of the program is equivalent or nearly equivalent.
It suggests that there is no execution time performance benefit in adopting the concurrent version of the program over the sequential version.
What if the Speedup is Less Than 1?
The speedup could be less than one, e.g. 0.5.
This means that the concurrent version does not offer a speedup over the sequential version of the program.
In that case, the terms in the calculation could be switched.
For example:
- speedup_factor = concurrent_duration / sequential_duration
The way the result is reported can then be changed to highlight that the concurrent version is slower than the sequential version.
For example:
- The concurrent version has a [speedup_factor]x slowdown over the sequential version.
Or perhaps:
- The concurrent version is [speedup_factor] times slower than the sequential version.
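The switch between a speedup report and a slowdown report can be sketched as follows; the helper name and message templates are illustrative, not from any standard API:

```python
# report a speedup, switching to a slowdown report when the factor is below one
# note: the function name and message templates are illustrative only
def report_speedup(sequential_duration, concurrent_duration):
    factor = sequential_duration / concurrent_duration
    if factor >= 1.0:
        return f'The concurrent version has a {factor:.3f}x speedup over the sequential version.'
    # invert the ratio so the slowdown reads as a factor above one
    slowdown = concurrent_duration / sequential_duration
    return f'The concurrent version has a {slowdown:.3f}x slowdown over the sequential version.'

# example: the concurrent version takes twice as long
print(report_speedup(10.0, 20.0))
```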
Now that we know how to calculate performance benchmarks, let’s look at some worked examples.
Example of Benchmarking Sequential vs Concurrent Function
Before calculating some performance benchmarks for comparison, we need to benchmark two versions of a program.
In this example we will develop a sequential version of a program that executes multiple tasks and benchmark it. We will then update the program to use concurrency and benchmark the new version.
These benchmarks will then be used as a basis for calculating metrics for comparison in subsequent sections.
Benchmarking Sequential Tasks
We can develop a program to sequentially execute tasks.
Firstly, we will define a task that takes an argument and then blocks for one second to simulate effort.
```python
# task function
def task(data):
    # block for a moment to simulate work
    sleep(1)
```
We will then execute 20 of these tasks sequentially, one by one.
```python
# do all the work
def main():
    # execute many tasks sequentially, one by one
    for i in range(20):
        task(i)
```
Because there are 20 tasks and each task takes 1 second, we expect the program to complete in about 20 seconds.
Finally, we will benchmark the execution time of the program manually using the time.time() function.
If you are new to benchmarking the execution time of Python programs, see the dedicated tutorial on the topic.
```python
# protect the entry point
if __name__ == '__main__':
    # record start time
    time_start = time()
    # call benchmark code
    main()
    # calculate the duration
    time_duration = time() - time_start
    # report the duration
    print(f'Took {time_duration:.3f} seconds')
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of a program that executes tasks sequentially
from time import time
from time import sleep

# task function
def task(data):
    # block for a moment to simulate work
    sleep(1)

# do all the work
def main():
    # execute many tasks sequentially, one by one
    for i in range(20):
        task(i)

# protect the entry point
if __name__ == '__main__':
    # record start time
    time_start = time()
    # call benchmark code
    main()
    # calculate the duration
    time_duration = time() - time_start
    # report the duration
    print(f'Took {time_duration:.3f} seconds')
```
Running the program executes all 20 tasks sequentially in a loop.
Finally, the benchmark execution time is reported.
As expected, the program takes about 20 seconds to complete.
```
Took 20.053 seconds
```
Next, let’s look at how we can update the program to use concurrency and benchmark its performance.
Benchmarking Concurrent Tasks
We can update the example to execute the tasks concurrently.
This can be achieved using the ThreadPoolExecutor with one worker per task, and issuing all tasks at once for concurrent execution.
For example:
```python
...
# create the thread pool
n_tasks = 20
with ThreadPoolExecutor(n_tasks) as tpe:
    # issue all tasks
    _ = [tpe.submit(task, i) for i in range(n_tasks)]
    # wait for all tasks to complete
```
If you are new to the ThreadPoolExecutor, see the dedicated guide on the topic.
The expectation is that because all 20 tasks are to be executed concurrently, the overall execution time of the program should drop to about one second, the duration of each task.
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of a program that executes tasks concurrently
from time import time
from time import sleep
from concurrent.futures import ThreadPoolExecutor

# task function
def task(data):
    # block for a moment to simulate work
    sleep(1)

# do all the work
def main():
    # create the thread pool
    n_tasks = 20
    with ThreadPoolExecutor(n_tasks) as tpe:
        # issue all tasks
        _ = [tpe.submit(task, i) for i in range(n_tasks)]
        # wait for all tasks to complete

# protect the entry point
if __name__ == '__main__':
    # record start time
    time_start = time()
    # call benchmark code
    main()
    # calculate the duration
    time_duration = time() - time_start
    # report the duration
    print(f'Took {time_duration:.3f} seconds')
```
Running the program executes all 20 tasks concurrently in the thread pool.
Finally, the benchmark execution time is reported.
As expected, the program takes about 1 second to complete.
```
Took 1.006 seconds
```
Next, let’s review the benchmark results.
Benchmark Results
Before calculating metrics, it is important to review the benchmark results.
The table below summarizes each program and lists the benchmark execution time in seconds.
```
Method     | Duration (sec)
------------------------------
Sequential | 20.053
Concurrent | 1.006
```
Looking at the raw benchmark results, we can see that one program is obviously much faster than the other.
But how much faster and how can we report this result?
Next, let’s look at how we can calculate performance benchmark metrics for reporting.
Example of Calculating Difference
We can explore how to calculate the difference metric using the real execution time performance benchmark results from the previous section.
Recall, the calculation for the difference metric is as follows:
- difference = sequential_duration - concurrent_duration
We can then plug in the execution time results from our sequential and concurrent benchmarks and calculate the difference.
- difference = 20.053 - 1.006
- 19.047 = 20.053 - 1.006
We can then report the result using our template, for example:
- The concurrent version is 19.047 seconds faster than the sequential version.
It is also a good idea to put the result in a table, as it is likely you will benchmark many concurrent variations of the original program (e.g. processes, threads, asyncio, varied number of workers, etc.).
```
Method     | Difference (sec)
--------------------------------
Sequential | n/a
Concurrent | 19.047
```
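The worked difference calculation can be reproduced directly in Python, using the durations from the benchmark results above:

```python
# benchmark results from the worked example (seconds)
sequential_duration = 20.053
concurrent_duration = 1.006

# difference metric: seconds saved by the concurrent version
difference = sequential_duration - concurrent_duration
print(f'The concurrent version is {difference:.3f} seconds faster than the sequential version.')
```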
Next, let’s look at how we might calculate the speedup metric.
Example of Calculating Speedup
We can explore how to calculate the speedup metric using the real execution time performance benchmark results from the previous section.
Recall, the calculation for the speedup metric is as follows:
- speedup_factor = sequential_duration / concurrent_duration
We can then plug in the execution time results from our sequential and concurrent benchmarks and calculate the speedup.
- speedup_factor = 20.053 / 1.006
- 19.933 = 20.053 / 1.006
We can then report the result using our template, for example:
- The concurrent version has a 19.933x speedup over the sequential version.
Or perhaps:
- The concurrent version is 19.933 times faster than the sequential version.
It is also a good idea to put the result in a table, as it is likely you will benchmark many concurrent variations of the original program.
```
Method     | Speedup (multiple)
---------------------------------
Sequential | n/a
Concurrent | 19.933
```
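Similarly, the worked speedup calculation can be reproduced in Python, using the durations from the benchmark results above:

```python
# benchmark results from the worked example (seconds)
sequential_duration = 20.053
concurrent_duration = 1.006

# speedup factor: above one means the concurrent version is faster
speedup = sequential_duration / concurrent_duration
print(f'The concurrent version has a {speedup:.3f}x speedup over the sequential version.')
```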
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to calculate metrics to compare the benchmark performance of sequential versus concurrent programs in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.