You can develop a benchmark unit test by defining a unit test that calculates the execution time of a target function and asserts that it is below a threshold.
This can be used to automate performance testing to confirm target functions meet performance requirements and that code changes do not introduce regressions in performance.
In this tutorial, you will discover how to develop benchmark unit tests in Python.
Let’s get started.
Need a Benchmark Unit Test
It is common to have performance requirements for Python code.
For example, we may require the execution time for a code snippet or function to be below some predefined threshold.
One way we can test an execution time performance requirement is to use unit tests.
Recall that unit tests are a programmatic way to define code requirements by executing and asserting details about the target code. Test suites can then be executed automatically and repeatedly to check code for regressions (e.g. the introduction of problems that cause tests to fail).
In computer programming, unit testing is a software testing method by which individual units of source code […] are tested to determine whether they are fit for use. It is a standard step in development and implementation approaches such as Agile.
— Unit testing, Wikipedia.
A unit test can be used to benchmark a target function and confirm that its overall execution time meets the requirements for the function.
This is different from regular benchmarking where the execution time is reported and compared to variations of the same target function. Instead, we are interested in an automated test of performance.
How can we use unit tests to define and enforce the performance requirements of a target function?
How to Develop a Benchmark Unit Test
We can develop a unit test that benchmarks a target function and checks that the time is below a required threshold.
This can be achieved using a common Python unit testing framework, such as the unittest module in the Python standard library.
A test can be defined that records the start time and end time around a function, and then the execution time is calculated.
The time.perf_counter() function can be used to record times as it is preferred for benchmarking given that it is non-adjustable, monotonic, and uses a high-precision clock.
For example:
```python
from time import perf_counter

...
# record start time
time_start = perf_counter()
# execute the target function
...
# calculate the execution time
time_duration = perf_counter() - time_start
```
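You can check the clock properties mentioned above on your own machine with time.get_clock_info(), which describes the clock behind a given time function. This is a minimal sketch. The resolution and implementation name it prints depend on the operating system:

```python
import time

# describe the clock behind time.perf_counter()
info = time.get_clock_info('perf_counter')

# True if the clock can never go backwards
print(f'monotonic: {info.monotonic}')
# True if the clock can be changed, e.g. by NTP or an administrator
print(f'adjustable: {info.adjustable}')
# resolution of the clock in seconds (platform dependent)
print(f'resolution: {info.resolution}')
# name of the underlying OS clock function (platform dependent)
print(f'implementation: {info.implementation}')
```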
The assertLess() or assertLessEqual() methods of unittest.TestCase can then be used to check that the execution time is below a required threshold in seconds.
For example:
```python
...
# check if execution time is below a threshold in seconds
self.assertLess(time_duration, 1.0)
```
This allows performance requirements to be tested along with program functionality in unit tests.
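assertLess() also accepts an optional msg argument. If you pass the measured duration there, a failing benchmark test reports how slow the run actually was, and unittest appends that text to its standard failure message. The sketch below is a minimal example. The target function slow_function(), its 0.05 second sleep, and the 1.0 second limit are all hypothetical values chosen for illustration:

```python
import unittest
from time import perf_counter, sleep

# hypothetical target function used only for illustration
def slow_function():
    sleep(0.05)

# benchmark unit test that reports the measured time if it fails
class TestBenchmark(unittest.TestCase):
    def test_benchmark_slow_function(self):
        time_start = perf_counter()
        slow_function()
        time_duration = perf_counter() - time_start
        # msg is added to the failure report alongside the default message
        self.assertLess(time_duration, 1.0,
                        msg=f'took {time_duration:.3f} seconds, limit is 1.0 seconds')
```

Execute it with unittest.main(), as in the examples that follow.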
Now that we know how to develop benchmark unit tests, let’s look at some worked examples.
Example of Unit Testing List Functions
Before we explore an example of developing a benchmark unit test, we can first develop a classical unit test.
In this case, we will define two functions that create lists of squared integers. We will then define a new unit test class that tests each of these functions.
Firstly, we can define two functions that will be the targets of the testing.
Each function creates a list of squared integers of a given size. The size of the list is provided as an argument and each function uses a different approach to calculate squared integers.
```python
# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]
```
We can define a test case by defining a new class that extends the unittest.TestCase class.
Each test in the test case is defined as a method on the class whose name is prefixed with “test”.
We will define two tests, one for each function, and each test will create a list of 50 squared integers and confirm the returned list has a length of 50.
```python
# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)
```
Finally, we can execute the test case.
This can be achieved by calling the unittest.main() function.
```python
# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
```
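unittest.main() is convenient, but by default it calls sys.exit() once the tests finish. Passing exit=False avoids this. Another option is to load the tests with unittest.TestLoader and run them with unittest.TextTestRunner, which returns a result object you can inspect. This is useful when running tests from other code, such as a notebook. The sketch below repeats the functions and test case from this section so that it runs on its own:

```python
import unittest

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

# load every test method from the test case and run them without exiting
suite = unittest.TestLoader().loadTestsFromTestCase(TestListFunctions)
result = unittest.TextTestRunner(verbosity=2).run(suite)

# inspect the outcome programmatically
print(f'Ran {result.testsRun} tests, success: {result.wasSuccessful()}')
```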
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of unit testing a function
import unittest

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
```
Running the example runs the test case and executes the two tests.
Each test creates a list of squared integers and asserts that the created list has the required size.
All tests pass.
```
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
```
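The tests above check only the length of each returned list, so a function that returned 50 zeros would still pass. The sketch below adds a stronger functional check. It compares the values against hand-written expectations and against each other. The TestListContents class is a hypothetical addition for illustration and is not part of the original example:

```python
import unittest

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# hypothetical extra tests that check the contents, not just the length
class TestListContents(unittest.TestCase):
    # compare a small list against hand-written expected values
    def test_power_operator_values(self):
        self.assertEqual(power_operator(5), [0, 1, 4, 9, 16])

    # both approaches should produce identical lists
    def test_functions_agree(self):
        self.assertEqual(power_operator(50), multiplication_operator(50))
```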
Next, let’s explore how we might update the example to add benchmark unit tests to the test suite.
Example of Benchmark Unit Tests For List Functions
We can explore how to add benchmark unit tests for our target functions that create lists of squared integers.
In this case, we can update the TestListFunctions class above to add two more tests, one to benchmark each of the target functions.
The first will benchmark the power_operator() target function and confirm that the overall execution time for creating a list of 10,000,000 squared integers is below 0.7 seconds (700 milliseconds).
```python
# benchmark unit test for power_operator()
def test_benchmark_power_operator(self):
    time_start = perf_counter()
    data = power_operator(10000000)
    time_duration = perf_counter() - time_start
    self.assertLess(time_duration, 0.7)
```
The second test will benchmark the multiplication_operator() target function and confirm that the overall execution time for creating a list of 10,000,000 squared integers is below 0.6 seconds (600 milliseconds).
```python
# benchmark unit test for multiplication_operator()
def test_benchmark_multiplication_operator(self):
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    self.assertLess(time_duration, 0.6)
```
The updated TestListFunctions class with these changes is listed below.
```python
# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for power_operator()
    def test_benchmark_power_operator(self):
        time_start = perf_counter()
        data = power_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.7)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for multiplication_operator()
    def test_benchmark_multiplication_operator(self):
        time_start = perf_counter()
        data = multiplication_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.6)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmark unit testing a function
import unittest
from time import perf_counter

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for power_operator()
    def test_benchmark_power_operator(self):
        time_start = perf_counter()
        data = power_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.7)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for multiplication_operator()
    def test_benchmark_multiplication_operator(self):
        time_start = perf_counter()
        data = multiplication_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.6)

# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
```
Running the example runs the test case and executes the four unit tests.
The two functional tests create a small list of squared integers and assert that the created list has the required size.
The two benchmark tests calculate the execution time of the target functions and confirm that the overall execution time is below the predefined threshold.
All tests pass.
```
....
----------------------------------------------------------------------
Ran 4 tests in 1.321s

OK
```
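The two benchmark tests repeat the same timing boilerplate. One way to remove the duplication is a small helper method on a shared base test class. The sketch below is only an illustration and is not part of unittest. BenchmarkTestCase and its assertRunsUnder() method are hypothetical names. The smaller list size and generous 2.0 second limit are chosen only so the sketch runs quickly:

```python
import unittest
from time import perf_counter

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# hypothetical base class with a reusable benchmark assertion
class BenchmarkTestCase(unittest.TestCase):
    def assertRunsUnder(self, limit, func, *args, **kwargs):
        # time a single call of the target function
        time_start = perf_counter()
        result = func(*args, **kwargs)
        time_duration = perf_counter() - time_start
        # fail with a message that reports the measured time
        self.assertLess(time_duration, limit,
                        msg=f'{func.__name__} took {time_duration:.3f}s, limit {limit}s')
        # return the result so the caller can also check it
        return result

# test case that uses the helper for a benchmark test
class TestListBenchmarks(BenchmarkTestCase):
    def test_benchmark_multiplication_operator(self):
        data = self.assertRunsUnder(2.0, multiplication_operator, 100000)
        self.assertEqual(len(data), 100000)
```

Because the helper returns the function's result, a single test can check both the execution time and the output.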
Next, let’s explore how we might choose a threshold for a performance test.
How to Choose a Limit For a Benchmark Unit Test
A performance requirement may be specified as part of the requirements for the project and can be tested directly.
This is not always the case.
Sometimes there may be no explicit performance requirement. Instead, we want to test that future changes to the code do not cause the program to run slower, e.g. that there is no regression in performance.
In this case, we can first estimate the worst-case performance of the target function and use a value close to that as the threshold in the performance benchmark unit test.
This requires executing and benchmarking the target function many times and estimating the longest execution time. We don't use the best or average execution time: a threshold at the best time would fail on most runs, and one at the average would fail on a sizable fraction of them. Instead, we need an execution time that the program is expected to beat on every run.
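To see why, consider a set of ten hypothetical execution times (invented for illustration, not measured). The sketch counts how many of those runs would fail an assertLess() check whose threshold is set to the best, mean, or worst observed time:

```python
from statistics import mean

# hypothetical execution times in seconds, invented for illustration
times = [0.50, 0.51, 0.49, 0.52, 0.50, 0.48, 0.53, 0.50, 0.49, 0.51]

# candidate thresholds based on different summary statistics
thresholds = {
    'best': min(times),
    'mean': mean(times),
    'worst': max(times),
}

# a run fails an assertLess() check if its time is not strictly below the limit
for name, limit in thresholds.items():
    failures = sum(1 for t in times if not t < limit)
    print(f'{name}: limit={limit:.3f}s, failing runs={failures}/{len(times)}')
```

With these numbers, the best-case limit fails all ten runs and the mean fails four. Even the worst-case limit fails the slowest run itself, because assertLess() is strict, which is one reason to add a safety margin on top of the worst case.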
Firstly, we can define a function that executes the target function and returns the execution time.
```python
# benchmark the target function and return the execution time
def benchmark():
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    print(f'>{time_duration}')
    return time_duration
```
We can then run this function many times in a list comprehension and gather a collection of scores.
In this case, we will collect 30 benchmark scores.
```python
...
# run many times and find worst case
results = [benchmark() for _ in range(30)]
```
We can then report the best and worst-case scores.
We only need the worst case, but the best case is reported for our own general interest.
```python
...
# report statistics
print('Done')
print(f'Min: {min(results)}')
print(f'Max: {max(results)}')
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of exposing worst case for benchmark unit test
from time import perf_counter

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# benchmark the target function and return the execution time
def benchmark():
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    print(f'>{time_duration}')
    return time_duration

# run many times and find worst case
results = [benchmark() for _ in range(30)]
# report statistics
print('Done')
print(f'Min: {min(results)}')
print(f'Max: {max(results)}')
```
Running the example benchmarks the target function 30 times and reports every execution time, followed by the shortest and longest times.
The worst (longest) execution time can be used as the basis for a threshold in a benchmark unit test to check for performance regressions.
It is a good idea to multiply the worst score by a safety factor, e.g. 1.2, because the target function is generally expected to run slower as part of a unit test suite than in a normal program execution.
Some experimentation may be required to develop a reliable test that only fails on a regression and not due to normal statistical variation.
```
>0.49762165697757155
>0.505098193010781
>0.5046638440107927
>0.5109958149841987
>0.5018256849725731
>0.5009870909852907
>0.5181398210115731
>0.5071900020120665
>0.5073751489981078
>0.5037967539974488
>0.5108540950459428
>0.5108086710097268
>0.5035103390109725
>0.5028844319749624
>0.4996383949765004
>0.501869046012871
>0.4991130399866961
>0.49890127102844417
>0.49749664100818336
>0.4977406580001116
>0.4972482600132935
>0.4948188289999962
>0.4963120500324294
>0.49449029902461916
>0.493782164005097
>0.4931158840190619
>0.49310844700085
>0.4929708020063117
>0.494492124998942
>0.49517418700270355
Done
Min: 0.4929708020063117
Max: 0.5181398210115731
```
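You can capture the threshold arithmetic in a small helper. This is a sketch: suggest_threshold() is a hypothetical name, and 1.2 is just the example safety factor discussed above. Among the 30 execution times listed above, the longest is 0.5181398210115731 seconds, and multiplying it by 1.2 gives a limit of roughly 0.622 seconds:

```python
# hypothetical helper: suggest a benchmark unit test threshold from recorded times
def suggest_threshold(times, factor=1.2):
    # start from the worst (longest) observed time
    worst = max(times)
    # add a safety margin for slower runs inside a test suite
    return worst * factor

# a few of the execution times listed above, in seconds
sample = [0.49762165697757155, 0.5181398210115731, 0.4929708020063117]
threshold = suggest_threshold(sample)
print(f'Threshold: {threshold:.3f} seconds')  # prints: Threshold: 0.622 seconds
```

The resulting value can then be used as the limit in an assertLess() call in a benchmark unit test.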
Takeaways
You now know how to develop a benchmark unit test in Python.
Photo by Aleks Marinkovic on Unsplash