Python Benchmark Unit Test
You can develop a benchmark unit test by defining a unit test that calculates the execution time of a target function and asserts that it is below a threshold.
This can be used to automate performance testing to confirm target functions meet performance requirements and that code changes do not introduce regressions in performance.
In this tutorial, you will discover how to develop benchmark unit tests in Python.
Let's get started.
Need a Benchmark Unit Test
It is common to have performance requirements for Python code.
For example, we may require the execution time for a code snippet or function to be below some predefined threshold.
One way we can test an execution time performance requirement is to use unit tests.
Recall that unit tests are a programmatic way to define code requirements by executing and asserting details about the target code. Test suites can then be executed automatically and repeatedly to check code for regressions (e.g. the introduction of problems that cause tests to fail).
In computer programming, unit testing is a software testing method by which individual units of source code [...] are tested to determine whether they are fit for use. It is a standard step in development and implementation approaches such as Agile.
-- Unit testing, Wikipedia.
A unit test can be used to benchmark a target function and confirm that its overall execution time meets the requirements for the function.
This is different from regular benchmarking, where the execution time is reported and compared across variations of the same target function. Instead, we are interested in an automated test of performance.
How can we use unit tests to define and enforce the performance requirements of a target function?
How to Develop a Benchmark Unit Test
We can develop a unit test that benchmarks a target function and checks that the time is below a required threshold.
This can be achieved using a common Python unit testing framework, such as the unittest module in the Python standard library.
A test can be defined that records the start time and end time around a function, and then the execution time is calculated.
The time.perf_counter() function can be used to record times as it is preferred for benchmarking given that it is non-adjustable, monotonic, and uses a high-precision clock.
For example:
...
# record start time
time_start = perf_counter()
# execute the target function
...
# calculate the execution time
time_duration = perf_counter() - time_start
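These properties of the clock can be confirmed directly with the time.get_clock_info() function, which reports the characteristics of each named clock. This is just a quick sanity check, not part of the unit test itself:

```python
from time import get_clock_info

# inspect the properties of the perf_counter clock
info = get_clock_info('perf_counter')
# cannot go backwards between calls
print(f'monotonic:  {info.monotonic}')
# not changed by NTP or manual system clock updates
print(f'adjustable: {info.adjustable}')
# smallest measurable tick in seconds
print(f'resolution: {info.resolution}')
```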
The assertLess() or assertLessEqual() methods can be used to check if the execution time is below a required threshold in seconds.
For example:
...
# check if execution time is below a threshold in seconds
self.assertLess(time_duration, 1.0)
This allows performance requirements to be tested along with program functionality in unit tests.
Now that we know how to develop benchmark unit tests, let's look at some worked examples.
Example of Unit Testing List Functions
Before we explore an example of developing a benchmark unit test, we can first develop a classical unit test.
In this case, we will define two functions that create lists of squared integers. We will then define a new unit test class that tests each of these functions.
Firstly, we can define two functions that will be the targets of the testing.
Each function creates a list of squared integers of a given size. The size of the list is provided as an argument and each function uses a different approach to calculate squared integers.
# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]
We can define a test case by defining a new object that extends the unittest.TestCase class.
Each test in the test case is defined as a method on the class whose name is prefixed with "test".
We will define two tests, one for each function, and each test will create a list of 50 squared integers and confirm the returned list has a length of 50.
# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)
Finally, we can execute the test case.
This can be achieved by calling the unittest.main() function.
# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of unit testing a function
import unittest

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
Running the example runs the test case and executes the two tests.
Each test creates a list of squared integers and asserts that the created list has the required size.
All tests pass.
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK
Next, let's explore how we might update the example to add benchmark unit tests to the test suite.
Example of Benchmark Unit Tests For List Functions
We can explore how to add benchmark unit tests for our target functions that create lists of squared integers.
In this case, we can update the above TestListFunctions to add two more tests, one to benchmark each of the target functions.
The first will benchmark the power_operator() target function and confirm that the overall execution time for creating a list of 10,000,000 squared integers is below 0.7 seconds (700 milliseconds).
# benchmark unit test for power_operator()
def test_benchmark_power_operator(self):
    time_start = perf_counter()
    data = power_operator(10000000)
    time_duration = perf_counter() - time_start
    self.assertLess(time_duration, 0.7)
The second test will benchmark the multiplication_operator() target function and confirm that the overall execution of creating a list of 10,000,000 squared integers is below 0.6 seconds (600 milliseconds).
# benchmark unit test for multiplication_operator()
def test_benchmark_multiplication_operator(self):
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    self.assertLess(time_duration, 0.6)
The updated TestListFunctions class with these changes is listed below.
# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for power_operator()
    def test_benchmark_power_operator(self):
        time_start = perf_counter()
        data = power_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.7)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for multiplication_operator()
    def test_benchmark_multiplication_operator(self):
        time_start = perf_counter()
        data = multiplication_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.6)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of benchmark unit testing a function
import unittest
from time import perf_counter

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for power_operator()
    def test_benchmark_power_operator(self):
        time_start = perf_counter()
        data = power_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.7)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for multiplication_operator()
    def test_benchmark_multiplication_operator(self):
        time_start = perf_counter()
        data = multiplication_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.6)

# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
Running the example runs the test case and executes the four unit tests.
Each test creates a list of squared integers and asserts that the created list has the required size.
The benchmark tests calculate the execution time of the target functions and confirm that the overall execution time is below the predefined threshold.
All tests pass.
....
----------------------------------------------------------------------
Ran 4 tests in 1.321s
OK
Next, let's explore how we might choose a threshold for a performance test.
How to Choose a Limit For a Benchmark Unit Test
A performance requirement may be specified as part of the requirements for the project and can be tested directly.
This is not always the case.
Sometimes there may be no explicit performance requirement. Instead, we want to test that future changes to the code do not cause the program to run slower, e.g. that there is no regression in performance.
In this case, we can first estimate the worst-case performance of the target function and use a value close to that as the threshold in the performance benchmark unit test.
This requires executing and benchmarking the target function many times and estimating the longest execution time. We don't use the best or average execution time, as a threshold based on either would cause the test to fail some or most of the time. Instead, we require an execution time that the program is expected to outperform on every run.
Firstly, we can define a function that executes the target function and returns the execution time.
def benchmark():
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    print(f'>{time_duration}')
    return time_duration
We can then run this function many times in a list comprehension and gather a collection of scores.
In this case, we will collect 30 benchmark scores.
...
# run many times and find worst case
results = [benchmark() for _ in range(30)]
We can then report the best and worst-case scores.
We only need the worst case, but the best case is reported for our own general interest.
...
# report statistics
print('Done')
print(f'Min: {min(results)}')
print(f'Max: {max(results)}')
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of exposing worst case for benchmark unit test
from time import perf_counter

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# benchmark the target function and return the execution time
def benchmark():
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    print(f'>{time_duration}')
    return time_duration

# run many times and find worst case
results = [benchmark() for _ in range(30)]
# report statistics
print('Done')
print(f'Min: {min(results)}')
print(f'Max: {max(results)}')
Running the example benchmarks the target function 30 times and reports every execution time, followed by the longest and shortest time.
The worst or longest execution time can be used as the basis for a threshold in a benchmark unit test to check for performance regressions.
It is a good idea to multiply the worst score by a factor, e.g. 1.2, as executing the target function as part of a unit test suite is generally expected to be slower than a normal program run.
Some experimentation may be required to develop a reliable test that only fails on a regression and not due to normal statistical variation.
>0.49762165697757155
>0.505098193010781
>0.5046638440107927
>0.5109958149841987
>0.5018256849725731
>0.5009870909852907
>0.5181398210115731
>0.5071900020120665
>0.5073751489981078
>0.5037967539974488
>0.5108540950459428
>0.5108086710097268
>0.5035103390109725
>0.5028844319749624
>0.4996383949765004
>0.501869046012871
>0.4991130399866961
>0.49890127102844417
>0.49749664100818336
>0.4977406580001116
>0.4972482600132935
>0.4948188289999962
>0.4963120500324294
>0.49449029902461916
>0.493782164005097
>0.4931158840190619
>0.49310844700085
>0.4929708020063117
>0.494492124998942
>0.49517418700270355
Done
Min: 0.4929708020063117
Max: 0.5181398210115731
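Using the longest time reported above, the threshold could be derived by applying a safety factor. The 20% margin here is illustrative; your own tests may need a larger or smaller factor:

```python
# worst case observed across the 30 benchmark runs above
worst = 0.5181398210115731
# add a 20% safety margin to absorb normal run-to-run variation
threshold = worst * 1.2
# report the threshold to use in the benchmark unit test
print(f'Threshold: {threshold:.3f} seconds')
```

The resulting value would then be used as the second argument to assertLess() in the benchmark unit test.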
Takeaways
You now know how to develop a benchmark unit test in Python.
If you enjoyed this tutorial, you will love my book: Python Benchmarking. It covers everything you need to master the topic with hands-on examples and clear explanations.