Python Benchmark Unit Test
You can develop a benchmark unit test by defining a unit test that calculates the execution time of a target function and asserts that it is below a threshold.
This can be used to automate performance testing to confirm target functions meet performance requirements and that code changes do not introduce regressions in performance.
In this tutorial, you will discover how to develop benchmark unit tests in Python.
Let's get started.
Need a Benchmark Unit Test
It is common to have performance requirements for Python code.
For example, we may require the execution time for a code snippet or function to be below some predefined threshold.
One way we can test an execution time performance requirement is to use unit tests.
Recall that unit tests are a programmatic way to define code requirements by executing and asserting details about the target code. Test suites can then be executed automatically and repeatedly to check code for regressions (e.g. the introduction of problems that cause tests to fail).
In computer programming, unit testing is a software testing method by which individual units of source code [...] are tested to determine whether they are fit for use. It is a standard step in development and implementation approaches such as Agile.
-- Unit testing, Wikipedia.
A unit test can be used to benchmark a target function and confirm that its overall execution time meets the requirements for the function.
This is different from regular benchmarking, where the execution time is reported and compared across variations of the same target function. Instead, we are interested in an automated test of performance.
How can we use unit tests to define and enforce the performance requirements of a target function?
How to Develop a Benchmark Unit Test
We can develop a unit test that benchmarks a target function and checks that the time is below a required threshold.
This can be achieved using a common Python unit testing framework, such as the unittest module in the Python standard library.
A test can be defined that records the start time and end time around a function, and then the execution time is calculated.
The time.perf_counter() function can be used to record times as it is preferred for benchmarking given that it is non-adjustable, monotonic, and uses a high-precision clock.
For example:
...
# record start time
time_start = perf_counter()
# execute the target function
...
# calculate the execution time
time_duration = perf_counter() - time_start
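These properties of the clock can be confirmed directly with the time.get_clock_info() function, which reports the characteristics of each named clock. This is just a quick sanity check, not part of the unit test itself:

```python
from time import get_clock_info

# inspect the properties of the perf_counter clock
info = get_clock_info('perf_counter')
# cannot go backwards between calls
print(f'monotonic:  {info.monotonic}')
# not changed by NTP or manual system clock updates
print(f'adjustable: {info.adjustable}')
# smallest measurable tick in seconds
print(f'resolution: {info.resolution}')
```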
The assertLess() or assertLessEqual() methods can be used to check if the execution time is below a required threshold in seconds.
For example:
...
# check if execution time is below a threshold in seconds
self.assertLess(time_duration, 1.0)
This allows performance requirements to be tested along with program functionality in unit tests.
Now that we know how to develop benchmark unit tests, let's look at some worked examples.
Example of Unit Testing List Functions
Before we explore an example of developing a benchmark unit test, we can first develop a classical unit test.
In this case, we will define two functions that create lists of squared integers. We will then define a new unit test class that tests each of these functions.
Firstly, we can define two functions that will be the targets of the testing.
Each function creates a list of squared integers of a given size. The size of the list is provided as an argument and each function uses a different approach to calculate squared integers.
# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]
We can define a test case by defining a new object that extends the unittest.TestCase class.
Each test in the test case is defined as a method on the class whose name is prefixed with "test".
We will define two tests, one for each function, and each test will create a list of 50 squared integers and confirm the returned list has a length of 50.
# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)
Finally, we can execute the test case.
This can be achieved by calling the unittest.main() function.
# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of unit testing a function
import unittest

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
Running the example runs the test case and executes the two tests.
Each test creates a list of squared integers and asserts that the created list has the required size.
All tests pass.
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK
Next, let's explore how we might update the example to add benchmark unit tests to the test suite.
Example of Benchmark Unit Tests For List Functions
We can explore how to add benchmark unit tests for our target functions that create lists of squared integers.
In this case, we can update the above TestListFunctions to add two more tests, one to benchmark each of the target functions.
The first will benchmark the power_operator() target function and confirm that the overall execution time for creating a list of 10,000,000 squared integers is below 0.7 seconds (700 milliseconds).
# benchmark unit test for power_operator()
def test_benchmark_power_operator(self):
    time_start = perf_counter()
    data = power_operator(10000000)
    time_duration = perf_counter() - time_start
    self.assertLess(time_duration, 0.7)
The second test will benchmark the multiplication_operator() target function and confirm that the overall execution of creating a list of 10,000,000 squared integers is below 0.6 seconds (600 milliseconds).
# benchmark unit test for multiplication_operator()
def test_benchmark_multiplication_operator(self):
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    self.assertLess(time_duration, 0.6)
The updated TestListFunctions class with these changes is listed below.
# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for power_operator()
    def test_benchmark_power_operator(self):
        time_start = perf_counter()
        data = power_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.7)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for multiplication_operator()
    def test_benchmark_multiplication_operator(self):
        time_start = perf_counter()
        data = multiplication_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.6)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of benchmark unit testing a function
import unittest
from time import perf_counter

# create a list of squared integers using power operator
def power_operator(size):
    return [i**2 for i in range(size)]

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# unit test list functions
class TestListFunctions(unittest.TestCase):
    # unit test for power_operator()
    def test_power_operator(self):
        data = power_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for power_operator()
    def test_benchmark_power_operator(self):
        time_start = perf_counter()
        data = power_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.7)

    # unit test for multiplication_operator()
    def test_multiplication_operator(self):
        data = multiplication_operator(50)
        self.assertEqual(len(data), 50)

    # benchmark unit test for multiplication_operator()
    def test_benchmark_multiplication_operator(self):
        time_start = perf_counter()
        data = multiplication_operator(10000000)
        time_duration = perf_counter() - time_start
        self.assertLess(time_duration, 0.6)

# protect the entry point
if __name__ == '__main__':
    # execute the unit tests
    unittest.main()
Running the example runs the test case and executes the four unit tests.
Each test creates a list of squared integers and asserts that the created list has the required size.
The benchmark tests calculate the execution time of the target functions and confirm that the overall execution time is below the predefined threshold.
All tests pass.
....
----------------------------------------------------------------------
Ran 4 tests in 1.321s
OK
Next, let's explore how we might choose a threshold for a performance test.
How to Choose a Limit For a Benchmark Unit Test
A performance requirement may be specified as part of the requirements for the project and can be tested directly.
This is not always the case.
Sometimes there may be no explicit performance requirement. Instead, we want to test that future changes to the code do not cause the program to run slower, e.g. that there is no regression in performance.
In this case, we can first estimate the worst-case performance of the target function and use a value close to that as the threshold in the performance benchmark unit test.
This requires executing and benchmarking the target function many times and estimating the longest execution time. We don't use the best or average execution time, as a threshold based on either would cause the test to fail some or most of the time. Instead, we require an execution time that the program is expected to outperform on every run.
Firstly, we can define a function that executes the target function and returns the execution time.
def benchmark():
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    print(f'>{time_duration}')
    return time_duration
We can then run this function many times in a list comprehension and gather a collection of scores.
In this case, we will collect 30 benchmark scores.
...
# run many times and find worst case
results = [benchmark() for _ in range(30)]
We can then report the best and worst-case scores.
We only need the worst case, but the best case is reported for our own general interest.
...
# report statistics
print('Done')
print(f'Min: {min(results)}')
print(f'Max: {max(results)}')
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of exposing worst case for benchmark unit test
from time import perf_counter

# create a list of squared integers using multiplication operator
def multiplication_operator(size):
    return [i*i for i in range(size)]

# benchmark the target function and return the execution time
def benchmark():
    time_start = perf_counter()
    data = multiplication_operator(10000000)
    time_duration = perf_counter() - time_start
    print(f'>{time_duration}')
    return time_duration

# run many times and find worst case
results = [benchmark() for _ in range(30)]
# report statistics
print('Done')
print(f'Min: {min(results)}')
print(f'Max: {max(results)}')
Running the example benchmarks the target function 30 times and reports every execution time, followed by the longest and shortest time.
The worst or longest execution time can be used as the basis for a threshold in a benchmark unit test to check for performance regressions.
It is a good idea to multiply the worst score by a factor, e.g. 1.2, as executing the target function as part of a unit test suite is generally expected to be slower than a normal program run.
Some experimentation may be required to develop a reliable test that only fails on a regression and not due to normal statistical variation.
>0.49762165697757155
>0.505098193010781
>0.5046638440107927
>0.5109958149841987
>0.5018256849725731
>0.5009870909852907
>0.5181398210115731
>0.5071900020120665
>0.5073751489981078
>0.5037967539974488
>0.5108540950459428
>0.5108086710097268
>0.5035103390109725
>0.5028844319749624
>0.4996383949765004
>0.501869046012871
>0.4991130399866961
>0.49890127102844417
>0.49749664100818336
>0.4977406580001116
>0.4972482600132935
>0.4948188289999962
>0.4963120500324294
>0.49449029902461916
>0.493782164005097
>0.4931158840190619
>0.49310844700085
>0.4929708020063117
>0.494492124998942
>0.49517418700270355
Done
Min: 0.4929708020063117
Max: 0.5181398210115731
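Using the longest time reported above, the threshold could be derived by applying a safety factor. The 20% margin here is illustrative; your own tests may need a larger or smaller factor:

```python
# worst case observed across the 30 benchmark runs above
worst = 0.5181398210115731
# add a 20% safety margin to absorb normal run-to-run variation
threshold = worst * 1.2
# report the threshold to use in the benchmark unit test
print(f'Threshold: {threshold:.3f} seconds')
```

The resulting value would then be used as the second argument to assertLess() in the benchmark unit test.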
Takeaways
You now know how to develop a benchmark unit test in Python.
If you enjoyed this tutorial, you will love my book: Python Benchmarking. It covers everything you need to master the topic with hands-on examples and clear explanations.