You can use the pyperf library to benchmark, analyze, and compare the performance of snippets of Python code.
It extends the timeit module’s capabilities and includes the ability to execute benchmarks in multiple worker processes and to gather and report summary statistics.
In this tutorial, you will discover how to benchmark Python code using the pyperf open source library.
Let’s get started.
What is pyperf
The pyperf project is a Python library for benchmarking.
It is maintained under the Python Software Foundation’s GitHub organization and is available as open source.
The Python pyperf module is a toolkit to write, run and analyze benchmarks.
— pyperf, GitHub Project.
The pyperf library wraps the timeit module and offers more features for reliably benchmarking snippets of Python code.
One key addition is the use of multiple worker processes to execute the benchmark repeatedly, isolating each run in a freshly spawned interpreter.
pyperf starts by spawning a first worker process (Run 1) only to calibrate the benchmark: compute the number of outer loops: 2^15 loops on the example. Then pyperf spawns 20 worker processes (Run 2 .. Run 21). Each worker starts by running the benchmark once to “warmup” the process, but this result is ignored in the final result. Then each worker runs the benchmark 3 times.
— Run a benchmark, Perf User Guide
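In that default configuration, the final result is therefore based on 20 runs of 3 values each, or 60 measured values in total, with the calibration run and the warmup values excluded.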
It also offers additional features, such as reporting summary statistics on benchmark results, including the distribution, mean, and standard deviation by default.
It also reports warnings if a benchmark appears unstable, for example because its duration is too short.
The pyperf project documentation includes a user guide and a developer guide.
Now that we know about the pyperf project, let’s look at how we can use it.
How to Use pyperf
Using the pyperf library for benchmarking involves three steps:
- Install pyperf
- Run benchmarks
- Analyze results
Let’s take a closer look at each in turn.
1. Install pyperf
The first step is to install the pyperf library.
This can be achieved using your favorite Python package manager, such as pip.
For example:
pip install pyperf
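Once installed, you can confirm that the library is available and check which version you have, for example by asking pip (the exact version reported will depend on your environment):
python -m pip show pyperf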
2. Run benchmarks
The pyperf library provides both a Python API and a command line interface.
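For example, the Python API provides a pyperf.Runner class whose timeit() method mirrors the command line usage. A minimal sketch (the file name, benchmark name, and statement below are placeholders of my own choosing):
# bench_squares.py
# minimal sketch of the pyperf Python API
import pyperf

runner = pyperf.Runner()
runner.timeit(name='square ints',
              stmt='[i*i for i in range(1000)]')
Running such a script directly (e.g. python bench_squares.py) lets pyperf spawn and manage the worker processes for you.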
Like the timeit module on which it relies, the pyperf library is primarily intended to be used via its command line interface.
We can benchmark snippets of Python code on the command line by executing the pyperf module via the “python -m” flag, then specifying the timeit command followed by the code snippet to execute.
For example:
python -m pyperf timeit "..."
The “pyperf timeit” command takes many of the same arguments as the Python timeit module command.
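For example, the -s flag provides setup code, just as with timeit, and pyperf adds its own options for controlling the worker processes and the number of values collected. A sketch is shown below; run python -m pyperf timeit --help to see the options supported by your installed version.
python -m pyperf timeit -s 'data = list(range(1000))' 'sorted(data)' --processes 10 --values 5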
The pyperf timeit command offers a number of benefits over the timeit module in the standard library.
* It displays the average and the standard deviation
* It runs the benchmark in multiple processes
* By default, it skips the first value in each process to warmup the benchmark
* It does not disable the garbage collector
— pyperf commands
We can save benchmark results to a JSON file via the -o flag followed by a filename.
For example:
python -m pyperf timeit "..." -o filename.json
This is helpful for analysis and comparison in the next section.
3. Analyze and Compare Results
The pyperf library includes a number of tools for analyzing and comparing benchmark results.
One example is the “pyperf stats” command.
It takes the name of a benchmark JSON file and then reports statistical details of the benchmark results, including:
* Mean and standard deviation.
* Median and median absolute deviation (MAD).
* Percentiles.
* Outliers: number of values out of the range
— pyperf commands
For example:
python -m pyperf stats filename.json
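If you prefer to work with results programmatically, the same JSON files can also be loaded via the Python API. A minimal sketch, assuming a results file named filename.json already exists:
# load pyperf benchmark results from JSON and report simple statistics
import pyperf

suite = pyperf.BenchmarkSuite.load('filename.json')
for bench in suite.get_benchmarks():
    # mean() and stdev() report values in seconds
    print(f'{bench.get_name()}: mean={bench.mean():.4f} s, stdev={bench.stdev():.4f} s')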
We can also compare benchmark results using the “pyperf compare_to” command.
This command takes optional arguments, such as how to present the results, followed by the names of the JSON files that contain the raw benchmark results.
Importantly, it uses statistical significance to report whether the difference between results is more likely meaningful or a statistical fluke.
pyperf determines whether two samples differ significantly using a Student’s two-sample, two-tailed t-test with alpha equals to 0.95.
— pyperf commands
For example:
python -m pyperf compare_to --table file1.json file2.json file3.json
Now that we know how to use the pyperf library for benchmarking, let’s look at a worked example.
Example of Benchmarking with pyperf
We can use the pyperf command line interface to benchmark and compare different snippets for creating a list of squared integers.
The three methods we will compare are:
- Use the math.pow() function.
- Use the ** power operator.
- Use the * multiplication operator.
We will calculate and review the benchmark results for each method, then finally compare all methods directly.
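As an aside, all three benchmarks could also be expressed in a single script using the Python API rather than three separate commands; a sketch of that approach (the file and benchmark names are my own) is shown below, although the rest of this example uses the command line interface.
# squares_bench.py
# sketch: benchmark three ways to square integers using the pyperf Python API
import pyperf

runner = pyperf.Runner()
runner.timeit(name='math.pow', setup='import math',
              stmt='[math.pow(i,2) for i in range(1000000)]')
runner.timeit(name='i**2', stmt='[i**2 for i in range(1000000)]')
runner.timeit(name='i*i', stmt='[i*i for i in range(1000000)]')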
Benchmark List of Squared Integers With math.pow()
Firstly, we can benchmark the creation of a list of one million squared integers using the math.pow() function.
...
[math.pow(i,2) for i in range(1000000)]
This requires importing the math module via the -s setup flag.
We will save the results into the “mathpow.json” file.
For example:
python -m pyperf timeit -s 'import math' '[math.pow(i,2) for i in range(1000000)]' -o mathpow.json
Running the example benchmarks this snippet using the default number of loops and repetitions.
In this case, the mean execution time of the snippet is 113 milliseconds with a standard deviation of about 2 milliseconds.
Your results may vary.
.....................
Mean +- std dev: 113 ms +- 2 ms
We can review the distribution of results using the “pyperf stats” command.
For example:
python -m pyperf stats mathpow.json
We can see the details of the distribution of benchmark results, noting that we gathered 60 samples in total: 3 values from each of 20 runs.
Total duration: 10.0 sec
Start date: 2023-10-06 08:13:19
End date: 2023-10-06 08:13:31
Raw value minimum: 110 ms
Raw value maximum: 120 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 1
Total number of values: 60

Minimum: 110 ms
Median +- MAD: 113 ms +- 1 ms
Mean +- std dev: 113 ms +- 2 ms
Maximum: 120 ms

0th percentile: 110 ms (-3% of the mean) -- minimum
5th percentile: 111 ms (-2% of the mean)
25th percentile: 112 ms (-1% of the mean) -- Q1
50th percentile: 113 ms (-0% of the mean) -- median
75th percentile: 114 ms (+1% of the mean) -- Q3
95th percentile: 118 ms (+4% of the mean)
100th percentile: 120 ms (+6% of the mean) -- maximum

Number of outlier (out of 109 ms..118 ms): 4
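If you would also like a visual summary of the same results, pyperf can render a text histogram of the collected values from the same file (output not shown here):
python -m pyperf hist mathpow.json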
Benchmark List of Squared Integers With i**2
We can perform a similar benchmark, in this case using the ** power operator.
...
[i**2 for i in range(1000000)]
Results will be saved to the powerop.json file.
The updated “pyperf timeit” command is listed below.
python -m pyperf timeit '[i**2 for i in range(1000000)]' -o powerop.json
Running the command, we can see that the average execution time for the snippet was about 74.2 milliseconds with a standard deviation of about 1.5 milliseconds.
.....................
Mean +- std dev: 74.2 ms +- 1.5 ms
Next, we can review the statistics of the collected benchmark results using the “pyperf stats” command:
python -m pyperf stats powerop.json
Again, 60 samples were collected, and we can review the distribution of results.
Total duration: 12.9 sec
Start date: 2023-10-06 08:13:44
End date: 2023-10-06 08:13:59
Raw value minimum: 143 ms
Raw value maximum: 157 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 2
Total number of values: 60

Minimum: 71.4 ms
Median +- MAD: 74.5 ms +- 0.9 ms
Mean +- std dev: 74.2 ms +- 1.5 ms
Maximum: 78.3 ms

0th percentile: 71.4 ms (-4% of the mean) -- minimum
5th percentile: 72.0 ms (-3% of the mean)
25th percentile: 73.0 ms (-2% of the mean) -- Q1
50th percentile: 74.5 ms (+0% of the mean) -- median
75th percentile: 75.1 ms (+1% of the mean) -- Q3
95th percentile: 76.5 ms (+3% of the mean)
100th percentile: 78.3 ms (+5% of the mean) -- maximum

Number of outlier (out of 69.8 ms..78.3 ms): 0
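Note that pyperf calibrated 2 loop iterations per value for this benchmark, so each reported value is the raw timing divided by 2; for example, the raw minimum of 143 ms corresponds to the reported minimum of about 71.4 ms (the small difference is due to rounding in the displayed values).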
Benchmark List of Squared Integers With i*i
Finally, we can perform the third benchmark, in this case with the code updated to use the * multiplication operator.
...
[i*i for i in range(1000000)]
The results are saved to the multiplyop.json file.
The updated “pyperf timeit” command is listed below:
python -m pyperf timeit '[i*i for i in range(1000000)]' -o multiplyop.json
Running the command, we can see that the average duration to execute the snippet was about 61.8 milliseconds with a standard deviation of about 1.4 milliseconds.
.....................
Mean +- std dev: 61.8 ms +- 1.4 ms
Next, we can review the statistics of the collected benchmark results using the “pyperf stats” command:
python -m pyperf stats multiplyop.json
We can see that 60 samples were collected, and we can review the distribution and central tendency of the collected measurement results.
Total duration: 10.9 sec
Start date: 2023-10-06 08:14:00
End date: 2023-10-06 08:14:13
Raw value minimum: 119 ms
Raw value maximum: 131 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 2
Total number of values: 60

Minimum: 59.4 ms
Median +- MAD: 61.7 ms +- 1.1 ms
Mean +- std dev: 61.8 ms +- 1.4 ms
Maximum: 65.6 ms

0th percentile: 59.4 ms (-4% of the mean) -- minimum
5th percentile: 59.8 ms (-3% of the mean)
25th percentile: 60.7 ms (-2% of the mean) -- Q1
50th percentile: 61.7 ms (-0% of the mean) -- median
75th percentile: 62.8 ms (+2% of the mean) -- Q3
95th percentile: 64.1 ms (+4% of the mean)
100th percentile: 65.6 ms (+6% of the mean) -- maximum

Number of outlier (out of 57.6 ms..65.9 ms): 0
Comparison of Results
We have now collected benchmark results from 3 approaches that achieve the same result using different techniques.
We can manually gather the results and report them in a table, for example:
Method   | Mean Time (ms) | Std Time (ms)
---------|----------------|--------------
math.pow | 113.0          | 2.0
i**2     | 74.2           | 1.5
i*i      | 61.8           | 1.4
We can also compare the results automatically using the “pyperf compare_to” command.
We will configure the command to show the results in a table via the --table flag and then specify the three result files.
For example:
python -m pyperf compare_to --table mathpow.json powerop.json multiplyop.json
We can see that all 3 results are reported in a table, with one result per column.
We can see that the second and third results are compared to the first, showing the speedup factor.
In this case, the multiplication operator was the fastest, with a mean time about 1.83x faster than the math.pow() approach.
+-----------+---------+-----------------------+-----------------------+
| Benchmark | mathpow | powerop               | multiplyop            |
+===========+=========+=======================+=======================+
| timeit    | 113 ms  | 74.2 ms: 1.53x faster | 61.8 ms: 1.83x faster |
+-----------+---------+-----------------------+-----------------------+
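The reported speedups are simply ratios of the mean times: 113 / 74.2 ≈ 1.52 and 113 / 61.8 ≈ 1.83; the slight difference from the displayed 1.53x is due to rounding of the means shown in the table.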
Free Python Benchmarking Course
Get FREE access to my 7-day email course on Python Benchmarking.
Discover benchmarking with the time.perf_counter() function, how to develop a benchmarking helper function and context manager and how to use the timeit API and command line.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to benchmark Python code using the pyperf open-source library.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any additional tips?
I’d love to hear about them!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Nicolas Hoizey on Unsplash