You can use the pyperf library to benchmark, analyze, and compare the performance of snippets of Python code.
It extends the timeit module’s capabilities and includes the ability to execute benchmarks in multiple worker processes and to gather and report summary statistics.
In this tutorial, you will discover how to benchmark Python code using the pyperf open source library.
Let’s get started.
What is pyperf
The pyperf project is a Python library for benchmarking.
It is maintained under the Python Software Foundation’s GitHub organization and is available as open source.
The Python pyperf module is a toolkit to write, run and analyze benchmarks.
— pyperf, GitHub Project.
The pyperf library wraps the timeit module and offers more features for reliably benchmarking snippets of Python code.
One key addition is the use of multiple worker processes to execute the benchmark repeatedly, isolating each run in a freshly spawned interpreter.
pyperf starts by spawning a first worker process (Run 1) only to calibrate the benchmark: compute the number of outer loops: 2^15 loops on the example. Then pyperf spawns 20 worker processes (Run 2 .. Run 21). Each worker starts by running the benchmark once to “warmup” the process, but this result is ignored in the final result. Then each worker runs the benchmark 3 times.
— Run a benchmark, Perf User Guide
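In that default configuration, the final result is therefore based on 20 runs of 3 values each, or 60 measured values in total, with the calibration run and the warmup values excluded.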
It also offers additional features, such as reporting summary statistics on benchmark results, including the distribution, mean, and standard deviation by default.
It also reports warnings if a benchmark appears unstable, for example because its duration is too short.
The pyperf project documentation includes a user guide and a developer guide.
Now that we know about the pyperf project, let’s look at how we can use it.
How to Use pyperf
Using the pyperf library for benchmarking involves three steps:
- Install pyperf
- Run benchmarks
- Analyze results
Let’s take a closer look at each in turn.
1. Install pyperf
The first step is to install the pyperf library.
This can be achieved using your favorite Python package manager, such as pip.
For example:
pip install pyperf
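Once installed, you can confirm that the library is available and check which version you have, for example by asking pip (the exact version reported will depend on your environment):
python -m pip show pyperf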
2. Run benchmarks
The pyperf library provides both a Python API and a command line interface.
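For example, the Python API provides a pyperf.Runner class whose timeit() method mirrors the command line usage. A minimal sketch (the file name, benchmark name, and statement below are placeholders of my own choosing):
# bench_squares.py
# minimal sketch of the pyperf Python API
import pyperf

runner = pyperf.Runner()
runner.timeit(name='square ints',
              stmt='[i*i for i in range(1000)]')
Running such a script directly (e.g. python bench_squares.py) lets pyperf spawn and manage the worker processes for you.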
Like the timeit module on which it relies, the pyperf library is primarily intended to be used via its command line interface.
We can benchmark snippets of Python code on the command line by executing the pyperf module via the “python -m” flag, then specifying the timeit command followed by the code snippet to execute.
For example:
python -m pyperf timeit "..."
The “pyperf timeit” command takes many of the same arguments as the Python timeit module command.
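For example, the -s flag provides setup code, just as with timeit, and pyperf adds its own options for controlling the worker processes and the number of values collected. A sketch is shown below; run python -m pyperf timeit --help to see the options supported by your installed version.
python -m pyperf timeit -s 'data = list(range(1000))' 'sorted(data)' --processes 10 --values 5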
The pyperf timeit command offers a number of benefits over the timeit module in the standard library.
* It displays the average and the standard deviation
* It runs the benchmark in multiple processes
* By default, it skips the first value in each process to warmup the benchmark
* It does not disable the garbage collector
— pyperf commands
We can save benchmark results to a JSON file via the -o flag followed by a filename.
For example:
python -m pyperf timeit "..." -o filename.json
This is helpful for analysis and comparison in the next section.
3. Analyze and Compare Results
The pyperf library includes a number of tools for analyzing and comparing benchmark results.
One example is the “pyperf stats” command.
It takes the name of a benchmark JSON file and then reports statistical details of the benchmark results, including:
* Mean and standard deviation.
* Median and median absolute deviation (MAD).
* Percentiles.
* Outliers: number of values out of the range
— pyperf commands
For example:
python -m pyperf stats filename.json
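If you prefer to work with results programmatically, the same JSON files can also be loaded via the Python API. A minimal sketch, assuming a results file named filename.json already exists:
# load pyperf benchmark results from JSON and report simple statistics
import pyperf

suite = pyperf.BenchmarkSuite.load('filename.json')
for bench in suite.get_benchmarks():
    # mean() and stdev() report values in seconds
    print(f'{bench.get_name()}: mean={bench.mean():.4f} s, stdev={bench.stdev():.4f} s')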
We can also compare benchmark results using the “pyperf compare_to” command.
This command takes optional arguments, such as how to present the results, followed by the names of the JSON files that contain the raw benchmark results.
Importantly, it uses statistical significance to report whether the difference between results is more likely meaningful or a statistical fluke.
pyperf determines whether two samples differ significantly using a Student’s two-sample, two-tailed t-test with alpha equals to 0.95.
— pyperf commands
For example:
python -m pyperf compare_to --table file1.json file2.json file3.json
Now that we know how to use the pyperf library for benchmarking, let’s look at a worked example.
Example of Benchmarking with pyperf
We can use the pyperf command line interface to benchmark and compare different snippets for creating a list of squared integers.
The three methods we will compare are:
- Use the math.pow() function.
- Use the ** power operator.
- Use the * multiplication operator.
We will calculate and review the benchmark results for each method, then finally compare all methods directly.
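As an aside, all three benchmarks could also be expressed in a single script using the Python API rather than three separate commands; a sketch of that approach (the file and benchmark names are my own) is shown below, although the rest of this example uses the command line interface.
# squares_bench.py
# sketch: benchmark three ways to square integers using the pyperf Python API
import pyperf

runner = pyperf.Runner()
runner.timeit(name='math.pow', setup='import math',
              stmt='[math.pow(i,2) for i in range(1000000)]')
runner.timeit(name='i**2', stmt='[i**2 for i in range(1000000)]')
runner.timeit(name='i*i', stmt='[i*i for i in range(1000000)]')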
Benchmark List of Squared Integers With math.pow()
Firstly, we can benchmark the creation of a list of one million squared integers using the math.pow() function.
...
[math.pow(i,2) for i in range(1000000)]
This requires importing the math module via the -s setup flag.
We will save the results into the “mathpow.json” file.
For example:
python -m pyperf timeit -s 'import math' '[math.pow(i,2) for i in range(1000000)]' -o mathpow.json
Running the example benchmarks this snippet using the default number of loops and repetitions.
In this case, the mean execution time of the snippet is 113 milliseconds with a standard deviation of about 2 milliseconds.
Your results may vary.
.....................
Mean +- std dev: 113 ms +- 2 ms
We can review the distribution of results using the “pyperf stats” command.
For example:
python -m pyperf stats mathpow.json
We can see the details of the distribution of benchmark results, noting that we gathered 60 samples in total: 3 values from each of 20 runs.
Total duration: 10.0 sec
Start date: 2023-10-06 08:13:19
End date: 2023-10-06 08:13:31
Raw value minimum: 110 ms
Raw value maximum: 120 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 1
Total number of values: 60

Minimum: 110 ms
Median +- MAD: 113 ms +- 1 ms
Mean +- std dev: 113 ms +- 2 ms
Maximum: 120 ms

0th percentile: 110 ms (-3% of the mean) -- minimum
5th percentile: 111 ms (-2% of the mean)
25th percentile: 112 ms (-1% of the mean) -- Q1
50th percentile: 113 ms (-0% of the mean) -- median
75th percentile: 114 ms (+1% of the mean) -- Q3
95th percentile: 118 ms (+4% of the mean)
100th percentile: 120 ms (+6% of the mean) -- maximum

Number of outlier (out of 109 ms..118 ms): 4
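If you would also like a visual summary of the same results, pyperf can render a text histogram of the collected values from the same file (output not shown here):
python -m pyperf hist mathpow.json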
Benchmark List of Squared Integers With i**2
We can perform a similar benchmark, in this case using the ** power operator.
...
[i**2 for i in range(1000000)]
Results will be saved to the powerop.json file.
The updated “pyperf timeit” command is listed below.
python -m pyperf timeit '[i**2 for i in range(1000000)]' -o powerop.json
Running the command, we can see that the average execution time for the snippet was about 74.2 milliseconds with a standard deviation of about 1.5 milliseconds.
.....................
Mean +- std dev: 74.2 ms +- 1.5 ms
Next, we can review the statistics of the collected benchmark results using the “pyperf stats” command:
python -m pyperf stats powerop.json
Again, 60 samples were collected, and we can review the distribution of results.
Total duration: 12.9 sec
Start date: 2023-10-06 08:13:44
End date: 2023-10-06 08:13:59
Raw value minimum: 143 ms
Raw value maximum: 157 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 2
Total number of values: 60

Minimum: 71.4 ms
Median +- MAD: 74.5 ms +- 0.9 ms
Mean +- std dev: 74.2 ms +- 1.5 ms
Maximum: 78.3 ms

0th percentile: 71.4 ms (-4% of the mean) -- minimum
5th percentile: 72.0 ms (-3% of the mean)
25th percentile: 73.0 ms (-2% of the mean) -- Q1
50th percentile: 74.5 ms (+0% of the mean) -- median
75th percentile: 75.1 ms (+1% of the mean) -- Q3
95th percentile: 76.5 ms (+3% of the mean)
100th percentile: 78.3 ms (+5% of the mean) -- maximum

Number of outlier (out of 69.8 ms..78.3 ms): 0
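Note that pyperf calibrated 2 loop iterations per value for this benchmark, so each reported value is the raw timing divided by 2; for example, the raw minimum of 143 ms corresponds to the reported minimum of about 71.4 ms (the small difference is due to rounding in the displayed values).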
Benchmark List of Squared Integers With i*i
Finally, we can perform the third benchmark, in this case with the code updated to use the * multiplication operator.
...
[i*i for i in range(1000000)]
The results are saved to the multiplyop.json file.
The updated “pyperf timeit” command is listed below:
python -m pyperf timeit '[i*i for i in range(1000000)]' -o multiplyop.json
Running the command, we can see that the average duration to execute the snippet was about 61.8 milliseconds with a standard deviation of about 1.4 milliseconds.
.....................
Mean +- std dev: 61.8 ms +- 1.4 ms
Next, we can review the statistics of the collected benchmark results using the “pyperf stats” command:
python -m pyperf stats multiplyop.json
We can see that 60 samples were collected, and we can review the distribution and central tendency of the collected measurement results.
Total duration: 10.9 sec
Start date: 2023-10-06 08:14:00
End date: 2023-10-06 08:14:13
Raw value minimum: 119 ms
Raw value maximum: 131 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 2
Total number of values: 60

Minimum: 59.4 ms
Median +- MAD: 61.7 ms +- 1.1 ms
Mean +- std dev: 61.8 ms +- 1.4 ms
Maximum: 65.6 ms

0th percentile: 59.4 ms (-4% of the mean) -- minimum
5th percentile: 59.8 ms (-3% of the mean)
25th percentile: 60.7 ms (-2% of the mean) -- Q1
50th percentile: 61.7 ms (-0% of the mean) -- median
75th percentile: 62.8 ms (+2% of the mean) -- Q3
95th percentile: 64.1 ms (+4% of the mean)
100th percentile: 65.6 ms (+6% of the mean) -- maximum

Number of outlier (out of 57.6 ms..65.9 ms): 0
Comparison of Results
We have now collected benchmark results from 3 approaches that achieve the same result using different techniques.
We can manually gather the results and report them in a table, for example:
Method   | Mean Time (ms) | Std Time (ms)
---------|----------------|--------------
math.pow | 113.0          | 2.0
i**2     | 74.2           | 1.5
i*i      | 61.8           | 1.4
We can also compare the results automatically using the “pyperf compare_to” command.
We will configure the command to show the results in a table via the --table flag and then specify the three result files.
For example:
python -m pyperf compare_to --table mathpow.json powerop.json multiplyop.json
We can see that all 3 results are reported in a table, with one result per column.
We can see that the second and third results are compared to the first, showing the speedup factor.
In this case, the multiplication operator was the fastest, with a mean time about 1.83x faster than the math.pow() approach.
+-----------+---------+-----------------------+-----------------------+
| Benchmark | mathpow | powerop               | multiplyop            |
+===========+=========+=======================+=======================+
| timeit    | 113 ms  | 74.2 ms: 1.53x faster | 61.8 ms: 1.83x faster |
+-----------+---------+-----------------------+-----------------------+
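The reported speedups are simply ratios of the mean times: 113 / 74.2 ≈ 1.52 and 113 / 61.8 ≈ 1.83; the slight difference from the displayed 1.53x is due to rounding of the means shown in the table.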
Free Python Benchmarking Course
Get FREE access to my 7-day email course on Python Benchmarking.
Discover benchmarking with the time.perf_counter() function, how to develop a benchmarking helper function and context manager and how to use the timeit API and command line.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to benchmark Python code using the pyperf open-source library.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any additional tips?
I’d love to hear about them!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Nicolas Hoizey on Unsplash