Last Updated on October 5, 2023
You can benchmark Python code using the Python standard library.
Code can be benchmarked manually using the time module. The timeit module provides functions for automatically benchmarking code. The cProfile and profile modules can be used for benchmarking, although they are not well suited to it.
On a Linux or macOS workstation, we can use the time command to benchmark an entire Python script, without making any changes to it.
In this tutorial, you will discover how to benchmark Python code using the standard library.
Let’s get started.
Need to Benchmark Python Code
Benchmarking Python code refers to comparing the performance of one program to variations of the program.
Benchmarking is the practice of comparing business processes and performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time and cost.
— Benchmarking, Wikipedia.
Typically, we make changes to the programs, such as adding concurrency, in order to improve the performance of the program on a given system.
Improving performance typically means reducing the run time of the program.
Therefore, when we benchmark programs in Python after adding concurrency, we typically are interested in recording how long a program takes to run.
It is critical to be systematic when benchmarking code.
The first step is to record how long an unmodified version of the program takes to run. This provides a baseline in performance to which all other versions of the program must be compared. If we are adding concurrency, then the unmodified version of the program will typically perform tasks sequentially, e.g. one-by-one.
We can then make modifications to the program, such as adding thread pools, process pools, or asyncio. The goal is to perform tasks concurrently (out of order), even in parallel (simultaneously). The performance of the program can be benchmarked and compared to the performance of the unmodified version.
Modified versions of the program must perform better than the unmodified version. If they do not, they are not improvements and should not be adopted.
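For example, a common way to compare versions is to compute the speedup: the baseline duration divided by the duration of the modified version. The snippet below is a minimal sketch; the durations are placeholder values for illustration only.

...
# hypothetical durations recorded for each version of the program
baseline_duration = 12.5   # seconds, unmodified (sequential) version
modified_duration = 4.2    # seconds, modified (concurrent) version
# a speedup greater than 1.0 means the modified version is an improvement
speedup = baseline_duration / modified_duration
print(f'Speedup: {speedup:.2f}x')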
How can we benchmark the performance of programs in Python?
How to Benchmark Python Code
The Python standard library provides the facility to benchmark code.
No third-party libraries are needed.
There are perhaps 3 main ways to benchmark Python code using the standard library; they are:
- Benchmark using the “time” module.
- Benchmark using the “timeit” module.
- Benchmark using the “cProfile” and “profile” modules.
Additionally, most POSIX workstations (e.g. Linux, macOS) have access to the “time” command, which can also be used to benchmark a Python program:
- Benchmark using the “time” command.
Note, there are many additional ways to benchmark Python code using third-party libraries. In this case, we are focused on how to benchmark code using the Python standard library running on a POSIX-based workstation.
Let’s take a closer look at each of these methods in turn.
How to Benchmark Using the “time” Module
We can benchmark Python code using the time module.
The time.perf_counter() function will return a value from a high-performance counter.
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration.
— time — Time access and conversions
The difference between the two calls to the time.perf_counter() function can provide a high-precision estimate of the execution time of a block of code.
Unlike the time.time() function, the time.perf_counter() function is not subject to clock updates, such as daylight saving adjustments or synchronizing the system clock with a time server. This makes the time.perf_counter() function a reliable approach to benchmarking Python code.
We can call the time.perf_counter() function at the beginning of the code we wish to benchmark, and again at the end of the code we wish to benchmark.
For example:
...
# record start time
time_start = time.perf_counter()
# call benchmark code
task()
# record end time
time_end = time.perf_counter()
The difference between the start and end time is the total duration of the program in seconds.
For example:
...
# calculate the duration
time_duration = time_end - time_start
# report the duration
print(f'Took {time_duration:.3f} seconds')
You can learn more about benchmarking Python code with the time.perf_counter() function in a separate tutorial.
It is typically a good idea to repeat a benchmark multiple times, e.g. 3, and report the average time taken.
The reason is that there may be other programs running on the system, or the Python interpreter may take slightly longer when running code for the first time (because modules may need to be found and loaded).
We can repeat a benchmarking process in a loop, sum the durations and divide the sum by the number of iterations in order to give the average duration.
For example:
...
# run 3 times and record the durations
times = list()
for _ in range(3):
    # record start time
    time_start = perf_counter()
    # call benchmark code
    task()
    # calculate the duration
    time_duration = perf_counter() - time_start
    # report the duration
    print(f'>took {time_duration:.3f} seconds')
    # store the duration
    times.append(time_duration)
# report the average duration
time_average = sum(times) / 3.0
print(f'Average time {time_average:.3f} seconds')
This is perhaps the most common classical way to benchmark code in Python because it is simple and easy to understand.
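If you find yourself repeating this pattern, it can help to wrap it in a small helper. The example below is an illustrative sketch, not part of the worked examples later in this tutorial: a context manager built on time.perf_counter() that reports the duration of the block it wraps.

# sketch of a reusable timing helper based on time.perf_counter()
from time import perf_counter
from contextlib import contextmanager

# context manager that reports how long the wrapped block took
@contextmanager
def benchmark(name='block'):
    # record start time
    time_start = perf_counter()
    try:
        yield
    finally:
        # calculate and report the duration
        time_duration = perf_counter() - time_start
        print(f'{name} took {time_duration:.3f} seconds')

# example usage with an arbitrary workload
if __name__ == '__main__':
    with benchmark('squares'):
        data = [i*i for i in range(100000000)]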
How to Benchmark Using the “timeit” Module
A modern approach to benchmarking code in Python is via the timeit module.
It provides both functions to call in order to benchmark code, as well as a command line interface.
This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times.
— timeit — Measure execution time of small code snippets
The API does take a little reading to understand, especially for some use cases, which can put off newer Python developers.
Nevertheless, we can perform a simple benchmark of a custom function by calling the timeit.timeit() function and specifying the code or function to benchmark and the setup code required.
For example, if we have a custom function defined in the current file, we must import this function as setup code for the timeit.timeit() function to be able to “see” it.
The function then returns the duration of the code in seconds, which can be reported.
...
# benchmark the task
result = timeit('task()', setup='from __main__ import task', number=1)
# report the result
print(f'Took {result:.3f} seconds')
The timeit.timeit() function takes a “number” argument that specifies the number of times to call the custom code.
We can use this to repeat the benchmark and calculate an average.
For example, we can specify “number” to be 3 to run the code 3 times, which will then return the overall duration of running the code 3 times. This return value can then be divided by the number of iterations, e.g. 3 to give the average expected run time.
For example:
...
# benchmark the task
result = timeit('task()', setup='from __main__ import task', number=3)
# calculate the average
average_result = result / 3
# report the result
print(f'Average time: {average_result:.3f} seconds')
This is the preferred approach to benchmark code in modern Python programs.
The reason is that the module attempts to make the benchmarking consistent by disabling language features, like garbage collection.
By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured.
— timeit — Measure execution time of small code snippets
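The command-line interface can also be handy when you do not want to write any timing code at all. As a rough sketch (the module name mymodule is a placeholder for a file that defines task()), a single function could be benchmarked like this:

python -m timeit -n 3 -r 1 -s 'from mymodule import task' 'task()'

Here -n sets the number of executions per timing run, -r the number of timing runs, and -s the setup code, mirroring the number and setup arguments used above.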
How to Benchmark Using the “cProfile” and “profile” Modules
Another way to benchmark code in Python is to use a profiler, specifically the cProfile or profile modules.
cProfile and profile provide deterministic profiling of Python programs. A profile is a set of statistics that describes how often and for how long various parts of the program executed. These statistics can be formatted into reports via the pstats module.
— The Python Profilers
Generally, profilers are not intended for benchmarking code. Instead, they “profile” code, which means they report detailed statistics on the performance of each line or each instruction in the code.
The addition of the instrumentation of the code means that the overall runtime of the code is slower, skewing the benchmark results.
The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is timeit for reasonably accurate results).
— The Python Profilers
Nevertheless, the skew to longer runtimes can be consistent and the results can be used to benchmark and provide insight into which parts of the code are slow.
Both the cProfile and profile modules offer the same API. The cProfile module is preferred but may not be available on all systems, in which case you can fall back to the profile module.
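Because the two modules share the same API, one common pattern (shown here as a sketch, not something used in the examples below) is to try cProfile first and fall back to profile if it is unavailable:

...
# prefer the faster C implementation of the profiler
try:
    import cProfile as profiler
except ImportError:
    # fall back to the pure-Python implementation with the same API
    import profile as profiler
# the rest of the program can use the shared API via the alias
# e.g. profiler.run('task()')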
We can benchmark using the profiler by calling the cProfile.run() function and specifying the code to run and evaluate.
This will then report profile information to standard output (stdout). Included in the output will be the overall duration which can be used for comparison.
For example:
...
# run the task
cProfile.run('task()')
We can perform the same benchmark via the profile.run() function.
For example:
...
# run the task
profile.run('task()')
How to Benchmark Using the “time” Command
Finally, if we are on a POSIX workstation, such as Linux or macOS, we can benchmark a Python program using the time command.
In computing, time is a command in Unix and Unix-like operating systems. It is used to determine the duration of execution of a particular command.
— time (Unix), Wikipedia
The “time” command runs a program, such as a Python script, and reports the duration in terms of how long the program took to execute.
This approach is recommended if you want to benchmark the entire program without modifying it in any way.
It reports 3 benchmark results: real time, user time, and system time.
Real time is the wall clock time, that is, the total duration of the program. The CPU time is calculated as the sum of the user time and system time, which report how long the program spent executing in user mode and in the kernel, respectively. If the program spends time blocked, for example waiting on other programs running at the same time, the real time may be longer than the CPU time; if the program runs on multiple CPU cores in parallel, the CPU time may be longer than the real time.
For example:
time python program.py
I generally don’t recommend this approach unless it is a last resort.
Now that we know how to benchmark code in Python, let’s look at some worked examples, starting with the time module.
Example of Benchmarking With time.perf_counter()
In this section, we will explore examples of benchmarking Python code using the time.perf_counter() function.
We will first benchmark the duration of a function once, then develop a version that benchmarks the function many times and reports the average.
Single Benchmark Result
We can develop an example that calculates a single benchmark result of a function using time.perf_counter().
In this example we will first record the start time, then call the target function. We will then record the end time, calculate the duration and report the result.
In this case, we will use a simple task that creates a list of the squares of the integers from 0 up to 100 million (i.e. 100,000,000). This task should take about 6 seconds on modern hardware. The task is arbitrary; replace it with your own code to benchmark.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of benchmarking a python function using time.perf_counter()
from time import perf_counter

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # record start time
    time_start = perf_counter()
    # run the task
    task()
    # calculate the duration
    time_duration = perf_counter() - time_start
    # report the duration
    print(f'Took {time_duration:.3f} seconds')
Running the example first records the start time and then calls the target function.
The end time is recorded and the duration is calculated and reported.
In this case, we can see that the target function takes about 6.251 seconds.
This highlights how we can perform a single benchmark of Python code using the time.perf_counter() function.
The downside of this approach is that the benchmark result will vary each time the code is run.
Took 6.251 seconds
Average Benchmark Result
We can benchmark target Python code multiple times and report the average duration.
This accounts for the natural variation in the execution of code on a system, due to Python start-up time and other programs running in the background at the same time.
This can be achieved by looping the above code multiple times, in this case, 3, recording each duration, then calculating and reporting the average duration.
We can also report each individual duration along the way so that we get some indication of progress.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of repeated benchmarking of a python function using time.perf_counter()
from time import perf_counter

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run 3 times and record the durations
    times = list()
    for _ in range(3):
        # record start time
        time_start = perf_counter()
        # run the task
        task()
        # calculate the duration
        time_duration = perf_counter() - time_start
        # report the duration
        print(f'>took {time_duration:.3f} seconds')
        # store the duration
        times.append(time_duration)
    # report the average duration
    time_average = sum(times) / 3.0
    print(f'Average time {time_average:.3f} seconds')
Running the example loops three times, benchmarking the target function each time.
The intermediate benchmark results are reported, showing the natural variability of executing the same code on the same system.
Finally, the average duration is calculated and reported.
In this case, the target function is benchmarked at about 6.158 seconds.
More repetitions of the loop will improve this estimate, although it will quickly reach diminishing returns.
This highlights how we can calculate an average duration benchmark of a target function using the time.perf_counter() function.
>took 6.332 seconds
>took 6.054 seconds
>took 6.089 seconds
Average time 6.158 seconds
Next, let’s explore how we might benchmark a target function using the timeit module.
Example of Benchmarking With timeit
In this section, we will explore examples of benchmarking Python code using the timeit.timeit() function.
We will first benchmark the duration of a function once, then develop a version that benchmarks the function many times and reports the average.
Single Benchmark Result
We can develop an example that calculates a single benchmark result of a function using timeit.timeit().
In this example, we will call the timeit() function and specify the target function to execute, in this case task(). We will then specify an import statement for this task via the “setup” argument and set the number of iterations via the “number” argument to 1.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of benchmarking a python function using timeit.timeit()
from timeit import timeit

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # benchmark the task
    result = timeit('task()', setup='from __main__ import task', number=1)
    # report the result
    print(f'Took {result:.3f} seconds')
Running the example calls the timeit() function for the target function.
The duration is returned and reported.
In this case, the benchmark is reported as about 6.276 seconds.
This highlights how we can benchmark a target function using the timeit.timeit() function.
The downside of this approach is that the benchmark result will vary each time the code is run.
Took 6.276 seconds
Average Benchmark Result
We can update the example to report the average duration of the target function.
This is as easy as changing the “number” argument to the number of times to call the function, e.g. 3.
The return value is the overall duration, which can be divided by the number of iterations to give the average duration.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of repeated benchmark of a python function using timeit.timeit()
from timeit import timeit

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # benchmark the task
    result = timeit('task()', setup='from __main__ import task', number=3)
    # calculate the average
    average_result = result / 3
    # report the result
    print(f'Average time: {average_result:.3f} seconds')
Running the example calls the timeit() function for the target function.
The duration is returned and divided by the number of iterations to give the average.
The average duration for the target function is then reported.
In this case, the target function takes about 6.099 seconds on average.
This highlights how we can benchmark and report the average duration for a target function using the timeit.timeit() function.
Average time: 6.099 seconds
Next, let’s look at how we might benchmark Python code using the Python profiler.
Example of Benchmarking With cProfile and profile
In this section, we will explore examples of benchmarking Python code using the Python profiler.
Profiling is a different activity from benchmarking; it is more of an investigation.
Nevertheless, in a rough-and-ready way, we can use profiler information for benchmarking.
We will explore how to profile and benchmark a target function using the cProfile and the profile modules.
Benchmark with cProfile
We can profile a target function using the cProfile.run() function.
The function takes a string of the code to profile, such as a call to the target function, and will report profile information to standard output.
The example below profiles our task() function developed above.
Note, the cProfile module is not available on all platforms. If not supported on your platform, you may get an error.
# SuperFastPython.com
# example of benchmarking a function using cProfile
import cProfile

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run the task
    cProfile.run('task()')
Running the example profiles the task() function using the cProfile module.
In this case, we can see that the overall duration was about 9.163 seconds.
This is much slower than the approximate 6 seconds measured in previous benchmarking approaches. However, it is probably somewhat consistent across different variations of the same program.
More or fewer function calls within variations of the function being benchmarked may influence the overall duration, given the instrumentation required.
Far more useful are the insights that the profiler can give as to the slow parts of a target function.
This highlights how we might benchmark code using the cProfile module.
5 function calls in 9.163 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    1.165    1.165    9.163    9.163  <string>:1(<module>)
     1    0.000    0.000    7.998    7.998  benchmark3.py:6(task)
     1    7.998    7.998    7.998    7.998  benchmark3.py:8(<listcomp>)
     1    0.000    0.000    9.163    9.163  {built-in method builtins.exec}
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
Benchmark with profile
We can profile a target function using the profile.run() function.
Identical to the cProfile.run() function, the profile.run() function takes a string of the code to profile and will report profile information to standard output.
The complete example profiling our task() function is listed below.
# SuperFastPython.com
# example of benchmarking a function using profile
import profile

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run the task
    profile.run('task()')
Running the example profiles the task() function using the profile module.
In this case, we can see that the overall duration was about 9.335 seconds.
Again, this is much slower than the approximate 6 seconds measured in previous benchmarking approaches. However, it is probably somewhat consistent across different variations of the same program.
This highlights how we might benchmark code using the profile module.
6 function calls in 9.335 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000    9.322    9.322  :0(exec)
     1    0.013    0.013    0.013    0.013  :0(setprofile)
     1    1.171    1.171    9.322    9.322  <string>:1(<module>)
     1    0.000    0.000    8.151    8.151  benchmark3.1.py:6(task)
     1    8.151    8.151    8.151    8.151  benchmark3.1.py:8(<listcomp>)
     0    0.000             0.000           profile:0(profiler)
     1    0.000    0.000    9.335    9.335  profile:0(task())
Next, let’s look at how we might benchmark Python code using the time command.
Example of Benchmarking With time Command
We can explore how to benchmark a Python script using the time Unix command.
In this case, we can develop our program normally, and not add any benchmarking code to it at all.
The complete example is listed below.
# SuperFastPython.com
# example of task that can be benchmarked using the unix time command

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run the task
    task()
We can then save our script to a file, such as “benchmark_time.py” and execute it from the command line using the Python interpreter.
This can be prefixed with the time command, which will run the script and report the duration.
time python benchmark_time.py
Running the example reports the duration both in terms of real-time (wall clock time) and CPU time (user + sys).
I recommend focusing on real-time.
In this case, we can see that the script took about 6.335 seconds to complete.
If we wanted, we could develop a shell script to repeat the benchmark and report the average result. This is out of the scope of this tutorial (for now).
This highlights how we can benchmark a Python script using the time Unix command.
real    0m6.335s
user    0m5.169s
sys     0m1.133s
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to benchmark Python code using the standard library.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.