Last Updated on October 5, 2023
You can benchmark Python code using the Python standard library.
Code can be benchmarked manually using the time module. The timeit module provides functions for automatically benchmarking code. The cProfile and profile modules can be used for benchmarking, although they are not well suited to it.
On a Linux or macOS workstation, we can use the time command to benchmark an entire Python script, without making any changes to it.
In this tutorial, you will discover how to benchmark Python code using the standard library.
Let’s get started.
Need to Benchmark Python Code
Benchmarking Python code refers to comparing the performance of one program to variations of the program.
Benchmarking is the practice of comparing business processes and performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time and cost.
— Benchmarking, Wikipedia.
Typically, we make changes to the programs, such as adding concurrency, in order to improve the performance of the program on a given system.
Improving performance typically means reducing the run time of the program.
Therefore, when we benchmark programs in Python after adding concurrency, we typically are interested in recording how long a program takes to run.
It is critical to be systematic when benchmarking code.
The first step is to record how long an unmodified version of the program takes to run. This provides a baseline in performance to which all other versions of the program must be compared. If we are adding concurrency, then the unmodified version of the program will typically perform tasks sequentially, e.g. one-by-one.
We can then make modifications to the program, such as adding thread pools, process pools, or asyncio. The goal is to perform tasks concurrently (out of order), even in parallel (simultaneously). The performance of the program can be benchmarked and compared to the performance of the unmodified version.
Modified versions of the program must perform better than the unmodified version. If they do not, they are not improvements and should not be adopted.
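For example, a common way to compare versions is to compute the speedup: the baseline duration divided by the duration of the modified version. The snippet below is a minimal sketch; the durations are placeholder values for illustration only.

...
# hypothetical durations recorded for each version of the program
baseline_duration = 12.5   # seconds, unmodified (sequential) version
modified_duration = 4.2    # seconds, modified (concurrent) version
# a speedup greater than 1.0 means the modified version is an improvement
speedup = baseline_duration / modified_duration
print(f'Speedup: {speedup:.2f}x')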
How can we benchmark the performance of programs in Python?
How to Benchmark Python Code
The Python standard library provides the facility to benchmark code.
No third-party libraries are needed.
There are perhaps 3 main ways to benchmark Python code using the standard library; they are:
- Benchmark using the “time” module.
- Benchmark using the “timeit” module.
- Benchmark using the “cProfile” and “profile” modules.
Additionally, most POSIX workstations (e.g. Linux, macOS) have access to the “time” command, which can also be used to benchmark a Python program:
- Benchmark using the “time” command.
Note, there are many additional ways to benchmark Python code using third-party libraries. In this case, we are focused on how to benchmark code using the Python standard library running on a POSIX-based workstation.
Let’s take a closer look at each of these methods in turn.
How to Benchmark Using the “time” Module
We can benchmark Python code using the time module.
The time.perf_counter() function will return a value from a high-performance counter.
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration.
— time — Time access and conversions
The difference between the two calls to the time.perf_counter() function can provide a high-precision estimate of the execution time of a block of code.
Unlike the time.time() function, the time.perf_counter() function is not subject to clock updates, such as daylight saving adjustments or synchronizing the system clock with a time server. This makes the time.perf_counter() function a reliable approach to benchmarking Python code.
We can call the time.perf_counter() function at the beginning of the code we wish to benchmark, and again at the end of the code we wish to benchmark.
For example:
...
# record start time
time_start = time.perf_counter()
# call benchmark code
task()
# record end time
time_end = time.perf_counter()
The difference between the start and end time is the total duration of the program in seconds.
For example:
...
# calculate the duration
time_duration = time_end - time_start
# report the duration
print(f'Took {time_duration:.3f} seconds')
You can learn more about benchmarking Python code with the time.perf_counter() function in a separate tutorial.
It is typically a good idea to repeat a benchmark multiple times, e.g. 3, and report the average time taken.
The reason is that there may be other programs running on the system, or the Python interpreter may take slightly longer when running code for the first time (because modules may need to be found and loaded).
We can repeat a benchmarking process in a loop, sum the durations and divide the sum by the number of iterations in order to give the average duration.
For example:
...
# run 3 times and record the durations
times = list()
for _ in range(3):
    # record start time
    time_start = perf_counter()
    # call benchmark code
    task()
    # calculate the duration
    time_duration = perf_counter() - time_start
    # report the duration
    print(f'>took {time_duration:.3f} seconds')
    # store the duration
    times.append(time_duration)
# report the average duration
time_average = sum(times) / 3.0
print(f'Average time {time_average:.3f} seconds')
This is perhaps the most common classical way to benchmark code in Python because it is simple and easy to understand.
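If you find yourself repeating this pattern, it can help to wrap it in a small helper. The example below is an illustrative sketch, not part of the worked examples later in this tutorial: a context manager built on time.perf_counter() that reports the duration of the block it wraps.

# sketch of a reusable timing helper based on time.perf_counter()
from time import perf_counter
from contextlib import contextmanager

# context manager that reports how long the wrapped block took
@contextmanager
def benchmark(name='block'):
    # record start time
    time_start = perf_counter()
    try:
        yield
    finally:
        # calculate and report the duration
        time_duration = perf_counter() - time_start
        print(f'{name} took {time_duration:.3f} seconds')

# example usage with an arbitrary workload
if __name__ == '__main__':
    with benchmark('squares'):
        data = [i*i for i in range(100000000)]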
How to Benchmark Using the “timeit” Module
A modern approach to benchmarking code in Python is via the timeit module.
It provides both functions to call in order to benchmark code, as well as a command line interface.
This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times.
— timeit — Measure execution time of small code snippets
The API does take a little reading to understand, especially for some use cases, which can put off newer Python developers.
Nevertheless, we can perform a simple benchmark of a custom function by calling the timeit.timeit() function and specifying the code or function to benchmark and the setup code required.
For example, if we have a custom function defined in the current file, we must import this function as setup code for the timeit.timeit() function to be able to “see” it.
The function then returns the duration of the code in seconds, which can be reported.
...
# benchmark the task
result = timeit('task()', setup='from __main__ import task', number=1)
# report the result
print(f'Took {result:.3f} seconds')
The timeit.timeit() function takes a “number” argument that specifies the number of times to call the custom code.
We can use this to repeat the benchmark and calculate an average.
For example, we can specify “number” to be 3 to run the code 3 times, which will then return the overall duration of running the code 3 times. This return value can then be divided by the number of iterations, e.g. 3 to give the average expected run time.
For example:
...
# benchmark the task
result = timeit('task()', setup='from __main__ import task', number=3)
# calculate the average
average_result = result / 3
# report the result
print(f'Average time: {average_result:.3f} seconds')
This is the preferred approach to benchmark code in modern Python programs.
The reason is that the module attempts to make the benchmarking consistent by disabling language features, like garbage collection.
By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured.
— timeit — Measure execution time of small code snippets
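The command-line interface can also be handy when you do not want to write any timing code at all. As a rough sketch (the module name mymodule is a placeholder for a file that defines task()), a single function could be benchmarked like this:

python -m timeit -n 3 -r 1 -s 'from mymodule import task' 'task()'

Here -n sets the number of executions per timing run, -r the number of timing runs, and -s the setup code, mirroring the number and setup arguments used above.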
How to Benchmark Using the “cProfile” and “profile” Modules
Another way to benchmark code in Python is to use a profiler, specifically the cProfile or profile modules.
cProfile and profile provide deterministic profiling of Python programs. A profile is a set of statistics that describes how often and for how long various parts of the program executed. These statistics can be formatted into reports via the pstats module.
— The Python Profilers
Generally, profilers are not intended for benchmarking code. Instead, they “profile” code, which means they report detailed statistics on the performance of each line or each instruction in the code.
The addition of the instrumentation of the code means that the overall runtime of the code is slower, skewing the benchmark results.
The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is timeit for reasonably accurate results).
— The Python Profilers
Nevertheless, the skew to longer runtimes can be consistent and the results can be used to benchmark and provide insight into which parts of the code are slow.
Both the cProfile and profile modules offer the same API. The cProfile module is preferred but may not be available on all systems, in which case you can fall back to the profile module.
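Because the two modules share the same API, one common pattern (shown here as a sketch, not something used in the examples below) is to try cProfile first and fall back to profile if it is unavailable:

...
# prefer the faster C implementation of the profiler
try:
    import cProfile as profiler
except ImportError:
    # fall back to the pure-Python implementation with the same API
    import profile as profiler
# the rest of the program can use the shared API via the alias
# e.g. profiler.run('task()')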
We can benchmark using the profiler by calling the cProfile.run() function and specifying the code to run and evaluate.
This will then report profile information to standard output (stdout). Included in the output will be the overall duration which can be used for comparison.
For example:
...
# run the task
cProfile.run('task()')
We can perform the same benchmark via the profile.run() function.
For example:
...
# run the task
profile.run('task()')
How to Benchmark Using the “time” Command
Finally, if we are on a POSIX workstation, such as Linux or macOS, we can benchmark a Python program using the time command.
In computing, time is a command in Unix and Unix-like operating systems. It is used to determine the duration of execution of a particular command.
— time (Unix), Wikipedia
The “time” command runs a program, such as a Python script, and reports the duration in terms of how long the program took to execute.
This approach is recommended if you want to benchmark the entire program without modifying it in any way.
It reports 3 benchmark results: real time, user time, and system time.
Real time is the wall clock time, that is, the total duration of the program. The CPU time is calculated as the sum of the user time and system time, which report how long the program spent executing in user mode and in the kernel, respectively. If the program spends time blocked, for example waiting on other programs running at the same time, the real time may be longer than the CPU time; if the program runs on multiple CPU cores in parallel, the CPU time may be longer than the real time.
For example:
time python program.py
I generally don’t recommend this approach unless it is a last resort.
Now that we know how to benchmark code in Python, let’s look at some worked examples, starting with the time module.
Example of Benchmarking With time.perf_counter()
In this section, we will explore examples of benchmarking Python code using the time.perf_counter() function.
We will first benchmark the duration of a function once, then develop a version that benchmarks the function many times and reports the average.
Single Benchmark Result
We can develop an example that calculates a single benchmark result of a function using time.perf_counter().
In this example we will first record the start time, then call the target function. We will then record the end time, calculate the duration and report the result.
In this case, we will use a simple task that creates a list of the squares of the integers from 0 up to 100 million (i.e. 100,000,000). This task should take about 6 seconds on modern hardware. The task is arbitrary; replace it with your own code to benchmark.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of benchmarking a python function using time.perf_counter()
from time import perf_counter

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # record start time
    time_start = perf_counter()
    # run the task
    task()
    # calculate the duration
    time_duration = perf_counter() - time_start
    # report the duration
    print(f'Took {time_duration:.3f} seconds')
Running the example first records the start time and then calls the target function.
The end time is recorded and the duration is calculated and reported.
In this case, we can see that the target function takes about 6.251 seconds.
This highlights how we can perform a single benchmark of Python code using the time.perf_counter() function.
The downside of this approach is that the benchmark result will vary each time the code is run.
Took 6.251 seconds
Average Benchmark Result
We can benchmark target Python code multiple times and report the average duration.
This accounts for the natural variation in the execution of code on a system, due to Python start-up time and other programs running in the background at the same time.
This can be achieved by looping the above code multiple times, in this case, 3, recording each duration, then calculating and reporting the average duration.
We can also report each individual duration along the way so that we get some indication of progress.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of repeated benchmarking of a python function using time.perf_counter()
from time import perf_counter

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run 3 times and record the durations
    times = list()
    for _ in range(3):
        # record start time
        time_start = perf_counter()
        # run the task
        task()
        # calculate the duration
        time_duration = perf_counter() - time_start
        # report the duration
        print(f'>took {time_duration:.3f} seconds')
        # store the duration
        times.append(time_duration)
    # report the average duration
    time_average = sum(times) / 3.0
    print(f'Average time {time_average:.3f} seconds')
Running the example loops three times, benchmarking the target function each time.
The intermediate benchmark results are reported, showing the natural variability of executing the same code on the same system.
Finally, the average duration is calculated and reported.
In this case, the target function is benchmarked at about 6.158 seconds.
More repetitions of the loop will improve this estimate, although it will quickly reach diminishing returns.
This highlights how we can calculate an average duration benchmark of a target function using the time.perf_counter() function.
>took 6.332 seconds
>took 6.054 seconds
>took 6.089 seconds
Average time 6.158 seconds
Next, let’s explore how we might benchmark a target function using the timeit module.
Example of Benchmarking With timeit
In this section, we will explore examples of benchmarking Python code using the timeit.timeit() function.
We will first benchmark the duration of a function once, then develop a version that benchmarks the function many times and reports the average.
Single Benchmark Result
We can develop an example that calculates a single benchmark result of a function using timeit.timeit().
In this example, we will call the timeit() function and specify the target function to execute, in this case task(). We will then specify an import statement for this task via the “setup” argument and set the number of iterations via the “number” argument to 1.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of benchmarking a python function using timeit.timeit()
from timeit import timeit

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # benchmark the task
    result = timeit('task()', setup='from __main__ import task', number=1)
    # report the result
    print(f'Took {result:.3f} seconds')
Running the example calls the timeit() function for the target function.
The duration is returned and reported.
In this case, the benchmark is reported as about 6.276 seconds.
This highlights how we can benchmark a target function using the timeit.timeit() function.
The downside of this approach is that the benchmark result will vary each time the code is run.
Took 6.276 seconds
Average Benchmark Result
We can update the example to report the average duration of the target function.
This is as easy as changing the “number” argument to the number of times to call the function, e.g. 3.
The return value is the overall duration, which can be divided by the number of iterations to give the average duration.
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of repeated benchmark of a python function using timeit.timeit()
from timeit import timeit

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # benchmark the task
    result = timeit('task()', setup='from __main__ import task', number=3)
    # calculate the average
    average_result = result / 3
    # report the result
    print(f'Average time: {average_result:.3f} seconds')
Running the example calls the timeit() function for the target function.
The duration is returned and divided by the number of iterations to give the average.
The average duration for the target function is then reported.
In this case, the target function takes about 6.099 seconds on average.
This highlights how we can benchmark and report the average duration for a target function using the timeit.timeit() function.
Average time: 6.099 seconds
Next, let’s look at how we might benchmark Python code using the Python profiler.
Example of Benchmarking With cProfile and profile
In this section, we will explore examples of benchmarking Python code using the Python profiler.
Profiling is a different activity from benchmarking; it is more of an investigation.
Nevertheless, in a rough-and-ready way, we can use profiler information for benchmarking.
We will explore how to profile and benchmark a target function using the cProfile and the profile modules.
Benchmark with cProfile
We can profile a target function using the cProfile.run() function.
The function takes a string of the code to profile, such as a call to the target function, and will report profile information to standard output.
The example below profiles our task() function developed above.
Note, the cProfile module is not available on all platforms. If not supported on your platform, you may get an error.
# SuperFastPython.com
# example of benchmarking a function using cProfile
import cProfile

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run the task
    cProfile.run('task()')
Running the example profiles the task() function using the cProfile module.
In this case, we can see that the overall duration was about 9.163 seconds.
This is much slower than the approximate 6 seconds measured in previous benchmarking approaches. However, it is probably somewhat consistent across different variations of the same program.
More or fewer function calls within variations of the function being benchmarked may influence the overall duration, given the instrumentation required.
Far more useful are the insights that the profiler can give as to the slow parts of a target function.
This highlights how we might benchmark code using the cProfile module.
5 function calls in 9.163 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    1.165    1.165    9.163    9.163  <string>:1(<module>)
     1    0.000    0.000    7.998    7.998  benchmark3.py:6(task)
     1    7.998    7.998    7.998    7.998  benchmark3.py:8(<listcomp>)
     1    0.000    0.000    9.163    9.163  {built-in method builtins.exec}
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
Benchmark with profile
We can profile a target function using the profile.run() function.
Identical to the cProfile.run() function, the profile.run() function takes a string of the code to profile and will report profile information to standard output.
The complete example profiling our task() function is listed below.
# SuperFastPython.com
# example of benchmarking a function using profile
import profile

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run the task
    profile.run('task()')
Running the example profiles the task() function using the profile module.
In this case, we can see that the overall duration was about 9.335 seconds.
Again, this is much slower than the approximate 6 seconds measured in previous benchmarking approaches. However, it is probably somewhat consistent across different variations of the same program.
This highlights how we might benchmark code using the profile module.
6 function calls in 9.335 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.000    0.000    9.322    9.322  :0(exec)
     1    0.013    0.013    0.013    0.013  :0(setprofile)
     1    1.171    1.171    9.322    9.322  <string>:1(<module>)
     1    0.000    0.000    8.151    8.151  benchmark3.1.py:6(task)
     1    8.151    8.151    8.151    8.151  benchmark3.1.py:8(<listcomp>)
     0    0.000             0.000           profile:0(profiler)
     1    0.000    0.000    9.335    9.335  profile:0(task())
Next, let’s look at how we might benchmark Python code using the time command.
Example of Benchmarking With time Command
We can explore how to benchmark a Python script using the time Unix command.
In this case, we can develop our program normally, and not add any benchmarking code to it at all.
The complete example is listed below.
# SuperFastPython.com
# example of task that can be benchmarked using the unix time command

# function to benchmark
def task():
    # create a large list
    data = [i*i for i in range(100000000)]

# protect the entry point
if __name__ == '__main__':
    # run the task
    task()
We can then save our script to a file, such as “benchmark_time.py” and execute it from the command line using the Python interpreter.
This can be prefixed with the time command, which will run the script and report the duration.
time python benchmark_time.py
Running the example reports the duration both in terms of real-time (wall clock time) and CPU time (user + sys).
I recommend focusing on real-time.
In this case, we can see that the script took about 6.335 seconds to complete.
If we wanted, we could develop a shell script to repeat the benchmark and report the average result. This is out of the scope of this tutorial (for now).
This highlights how we can benchmark a Python script using the time Unix command.
real    0m6.335s
user    0m5.169s
sys     0m1.133s
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to benchmark Python code using the standard library.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.