You can benchmark snippets of Python code using the timeit module in the standard library.
In this tutorial, you will discover how to benchmark Python code using the timeit module.
Let’s get started.
Need to Benchmark Python Code
Benchmarking Python code refers to comparing the performance of one program to variations of the program.
Benchmarking is the practice of comparing business processes and performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time and cost.
— Benchmarking, Wikipedia.
Typically, we make changes to the programs, such as adding concurrency, in order to improve the performance of the program on a given system.
Improving performance typically means reducing the run time of the program.
Therefore, when we benchmark programs in Python after adding concurrency, we typically are interested in recording how long a program takes to run.
It is critical to be systematic when benchmarking code.
The first step is to record how long an unmodified version of the program takes to run. This provides a baseline in performance to which all other versions of the program must be compared. If we are adding concurrency, then the unmodified version of the program will typically perform tasks sequentially, e.g. one-by-one.
We can then make modifications to the program, such as adding thread pools, process pools, or asyncio. The goal is to perform tasks concurrently (out of order), even in parallel (simultaneously). The performance of the program can be benchmarked and compared to the performance of the unmodified version.
Modified versions of the program must perform better than the unmodified version. If they do not, they are not improvements and should not be adopted.
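As a minimal sketch of this workflow, consider the following, where the I/O-bound task() function, the number of tasks, and the thread-pool variant are all hypothetical stand-ins for your own program:

```python
# minimal sketch of baseline-vs-modified benchmarking (task() is a hypothetical stand-in)
import time
from concurrent.futures import ThreadPoolExecutor

def task():
    time.sleep(0.1)  # simulate an I/O-bound task

# baseline: the unmodified, sequential version
start = time.perf_counter()
for _ in range(8):
    task()
baseline = time.perf_counter() - start

# modified: a concurrent version using a thread pool
start = time.perf_counter()
with ThreadPoolExecutor() as executor:
    for _ in range(8):
        executor.submit(task)
modified = time.perf_counter() - start

# compare the modified version to the baseline
print(f'sequential: {baseline:.3f}s, thread pool: {modified:.3f}s')
```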
How can we benchmark the performance of programs in Python?
What is timeit?
The timeit module is provided in the Python standard library.
It provides an easy way to benchmark single statements and snippets of Python code.
This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times.
— timeit — Measure execution time of small code snippets
timeit Has Two Interfaces
It provides two interfaces for benchmarking.
- API interface.
- Command-line interface.
The first is an API that can be used via the timeit.Timer class or the timeit.timeit() and timeit.repeat() module functions.
The second is a command line interface.
Both are intended to benchmark single Python statements, although multiple lines and multiple statements can be benchmarked using the module.
timeit Encodes Best Practices
Importantly, it encodes a number of best practices for benchmarking, including:
- Timing code using time.perf_counter(), for high precision.
- Executing target code many times by default (many samples), to reduce statistical noise.
- Disabling the Python garbage collector, to reduce the variance in the measurements.
- Providing a controlled and well-defined scope for benchmarked code, to reduce unwanted side-effects.
Note By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable.
— timeit — Measure execution time of small code snippets
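To make these best practices concrete, here is a rough, simplified sketch of the approach the module takes; it is for illustration only and is not the actual implementation:

```python
# simplified sketch of timeit's approach (illustration only, not the real implementation)
import gc
import time

def bench(stmt, number=1000000):
    code = compile(stmt, '<bench>', 'exec')
    was_enabled = gc.isenabled()
    gc.disable()  # reduce variance from garbage collection
    try:
        start = time.perf_counter()  # high-precision timer
        for _ in range(number):      # many samples to reduce statistical noise
            exec(code, {})           # controlled, isolated scope
        return time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
```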
timeit Is For Snippets
The timeit module is intended to benchmark small amounts of code that run very fast.
Class for timing execution speed of small code snippets.
— timeit — Measure execution time of small code snippets
It is generally not intended for benchmarking entire programs, although it can be used to do so.
The interface is designed to take a single statement of Python code.
It is also generally not intended for benchmarking slow code, e.g. code that takes seconds, minutes, or longer to run, although it can be.
The benchmarking uses a high-precision performance counter and executes a given statement one million times by default to expose the runtime signal of very short-duration target code.
If larger sections of code need to be benchmarked or target code has a long duration, consider developing custom benchmarking code that makes use of time.time() or time.perf_counter(), or use the time Unix command.
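For example, a minimal custom benchmark using time.perf_counter() might look like the following, where task() is a hypothetical stand-in for your own long-running code:

```python
import time

def task():
    # hypothetical long-running target code
    data = [i*i for i in range(10000000)]

# record the time before and after a single run
start = time.perf_counter()
task()
duration = time.perf_counter() - start
print(f'Took {duration:.3f} seconds')
```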
You can learn more in the tutorial:
Next, let’s consider the mindset needed when using the timeit module.
What is the timeit Mindset?
Using the timeit module for the first time can be confusing for developers.
There are three main areas of confusion.
- You must specify the scope required for the benchmarked code.
- Benchmark times are not measured with the system clock (e.g. time.time()).
- Benchmarked code is executed many, many times, e.g. one million times by default.
This is intentional, capturing benchmarking best practices, but it requires a mindset shift.
Specify Scope
The code to be benchmarked must be specified as a string.
Additionally, the scope required to execute the benchmark code must be specified.
This can be achieved either via a setup string that might define or assign required variables or by specifying “globals” (global variables) that include the state and definitions required to execute the benchmark code.
This is required because the benchmarking of code is isolated from the program. This is intentional as it limits unwanted side effects of the program on the benchmark code, potentially influencing the benchmark score.
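A small sketch illustrates this isolation and both ways of providing the required scope:

```python
import timeit

x = 10
# timeit.timeit('x * x')  # would fail with NameError: the snippet runs in an isolated namespace
# option 1: define the required state in the setup string
result = timeit.timeit('x * x', setup='x = 10')
# option 2: pass the module's namespace via the globals argument
result = timeit.timeit('x * x', globals=globals())
```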
Benchmark Timings
The benchmark scores reported are in seconds.
Nevertheless, they are measured using an internal performance counter rather than the system clock.
The result is generally equivalent to wall-clock time, e.g. the total time elapsed on the system while executing the benchmark.
The module does not use time.time() to calculate execution time by default. The reason is that time.time() is unreliable for benchmarking, especially for short durations, as the system clock on which it is based may be adjusted (e.g. daylight saving time, leap seconds, etc.).
Instead, a standardized high-performance timer is used by default via the time.perf_counter() function.
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide.
— time — Time access and conversions
You can learn more about the time.perf_counter() function in the tutorial:
Repeated Benchmarks
Each benchmark is repeated many times by default, e.g. 1,000,000 times.
The reason is that executing a single Python statement may take a very small interval of time. This is both hard to measure and also strongly influenced by whatever else might be happening on the system at the same time.
Executing the benchmark code many times allows the execution time signal to rise and overwhelm any statistical noise and variance.
As long as other benchmark statements use the same number of repetitions, the resulting numbers can be compared relatively, but they cannot be used as absolute benchmark scores.
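If a per-statement estimate is needed, we can divide the reported total by the number of executions. For example:

```python
import timeit

# benchmark the statement a fixed number of times
number = 100000
total = timeit.timeit('[i*i for i in range(1000)]', number=number)
# estimate the average time of a single execution
per_statement = total / number
print(f'{per_statement * 1e6:.1f} microseconds per execution')
```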
Next, let’s explore how we might use the timeit API.
How to Use the timeit API
The focus of the timeit API is the timeit.Timer class, which can be used directly or more simply via the timeit.timeit() and timeit.repeat() module functions.
Next, let’s take a look at these elements in turn.
How to Benchmark with the timeit API
Now that we know what timeit is, let’s explore how we can use the API to benchmark snippets of Python code.
We will look at three parts of the API:
- The timeit.timeit() function.
- The timeit.repeat() function.
- The timeit.Timer class.
How to Use timeit.timeit()
The timeit.timeit() function benchmarks Python code and reports the duration in seconds.
Create a Timer instance with the given statement, setup code and timer function and run its timeit() method with number executions.
— timeit — Measure execution time of small code snippets
The timeit.timeit() function takes the Python statement to be benchmarked as a string.
For example:
```python
...
# benchmark a python statement
result = timeit.timeit('[i*i for i in range(1000)]')
```
Any Python code required to execute the benchmark code can be provided as a string to the “setup” argument.
This might include defining a variable.
The setup code is only executed once prior to the benchmark.
For example:
```python
...
# benchmark a python statement with setup code
result = timeit.timeit('[i*i for i in range(total)]', setup='total=10000')
```
The setup might also include importing from the main module so that required functions are available to the benchmark code.
For example:
```python
...
# benchmark a python statement with import in setup
result = timeit.timeit('task()', setup='from __main__ import task')
```
Alternatively, if we have defined code in our program that is required to execute the benchmark code, we can specify the “globals” argument for the namespace.
We can pass locals() or globals(), which provides a namespace from our current program.
For example:
```python
...
# benchmark a python statement with a namespace
result = timeit.timeit('task()', globals=globals())
```
Finally, we can specify the number of times the benchmark code is executed via the “number” argument.
By default, this is set to one million, e.g. 1,000,000, although it can be set to a smaller number if the benchmark code takes a long time to execute.
For example:
```python
...
# benchmark a python statement with a smaller number
result = timeit.timeit('[i*i for i in range(1000)]', number=100)
```
The “number” argument should be set so that the overall duration is at least 0.2 or 0.5 seconds, perhaps even more than one second.
You can learn more about how to benchmark Python with the timeit.timeit() function in the tutorial:
How to Use timeit.repeat()
The timeit.repeat() function calls the timeit.timeit() function repeatedly.
This is a convenience function that calls the timeit() repeatedly, returning a list of results.
— timeit — Measure execution time of small code snippets
It returns a collection of benchmark results that can then be summarized, such as the minimum (fastest time).
The average (expected time) or the maximum (longest time) can be reported, but are not likely to be representative, as many factors can cause a benchmark to take longer than expected.
Note It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
— timeit — Measure execution time of small code snippets
Like the timeit.timeit() function, the timeit.repeat() function takes the statement to be benchmarked, along with a “setup“, “number“, and “globals” argument.
For example:
```python
...
# benchmark a python statement repeatedly
results = timeit.repeat('[i*i for i in range(1000)]')
```
The number of repetitions is specified via the “repeat” argument, which is set to 5 by default.
For example:
```python
...
# benchmark a python statement repeatedly
results = timeit.repeat('[i*i for i in range(1000)]', repeat=10)
```
How to Use timeit.Timer()
The timeit.Timer class can be used by first creating an instance and then either calling the timeit() or repeat() methods.
Class for timing execution speed of small code snippets.
— timeit — Measure execution time of small code snippets
The timeit.Timer class constructor takes the details of the code that is being benchmarked, including the statement, any “setup” and any “globals” namespace.
For example:
```python
...
# create a timer
timer = timeit.Timer('[i*i for i in range(1000)]')
```
The code can be benchmarked using the timeit() method that takes the number of times the code is run, which defaults to 1,000,000.
For example:
```python
...
# benchmark a python statement
result = timer.timeit(number=100)
```
The code can be repeatedly benchmarked by calling the repeat() method.
This method takes a “repeat” argument that specifies the number of repetitions, defaulting to 5. It also takes a “number” argument specifying the number of times the code is run each repetition.
For example:
```python
...
# benchmark a python statement repeatedly
results = timer.repeat(repeat=3, number=100)
```
The Timer class also provides an autorange() method that calls timeit() and automatically determines the number of times to run the code, ensuring the overall duration is large enough to be meaningful.
This is a convenience function that calls timeit() repeatedly so that the total time >= 0.2 second, returning the eventual (number of loops, time taken for that number of loops). It calls timeit() with increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the time taken is at least 0.2 second.
— timeit — Measure execution time of small code snippets
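For example, using the timer created above:

```python
...
# automatically determine the number of loops and benchmark
loops, total_time = timer.autorange()
# report the loop count and total time
print(f'{loops} loops took {total_time:.3f} seconds')
```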
Tips for Using the timeit API
This section provides some tips when working with the timeit module.
Import __main__ Module
We can benchmark a function defined in our own Python file.
This can be achieved by importing it from the main module in the “setup” argument, and making it available to the benchmark code.
For example:
```python
...
# benchmark a function defined in main
result = timeit.timeit('task()', setup='from __main__ import task')
```
Pass Globals
We can benchmark code that requires data or functions defined in our program.
This can be achieved by specifying the namespace via the “globals” argument, such as either locals() for the current local namespace or globals() for the current global namespace.
This will make the relevant scope available to the benchmark code, including any defined variables and functions.
For example:
```python
...
# benchmark custom functions
result = timeit.timeit('task()', globals=globals())
```
Benchmark Multiple Expressions
Although the timeit module is intended to benchmark single statements, we can use it to benchmark large snippets composed of multiple statements.
This can be achieved by creating a compound statement on one line, separated by semicolons (;).
For example:
```python
...
# benchmark multiple statements
timeit.timeit('[i*i for i in range(1000)];[i+i for i in range(1000)]')
```
Another approach is to put the target code into a function and benchmark a call to the function.
For example:
```python
# task function
def task():
    [i*i for i in range(1000)]
    [i*i for i in range(1000)]
    [i*i for i in range(1000)]

# benchmark custom functions
result = timeit.timeit('task()', globals=globals())
```
Another approach is to define a multi-line statement as a multi-line string, then provide the string as an argument.
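For example:

```python
import timeit

# define multiple statements as a multi-line string
stmt = '''
data = [i*i for i in range(1000)]
total = sum(data)
'''
# benchmark the multi-line snippet
result = timeit.timeit(stmt, number=100000)
```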
Now that we know how to use the timeit module, let’s look at some worked examples.
Example of Benchmarking with timeit.timeit()
In this section, we will explore an example of a benchmark using the timeit.timeit() function.
In this example, we will benchmark creating a list of 1,000 squared numbers.
```python
...
[i*i for i in range(1000)]
```
We will execute the statement 100,000 times.
```python
...
# benchmark the statement
time_duration = timeit('[i*i for i in range(1000)]', number=100000)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.timeit()
from timeit import timeit
# benchmark the statement
time_duration = timeit('[i*i for i in range(1000)]', number=100000)
# report the duration
print(f'Took {time_duration} seconds')
```
Running the example, we can see that the benchmark took about 4.087 seconds to complete.
```
Took 4.086719651939347 seconds
```
This could be compared to other methods of creating a list of 1,000 squared numbers, such as using the math.pow() function with the exponent of 2.
For example:
```python
...
[pow(i,2) for i in range(1000)]
```
This requires that we import the math.pow() function in order to perform the benchmark.
We can achieve this via the “setup” argument.
For example:
```python
...
# benchmark the statement
time_duration = timeit('[pow(i,2) for i in range(1000)]', setup='from math import pow', number=100000)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.timeit()
from timeit import timeit
# benchmark the statement
time_duration = timeit('[pow(i,2) for i in range(1000)]', setup='from math import pow', number=100000)
# report the duration
print(f'Took {time_duration} seconds')
```
Running the example, we can see that it takes about 9.394 seconds.
Compared to the above approach, using math.pow() is about 5.307 seconds slower (in this case, when repeated one hundred thousand times, on my system).
Your results will differ, given differences in software and hardware.
```
Took 9.394182911841199 seconds
```
You can learn more about how to benchmark Python with the timeit.timeit() function in the tutorial:
Next, let’s explore how we might use the timeit.repeat() function.
Example of Benchmarking with timeit.repeat()
In this section, we will explore an example of a benchmark using the timeit.repeat() function.
In this example, we will benchmark creating a list of 1,000 squared numbers.
```python
...
[i*i for i in range(1000)]
```
We will execute the statement 10,000 times and repeat the benchmark 3 times.
```python
...
# benchmark the statement
results = repeat('[i*i for i in range(1000)]', repeat=3, number=10000)
```
We will then report all results and the minimum (fastest) benchmark time.
```python
...
# report the durations
print(results)
# report the min duration
print(min(results))
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.repeat()
from timeit import repeat
# benchmark the statement
results = repeat('[i*i for i in range(1000)]', repeat=3, number=10000)
# report the durations
print(results)
# report the min duration
print(min(results))
```
Running the example, we can see that the benchmark took about 0.4 seconds to complete each run.
The fastest time to complete was about 0.409 seconds.
```
[0.4205979760736227, 0.409935096045956, 0.4183474569581449]
0.409935096045956
```
This could be compared to other methods of creating a list of 1,000 squared numbers, such as using the math.pow() function with the exponent of 2.
For example:
```python
...
[pow(i,2) for i in range(1000)]
```
This requires that we import the math.pow() function in order to perform the benchmark.
We can achieve this via the “setup” argument.
For example:
```python
...
# benchmark the statement
results = repeat('[pow(i, 2) for i in range(1000)]', setup='from math import pow', repeat=3, number=10000)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.repeat()
from timeit import repeat
# benchmark the statement
results = repeat('[pow(i, 2) for i in range(1000)]', setup='from math import pow', repeat=3, number=10000)
# report the durations
print(results)
# report the min duration
print(min(results))
```
Running the example, we can see that the benchmark took about 0.9 seconds to complete each run.
The fastest time to complete was about 0.927 seconds.
The results show that it is about 0.517 seconds or 517 milliseconds slower to use the math.pow() function to square the list of 1,000 numbers (in this case, when repeated ten thousand times, on my system).
Your results will differ, given the differences in software and hardware.
```
[0.9272060238290578, 0.9502722688484937, 0.9380616340786219]
0.9272060238290578
```
Next, let’s explore how we might use the timeit.Timer class.
Example of Benchmarking with timeit.Timer
In this section, we will explore an example of a benchmark using the timeit.Timer class.
In this example, we will benchmark creating a list of 1,000 squared numbers.
```python
...
[i*i for i in range(1000)]
```
This statement can be provided to the timeit.Timer class constructor.
```python
...
# create a timer
timer = Timer('[i*i for i in range(1000)]')
```
We can then benchmark the statement 100,000 times and report the result.
```python
...
# benchmark the statement
time_duration = timer.timeit(number=100000)
# report the duration
print(f'Took {time_duration:.3f} seconds')
```
We can then choose to repeat the benchmark 3 times and report all results and the minimum result.
```python
...
# benchmark the statement
results = timer.repeat(repeat=3, number=100000)
# report the durations
print(results)
# report the minimum result
print(min(results))
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.Timer
from timeit import Timer
# create a timer
timer = Timer('[i*i for i in range(1000)]')
# benchmark the statement
time_duration = timer.timeit(number=100000)
# report the duration
print(f'Took {time_duration:.3f} seconds')
# benchmark the statement
results = timer.repeat(repeat=3, number=100000)
# report the durations
print(results)
# report the minimum result
print(min(results))
```
Running the example first reports the benchmark result of the list creation, executed 100,000 times.
Next, the benchmark is repeated 3 times, with all results reported and the minimum (fastest) time highlighted.
This highlights that the timeit.Timer class provides a convenient way to benchmark the same snippet of code in different ways, if needed.
```
Took 4.233 seconds
[4.139555993955582, 4.124664062168449, 4.120071409037337]
4.120071409037337
```
How to Use the timeit Command Line Interface
The command line or command line interface is a way of interacting with the computer using text commands, as opposed to clicking around on a graphical interface with a mouse.
A Python module can be run as a command on the command line directly via the -m flag, followed by the module name.
-m mod : run library module as a script (terminates option list)
The timeit module can be run directly in this way, for example:
```
python -m timeit [-n N] [-r N] [-u U] [-s S] [-h] [statement ...]
```
The flags must always come first, and the statement that is being benchmarked must always come last, otherwise, you will get an error.
The main command line flags (or switches) to the timeit module are as follows:
- -n N or --number=N: how many times to execute ‘statement’
- -r N or --repeat=N: how many times to repeat the timer (default 5)
- -s S or --setup=S: statement to be executed once initially (default pass)
- -u U or --unit=U: the units for the result, e.g. nsec, usec, msec, or sec.
Other flags are provided, such as the -p or --process flag to change the way that time is measured, the -v or --verbose flag for verbose output, and the -h or --help flag for getting a list of all available flags.
If the -n flag is not provided, the timeit module will attempt to estimate the number of times to run the statement until a minimum time threshold is reached.
If -n is not given, a suitable number of loops is calculated by trying increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the total time is at least 0.2 seconds.
— timeit — Measure execution time of small code snippets
The units for the -u flag can be confusing; here is a guide:
- nsec: Nanoseconds (1,000 nanoseconds = 1 microsecond)
- usec: Microseconds (1,000 microseconds = 1 millisecond)
- msec: Milliseconds (1,000 milliseconds = 1 second)
- sec: Seconds (60 seconds = 1 minute)
The result is a benchmark result with the format:
- [n] loops, best of [r]: [time] [units] per loop
Where:
- [n] is the number of times the statement was executed in each repetition.
- [r] is the number of repeats of n loops.
- [time] is the average time to execute the statement within the fastest repetition.
- [units] is the time units in which the result is reported.
This means if the statement is executed 1,000 times and is repeated 5 times, then the statement is executed 5,000 times in total, and the fastest of the 5 repetitions is used for the reported time.
The reported time is an average from the best repetition:
- time = duration of fastest repetition / number of executions per repetition
This means that the one repetition of the 5 that was the fastest (best) is used, and its total time is divided by the number of executions in that repetition, which was 1,000, to give the expected or average runtime for the statement.
In the output, there are three fields. The loop count, which tells you how many times the statement body was run per timing loop repetition. The repetition count (‘best of 5’) which tells you how many times the timing loop was repeated, and finally the time the statement body took on average within the best repetition of the timing loop. That is, the time the fastest repetition took divided by the loop count.
— timeit — Measure execution time of small code snippets
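For example, with hypothetical repetition totals, the reported per-loop time would be calculated as follows:

```python
# hypothetical totals (in seconds) for 5 repetitions of 10000 loops each
rep_totals = [0.42, 0.41, 0.43, 0.45, 0.44]
loops = 10000
# the reported time is the fastest repetition divided by the loop count
per_loop = min(rep_totals) / loops
print(f'{loops} loops, best of 5: {per_loop * 1e6:.0f} usec per loop')
```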
The timeit command line interface cannot benchmark a Python script directly.
Instead, it is intended to benchmark Python statements that execute in a short duration of time.
You can learn more about how to use the timeit command line interface in the tutorial:
Example of Benchmarking with the timeit Command Line Interface
We can explore how to benchmark a standalone Python code snippet using the timeit command line interface.
In this case, we will benchmark a snippet that creates a list of 1,000 squared integers.
```python
[i*i for i in range(1000)]
```
For example:
```
python -m timeit "[i*i for i in range(1000)]"
```
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the statement was executed 10,000 times and this loop was repeated 5 times, so 50,000 runs of the code.
The estimated time was “40.8 usec”, where usec is a microsecond. This means on the best repetition the statement took about 40.8 microseconds on average to run.
```
10000 loops, best of 5: 40.8 usec per loop
```
Let’s try another standalone version of creating a list of squared numbers.
In this case, using the ** operator.
```python
[i**2 for i in range(1000)]
```
We can benchmark this on the command line with timeit as follows:
```
python -m timeit "[i**2 for i in range(1000)]"
```
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the statement was executed 5,000 times and this was repeated 5 times. This means that the statement was executed 25,000 times.
The estimated time was about 53.8 usec; that is, the average run time for the statement in the best repetition was about 53.8 microseconds.
This highlights how we can use the timeit command line interface to benchmark standalone code.
```
5000 loops, best of 5: 53.8 usec per loop
```
Notice that the timeit module automatically chose the number of loops, differing between each benchmark. This may be a problem if we want a fair apples-to-apples comparison.
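One way to get a like-for-like comparison is to fix the loop count for both commands via the -n flag. For example:

```
python -m timeit -n 10000 "[i*i for i in range(1000)]"
python -m timeit -n 10000 "[i**2 for i in range(1000)]"
```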
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to benchmark Python code using the timeit module.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.