You can benchmark snippets of Python code on the command line by using the timeit module.
In this tutorial, you will discover how to use the timeit command line interface to benchmark code in Python.
Let’s get started.
What is timeit
The timeit module is provided in the Python standard library.
It provides an easy way to benchmark single statements and snippets of Python code.
This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times.
— timeit — Measure execution time of small code snippets
The timeit module provides two interfaces for benchmarking.
- API interface.
- Command-line interface.
The first is an API that can be used via the timeit.Timer object or timeit.timeit() and timeit.repeat() module functions.
The second is a command line interface.
Both are intended to benchmark single Python statements, although multiple lines and multiple statements can be benchmarked using the module.
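For example, here is a minimal sketch of the API interface, timing the same kind of statement we will benchmark on the command line below:

import timeit
# total time to execute the statement 10,000 times via the API
duration = timeit.timeit("[i*i for i in range(1000)]", number=10000)
print(f'{duration:.3f} seconds total')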
You can learn more about how to benchmark Python code with the timeit module API in a separate tutorial.
What is a Command Line Interface
The command line or command line interface is a way of interacting with the computer using text commands, as opposed to clicking around on a graphical interface with a mouse.
A command-line interface (CLI) is a means of interacting with a device or computer program with commands from a user or client, and responses from the device or program, in the form of lines of text.
— Command-line interface, Wikipedia.
It is known by different names on different platforms, typically after the name of the program that provides the interface.
For example: Command Prompt (cmd.exe) on Windows, Terminal on macOS, and shells such as bash on Linux.
Python can be used on the command line directly via the “python” command.
This will open the Python interpreter.
We can also call the Python interpreter with flags. For example, we can execute a line of code directly using the Python interpreter via the -c flag:
python -c "print('hello world')"
This will start the Python interpreter, execute the line of code, report the result, and close the interpreter.
hello world
Now that we know about the command line, let’s look at the command line interface for the timeit module.
How to Use the timeit Command Line Interface
A Python module can be run as a command on the command line directly via the -m flag, followed by the module name.
-m mod : run library module as a script (terminates option list)
The timeit module can be run directly in this way, for example:
python -m timeit [-n N] [-r N] [-u U] [-s S] [-h] [statement …]
The flags must always come first, and the statement being benchmarked must always come last; otherwise, you will get an error.
The main command line flags (or switches) to the timeit module are as follows:
- -n N or --number=N: how many times to execute ‘statement’
- -r N or --repeat=N: how many times to repeat the timer (default 5)
- -s S or --setup=S: statement to be executed once initially (default ‘pass’)
- -u U or --unit=U: the units for the result, e.g. nsec, usec, msec, or sec.
Other flags are provided, such as the -p or --process flag to change the way that time is measured, the -v or --verbose flag for verbose output, and the -h or --help flag for getting a list of all available flags.
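For example, several flags can be combined in a single command, such as fixing the number of loop iterations, the number of repetitions, and the output units for an arbitrary short statement:

python -m timeit -n 1000 -r 10 -u msec "sum(range(1000))"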
If the -n flag is not provided, the timeit module will attempt to estimate the number of times to run the statement until a minimum time threshold is reached.
If -n is not given, a suitable number of loops is calculated by trying increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the total time is at least 0.2 seconds.
— timeit — Measure execution time of small code snippets
The units for the -u flag can be confusing; here is a guide:
- nsec: Nanoseconds (1,000 nanoseconds = 1 microsecond)
- usec: Microseconds (1,000 microseconds = 1 millisecond)
- msec: Milliseconds (1,000 milliseconds = 1 second)
- sec: Seconds (60 seconds = 1 minute)
The result is a benchmark result with the format:
- [n] loops, best of [r]: [time] [units] per loop
Where:
- [n] is the number of times the statement was executed.
- [r] is the number of repeats of n loops.
- [time] is the average time to execute the statement from the fastest repetition.
- [units] is the time units in which the result is reported.
This means that if the statement is executed 1,000 times and this is repeated 5 times, then the statement is executed 5,000 times in total, and the fastest time from the 5 repetitions is reported.
The reported time is an average from the best repetition:
- time = duration of fastest repetition / number of executions
This means that the one repetition of the 5 that was the fastest (best) was used, and its total time was divided by the number of executions within that repetition, which was 1,000, to give the expected or average runtime for the statement.
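For example, if the fastest of the 5 repetitions executed the statement 1,000 times in a total of 0.05 seconds, the reported result would be:

- time = 0.05 / 1,000 = 0.00005 seconds, or 50 usec per loop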
In the output, there are three fields. The loop count, which tells you how many times the statement body was run per timing loop repetition. The repetition count (‘best of 5’) which tells you how many times the timing loop was repeated, and finally the time the statement body took on average within the best repetition of the timing loop. That is, the time the fastest repetition took divided by the loop count.
— timeit — Measure execution time of small code snippets
The timeit command line interface cannot benchmark a Python script directly.
Instead, it is intended to benchmark Python statements that execute in a short duration of time.
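If the code you want to time lives in a script, one workaround (demonstrated later in this tutorial) is to move the code into a function and import that function in the setup statement. For example, assuming a hypothetical myscript.py file that defines a main() function:

python -m timeit -s "from myscript import main" "main()"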
Now that we know the basics of the timeit command line interface, let’s look at some worked examples.
Example of timeit Command Line Interface for Standalone Code
We can explore how to benchmark a standalone Python code snippet using the timeit command line interface.
In this case, we will benchmark a snippet that creates a list of 1,000 squared integers.
[i*i for i in range(1000)]
For example:
python -m timeit "[i*i for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the statement was executed 10,000 times and this loop was repeated 5 times, so 50,000 runs of the code.
The estimated time was “40.8 usec”, where usec is a microsecond. This means on the best repetition the statement took about 40.8 microseconds on average to run.
10000 loops, best of 5: 40.8 usec per loop
Let’s try another standalone version of creating a list of squared numbers.
In this case, using the ** operator.
[i**2 for i in range(1000)]
We can benchmark this on the command line with timeit as follows:
python -m timeit "[i**2 for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the statement was executed 5,000 times and this was repeated 5 times. This means that the statement was executed 25,000 times.
The estimated time was about 53.8 usec, that is the average run time for the statement on the best repetition was about 53.8 microseconds.
This highlights how we can use the timeit command line interface to benchmark standalone code.
5000 loops, best of 5: 53.8 usec per loop
Notice that the timeit module automatically chose the number of loops, and that this differed between the two benchmarks. This may be a problem if we want a fair apples-to-apples comparison.
Example of timeit Command Line Interface With Custom Loop Number
A problem with the previous example is that the number of loop iterations of the statement differed across different statements being benchmarked.
We can fix the number of loop iterations to ensure that we have an apples-to-apples comparison.
This can be achieved via the -n flag, which we can set to 10,000, chosen arbitrarily.
For example:
-n 10000
We can then benchmark the first statement again with a fixed number of iterations.
python -m timeit -n 10000 "[i*i for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the number of loop iterations is fixed at 10,000 as we expect, and that the average time from the best repetition was about 40.9 microseconds.
10000 loops, best of 5: 40.9 usec per loop
We can also repeat the benchmark of the ** approach with a fixed number of loop iterations.
python -m timeit -n 10000 "[i**2 for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the number of loop iterations is again fixed at 10,000 as we expect, and that the average time to execute the statement in the best repetition was about 53.8 microseconds.
This highlights how we can use the timeit command line interface to benchmark code with a custom number of iterations per loop.
10000 loops, best of 5: 53.8 usec per loop
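For reference, the same fixed-iteration benchmark could be reproduced via the timeit API mentioned earlier; a minimal sketch:

import timeit
# time 10,000 executions, repeated 5 times, and report the best repetition
times = timeit.repeat("[i**2 for i in range(1000)]", number=10000, repeat=5)
print(f'{min(times) / 10000 * 1e6:.1f} usec per loop')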
One limitation of these results is that microseconds are not a familiar unit of time for most people.
Example of timeit Command Line Interface With Custom Units
We can benchmark a Python statement with timeit and specify custom time units of measure.
Most people are familiar with seconds, so we can change the units from the default to seconds.
This can be achieved via the -u flag and providing the string “sec“.
For example:
-u sec
We can then re-benchmark our squared list example.
python -m timeit -u sec "[i*i for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the result is reported in seconds, but it is squashed and presented using scientific notation.
5000 loops, best of 5: 4.11e-05 sec per loop
We can convert it back to decimal notation as follows:
- decimal = 4.11e-05
- decimal = 4.11 * 10^-5
- decimal = 4.11 * 0.00001
- decimal = 0.0000411
Converted to decimal, the result is 0.0000411 seconds.
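If you prefer, Python itself can perform this conversion for us, reusing the -c flag from earlier:

python -c "print(f'{4.11e-05:.7f}')"

This prints 0.0000411.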
We can see why timeit automatically chose microseconds to report the results above.
This highlights how we can use the timeit command line interface to benchmark code with custom time units.
Example of timeit Command Line Interface With Custom Repetitions
Repeating a benchmark allows us to control for statistical noise.
Each time we run a benchmark, we will get a slightly different result. This is because of small differences in what the underlying operating system is doing at the same time as running the benchmark.
Repeating the benchmark many times means we can control for these background effects and choose the run with the minimum time, to give a fair idea of what is possible. From that run the average time to execute the statement is returned.
We can increase the number of repetitions of the benchmark loop in order to control more for the natural variation in the results. Increasing the number of repetitions from 5 to 10 or 30, or even 100 can help to ensure the results are more consistent from run to run.
This can be achieved via the -r flag. In this case, we will increase it to 100.
-r 100
We can then re-benchmark our list of squared numbers example.
python -m timeit -r 100 "[i*i for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, the average time to execute the statement was about 40.4 microseconds.
5000 loops, best of 100: 40.4 usec per loop
Next, let’s repeat the same test again and see if we get much variance in the result.
python -m timeit -r 100 "[i*i for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
Again we see the same result of about 40.4 microseconds.
5000 loops, best of 100: 40.4 usec per loop
Now, we can decrease the number of repetitions to 3, where we may see a greater variance between two runs of the same benchmark.
python -m timeit -r 3 "[i*i for i in range(1000)]"
The first run gives an average time of 40.7 microseconds.
5000 loops, best of 3: 40.7 usec per loop
We can run it again:
python -m timeit -r 3 "[i*i for i in range(1000)]"
The second run gives an average time of 40.8 microseconds.
5000 loops, best of 3: 40.8 usec per loop
At least in this case, we see that increasing repetitions of the test has the effect of producing more stable and consistent results across independent benchmarks on the same machine.
Example of timeit Command Line Interface With Verbose Output
By default, the timeit command line interface will repeat the looped execution of the provided statement 5 times.
We can see the total duration of each of these repetitions using the -v verbose flag.
For example:
-v
We can apply this to our benchmark of calculating a list of squared numbers.
python -m timeit -v "[i*i for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
We did not specify the number of loops; therefore, the timeit module automatically increased the number of loop iterations until the overall duration of one repetition was at least 0.2 seconds.
We can see that it automatically chose 5,000 iterations per repetition and repeated the loop 5 times.
We can see the total times for each repetition, with the fastest raw time of about 204 milliseconds.
If we divide this by the number of iterations, 5,000, this gives 0.0408 milliseconds. Converted to microseconds by multiplying 0.0408 by 1,000, this gives about 40.8 microseconds that we see reported at the bottom of the output.
This highlights how we can use the timeit command line interface to benchmark code with verbose output.
1 loop -> 5.36e-05 secs
2 loops -> 8.32e-05 secs
5 loops -> 0.000205 secs
10 loops -> 0.000407 secs
20 loops -> 0.000843 secs
50 loops -> 0.00211 secs
100 loops -> 0.00405 secs
200 loops -> 0.00822 secs
500 loops -> 0.0207 secs
1000 loops -> 0.0413 secs
2000 loops -> 0.0814 secs
5000 loops -> 0.202 secs

raw times: 204 msec, 207 msec, 207 msec, 208 msec, 204 msec

5000 loops, best of 5: 40.8 usec per loop
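Note from the help text below: the -v flag can be repeated to print the raw timing results with more digits of precision, for example:

python -m timeit -v -v "[i*i for i in range(1000)]"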
Example of timeit Command Line Interface With Setup
We can benchmark Python statements that require some setup, such as importing a function.
This can be achieved via the -s flag and specifying the setup statement or statements.
We can explore an alternate way to calculate a list of squared numbers using the math.pow() function.
This requires that we import the pow() function from the math module.
For example:
-s "from math import pow"
We can benchmark this method using timeit as follows:
python -m timeit -s "from math import pow" "[pow(i,2) for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
We can see that timeit automatically chose 5,000 loop iterations per repetition, repeated 5 times. The average time from the best repetition was 92.6 microseconds, noticeably slower than the inline i*i version, likely because math.pow() operates on floating-point values and adds a function call per element.
This highlights how we can use the timeit command line interface to benchmark code that requires setup.
5000 loops, best of 5: 92.6 usec per loop
Example of timeit Command Line Interface With Custom Function
We can use the timeit command line interface to benchmark a custom function.
Firstly, the custom function must be saved to a Python file. It can then be imported via the -s setup flag.
In this example, we will define a custom function for calculating squared numbers called square().
# define a custom function for squaring numbers
def square(value):
    return value * value
We will save this in the current directory with the filename square.py.
The square() function could then be imported into Python via:
from square import square
We can add this import statement to the setup of our timeit benchmark, for example:
-s "from square import square"
And then call the square() function when creating our list of 1,000 numbers.
[square(i) for i in range(1000)]
Tying this together, the complete example of benchmarking a custom function is listed below.
python -m timeit -s "from square import square" "[square(i) for i in range(1000)]"
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that timeit automatically chooses 5,000 loop iterations per repetition. The average time to execute the statement to construct the list with the custom function was 70.5 microseconds, again slower than the inline i*i version, likely due to the overhead of a Python function call for each element.
This highlights how we can use the timeit command line interface to benchmark a custom function.
5000 loops, best of 5: 70.5 usec per loop
Example of Getting Help For the timeit Command Line Interface
It is easy to forget the flags for the timeit module on the command line.
We can get help by using the -h flag:
-h
For example:
python -m timeit -h
This reports a ton of helpful usage information for the timeit command line interface.
Tool for measuring execution time of small code snippets.

This module avoids a number of common traps for measuring execution
times.  See also Tim Peters' introduction to the Algorithms chapter in
the Python Cookbook, published by O'Reilly.

Library usage: see the Timer class.

Command line usage:
    python timeit.py [-n N] [-r N] [-s S] [-p] [-h] [--] [statement]

Options:
  -n/--number N: how many times to execute 'statement' (default: see below)
  -r/--repeat N: how many times to repeat the timer (default 5)
  -s/--setup S: statement to be executed once initially (default 'pass').
                Execution time of this setup statement is NOT timed.
  -p/--process: use time.process_time() (default is time.perf_counter())
  -v/--verbose: print raw timing results; repeat for more digits precision
  -u/--unit: set the output time unit (nsec, usec, msec, or sec)
  -h/--help: print this usage message and exit
  --: separate options from statement, use when statement starts with -
  statement: statement to be timed (default 'pass')

A multi-line statement may be given by specifying each line as a
separate argument; indented lines are possible by enclosing an
argument in quotes and using leading spaces.  Multiple -s options are
treated similarly.

If -n is not given, a suitable number of loops is calculated by trying
increasing numbers from the sequence 1, 2, 5, 10, 20, 50, ... until the
total time is at least 0.2 seconds.

Note: there is a certain baseline overhead associated with executing a
pass statement.  It differs between versions.  The code here doesn't try
to hide it, but you should be aware of it.  The baseline overhead can be
measured by invoking the program without arguments.

Classes:

    Timer

Functions:

    timeit(string, string) -> float
    repeat(string, string) -> list
    default_timer() -> float
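One detail from the help text worth highlighting: a multi-line statement can be benchmarked by passing each line as a separate argument. For example, the following sums a list with an explicit loop:

python -m timeit -s "data = list(range(1000))" "total = 0" "for i in data: total += i"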
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to use the timeit command line interface to benchmark code in Python.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.