You can benchmark snippets of Python code using the timeit module in the standard library.
In this tutorial, you will discover how to benchmark Python code using the timeit module.
Let’s get started.
Need to Benchmark Python Code
Benchmarking Python code refers to comparing the performance of one program to variations of the program.
Benchmarking is the practice of comparing business processes and performance metrics to industry bests and best practices from other companies. Dimensions typically measured are quality, time and cost.
— Benchmarking, Wikipedia.
Typically, we make changes to the programs, such as adding concurrency, in order to improve the performance of the program on a given system.
Improving performance typically means reducing the run time of the program.
Therefore, when we benchmark programs in Python after adding concurrency, we typically are interested in recording how long a program takes to run.
It is critical to be systematic when benchmarking code.
The first step is to record how long an unmodified version of the program takes to run. This provides a baseline in performance to which all other versions of the program must be compared. If we are adding concurrency, then the unmodified version of the program will typically perform tasks sequentially, e.g. one-by-one.
We can then make modifications to the program, such as adding thread pools, process pools, or asyncio. The goal is to perform tasks concurrently (out of order), even in parallel (simultaneously). The performance of the program can be benchmarked and compared to the performance of the unmodified version.
Modified versions of the program must perform better than the unmodified version. If they do not, they are not improvements and should not be adopted.
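As a minimal sketch of this workflow, consider the following, where the I/O-bound task() function, the number of tasks, and the thread-pool variant are all hypothetical stand-ins for your own program:

```python
# minimal sketch of baseline-vs-modified benchmarking (task() is a hypothetical stand-in)
import time
from concurrent.futures import ThreadPoolExecutor

def task():
    time.sleep(0.1)  # simulate an I/O-bound task

# baseline: the unmodified, sequential version
start = time.perf_counter()
for _ in range(8):
    task()
baseline = time.perf_counter() - start

# modified: a concurrent version using a thread pool
start = time.perf_counter()
with ThreadPoolExecutor() as executor:
    for _ in range(8):
        executor.submit(task)
modified = time.perf_counter() - start

# compare the modified version to the baseline
print(f'sequential: {baseline:.3f}s, thread pool: {modified:.3f}s')
```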
How can we benchmark the performance of programs in Python?
What is timeit?
The timeit module is provided in the Python standard library.
It provides an easy way to benchmark single statements and snippets of Python code.
This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times.
— timeit — Measure execution time of small code snippets
timeit Has Two Interfaces
It provides two interfaces for benchmarking.
- API interface.
- Command-line interface.
The first is an API that can be used via the timeit.Timer class or the timeit.timeit() and timeit.repeat() module functions.
The second is a command line interface.
Both are intended to benchmark single Python statements, although multiple lines and multiple statements can be benchmarked using the module.
timeit Encodes Best Practices
Importantly, it encodes a number of best practices for benchmarking, including:
- Timing code using time.perf_counter(), for high precision.
- Executing target code many times by default (many samples), to reduce statistical noise.
- Disabling the Python garbage collector, to reduce the variance in the measurements.
- Providing a controlled and well-defined scope for benchmarked code, to reduce unwanted side-effects.
Note By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable.
— timeit — Measure execution time of small code snippets
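To make these best practices concrete, here is a rough, simplified sketch of the approach the module takes; it is for illustration only and is not the actual implementation:

```python
# simplified sketch of timeit's approach (illustration only, not the real implementation)
import gc
import time

def bench(stmt, number=1000000):
    code = compile(stmt, '<bench>', 'exec')
    was_enabled = gc.isenabled()
    gc.disable()  # reduce variance from garbage collection
    try:
        start = time.perf_counter()  # high-precision timer
        for _ in range(number):      # many samples to reduce statistical noise
            exec(code, {})           # controlled, isolated scope
        return time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
```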
timeit Is For Snippets
The timeit module is intended to benchmark small amounts of code that run very fast.
Class for timing execution speed of small code snippets.
— timeit — Measure execution time of small code snippets
It is generally not intended for benchmarking entire programs, although it can be used to do so.
The interface is designed to take a single statement of Python code.
It is also generally not intended for benchmarking slow code, e.g. code that takes seconds, minutes, or longer to run, although it can be.
The benchmarking uses a high-precision performance counter and executes a given statement one million times by default to expose the runtime signal of very short-duration target code.
If larger sections of code need to be benchmarked or target code has a long duration, consider developing custom benchmarking code that makes use of time.time() or time.perf_counter(), or use the time Unix command.
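For example, a minimal custom benchmark using time.perf_counter() might look like the following, where task() is a hypothetical stand-in for your own long-running code:

```python
import time

def task():
    # hypothetical long-running target code
    data = [i*i for i in range(10000000)]

# record the time before and after a single run
start = time.perf_counter()
task()
duration = time.perf_counter() - start
print(f'Took {duration:.3f} seconds')
```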
You can learn more in the tutorial:
Next, let’s consider the mindset needed when using the timeit module.
What is the timeit Mindset?
Using the timeit module for the first time can be confusing for developers.
There are three main areas of confusion.
- You must specify the scope required for the benchmarked code.
- Benchmark times are not measured with the system clock (e.g. time.time()).
- Benchmarked code is executed many, many times, e.g. one million times by default.
This is intentional, capturing benchmarking best practices, but it requires a mindset shift.
Specify Scope
The code to be benchmarked must be specified as a string.
Additionally, the scope required to execute the benchmark code must be specified.
This can be achieved either via a setup string that might define or assign required variables or by specifying “globals” (global variables) that include the state and definitions required to execute the benchmark code.
This is required because the benchmarking of code is isolated from the program. This is intentional as it limits unwanted side effects of the program on the benchmark code, potentially influencing the benchmark score.
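A small sketch illustrates this isolation and both ways of providing the required scope:

```python
import timeit

x = 10
# timeit.timeit('x * x')  # would fail with NameError: the snippet runs in an isolated namespace
# option 1: define the required state in the setup string
result = timeit.timeit('x * x', setup='x = 10')
# option 2: pass the module's namespace via the globals argument
result = timeit.timeit('x * x', globals=globals())
```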
Benchmark Timings
The benchmark scores reported are in seconds.
Nevertheless, they are measured using an internal performance counter rather than the system clock.
The result is generally equivalent to wall-clock time, e.g. the total time elapsed on the system while executing the benchmark.
The module does not use time.time() to calculate execution time by default. The reason is that time.time() is unreliable for benchmarking, especially for short durations, as the system clock on which it is based may be adjusted (e.g. daylight saving time, leap seconds, etc.).
Instead, a standardized high-performance timer is used by default via the time.perf_counter() function.
Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide.
— time — Time access and conversions
You can learn more about the time.perf_counter() function in the tutorial:
Repeated Benchmarks
Each benchmark is repeated many times by default, e.g. 1,000,000 times.
The reason is that executing a single Python statement may take a very small interval of time. This is both hard to measure and also strongly influenced by whatever else might be happening on the system at the same time.
Executing the benchmark code many times allows the execution time signal to rise and overwhelm any statistical noise and variance.
As long as other benchmark statements use the same number of repetitions, the resulting numbers can be compared relatively, but they cannot be used as absolute benchmark scores.
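If a per-statement estimate is needed, we can divide the reported total by the number of executions. For example:

```python
import timeit

# benchmark the statement a fixed number of times
number = 100000
total = timeit.timeit('[i*i for i in range(1000)]', number=number)
# estimate the average time of a single execution
per_statement = total / number
print(f'{per_statement * 1e6:.1f} microseconds per execution')
```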
Next, let’s explore how we might use the timeit API.
How to Use the timeit API
The focus of the timeit API is the timeit.Timer class, which can be used directly or more simply via the timeit.timeit() and timeit.repeat() module functions.
Next, let’s take a look at these elements in turn.
How to Benchmark with the timeit API
Now that we know what timeit is, let’s explore how we can use the API to benchmark snippets of Python code.
We will look at three parts of the API:
- The timeit.timeit() function.
- The timeit.repeat() function.
- The timeit.Timer class.
How to Use timeit.timeit()
The timeit.timeit() function benchmarks Python code and reports the duration in seconds.
Create a Timer instance with the given statement, setup code and timer function and run its timeit() method with number executions.
— timeit — Measure execution time of small code snippets
The timeit.timeit() function takes the Python statement to be benchmarked as a string.
For example:
```python
...
# benchmark a python statement
result = timeit.timeit('[i*i for i in range(1000)]')
```
Any Python code required to execute the benchmark code can be provided as a string to the “setup” argument.
This might include defining a variable.
The setup code is only executed once prior to the benchmark.
For example:
```python
...
# benchmark a python statement with setup code
result = timeit.timeit('[i*i for i in range(total)]', setup='total=10000')
```
The setup might also include importing from the main module so that required functions are available to the benchmark code.
For example:
```python
...
# benchmark a python statement with import in setup
result = timeit.timeit('task()', setup='from __main__ import task')
```
Alternatively, if we have defined code in our program that is required to execute the benchmark code, we can specify the “globals” argument for the namespace.
We can pass locals() or globals(), which provides a namespace from our current program.
For example:
```python
...
# benchmark a python statement with a namespace
result = timeit.timeit('task()', globals=globals())
```
Finally, we can specify the number of times the benchmark code is executed via the “number” argument.
By default, this is set to one million, e.g. 1,000,000, although it can be set to a smaller number if the benchmark code takes a long time to execute.
For example:
```python
...
# benchmark a python statement with a smaller number
result = timeit.timeit('[i*i for i in range(1000)]', number=100)
```
The “number” argument should be set so that the overall duration is at least 0.2 or 0.5 seconds, perhaps even more than one second.
You can learn more about how to benchmark Python with the timeit.timeit() function in the tutorial:
How to Use timeit.repeat()
The timeit.repeat() function calls the timeit.timeit() function repeatedly.
This is a convenience function that calls the timeit() repeatedly, returning a list of results.
— timeit — Measure execution time of small code snippets
It returns a collection of benchmark results that can then be summarized, such as the minimum (fastest time).
The average (expected time) or the maximum (longest time) can be reported, but are not likely to be representative, as many factors can cause a benchmark to take longer than expected.
Note It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.
— timeit — Measure execution time of small code snippets
Like the timeit.timeit() function, the timeit.repeat() function takes the statement to be benchmarked, along with a “setup“, “number“, and “globals” argument.
For example:
```python
...
# benchmark a python statement repeatedly
results = timeit.repeat('[i*i for i in range(1000)]')
```
The number of repetitions is specified via the “repeat” argument, which is set to 5 by default.
For example:
```python
...
# benchmark a python statement repeatedly
results = timeit.repeat('[i*i for i in range(1000)]', repeat=10)
```
How to Use timeit.Timer()
The timeit.Timer class can be used by first creating an instance and then either calling the timeit() or repeat() methods.
Class for timing execution speed of small code snippets.
— timeit — Measure execution time of small code snippets
The timeit.Timer class constructor takes the details of the code that is being benchmarked, including the statement, any “setup” and any “globals” namespace.
For example:
```python
...
# create a timer
timer = timeit.Timer('[i*i for i in range(1000)]')
```
The code can be benchmarked using the timeit() method that takes the number of times the code is run, which defaults to 1,000,000.
For example:
```python
...
# benchmark a python statement
result = timer.timeit(number=100)
```
The code can be repeatedly benchmarked by calling the repeat() method.
This method takes a “repeat” argument that specifies the number of repetitions, defaulting to 5. It also takes a “number” argument specifying the number of times the code is run each repetition.
For example:
```python
...
# benchmark a python statement repeatedly
results = timer.repeat(repeat=3, number=100)
```
The Timer class also provides an autorange() method that calls timeit() and automatically determines the number of times to run the code, ensuring the overall duration is large enough to be meaningful.
This is a convenience function that calls timeit() repeatedly so that the total time >= 0.2 second, returning the eventual (number of loops, time taken for that number of loops). It calls timeit() with increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the time taken is at least 0.2 second.
— timeit — Measure execution time of small code snippets
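For example, using the timer created above:

```python
...
# automatically determine the number of loops and benchmark
loops, total_time = timer.autorange()
# report the loop count and total time
print(f'{loops} loops took {total_time:.3f} seconds')
```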
Tips for Using the timeit API
This section provides some tips when working with the timeit module.
Import __main__ Module
We can benchmark a function defined in our own Python file.
This can be achieved by importing it from the main module in the “setup” argument, and making it available to the benchmark code.
For example:
```python
...
# benchmark a function defined in main
result = timeit.timeit('task()', setup='from __main__ import task')
```
Pass Globals
We can benchmark code that requires data or functions defined in our program.
This can be achieved by specifying the namespace via the “globals” argument, such as either locals() for the current local namespace or globals() for the current global namespace.
This will make the relevant scope available to the benchmark code, including any defined variables and functions.
For example:
```python
...
# benchmark custom functions
result = timeit.timeit('task()', globals=globals())
```
Benchmark Multiple Expressions
Although the timeit module is intended to benchmark single statements, we can use it to benchmark large snippets composed of multiple statements.
This can be achieved by creating a compound statement on one line, separated by semicolons (;).
For example:
```python
...
# benchmark multiple statements
timeit.timeit('[i*i for i in range(1000)];[i+i for i in range(1000)]')
```
Another approach is to put the target code into a function and benchmark a call to the function.
For example:
```python
# task function
def task():
    [i*i for i in range(1000)]
    [i*i for i in range(1000)]
    [i*i for i in range(1000)]

# benchmark custom functions
result = timeit.timeit('task()', globals=globals())
```
Another approach is to define a multi-line statement as a multi-line string, then provide the string as an argument.
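For example:

```python
import timeit

# define multiple statements as a multi-line string
stmt = '''
data = [i*i for i in range(1000)]
total = sum(data)
'''
# benchmark the multi-line snippet
result = timeit.timeit(stmt, number=100000)
```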
Now that we know how to use the timeit module, let’s look at some worked examples.
Example of Benchmarking with timeit.timeit()
In this section, we will explore an example of a benchmark using the timeit.timeit() function.
In this example, we will benchmark creating a list of 1,000 squared numbers.
```python
...
[i*i for i in range(1000)]
```
We will execute the statement 100,000 times.
```python
...
# benchmark the statement
time_duration = timeit('[i*i for i in range(1000)]', number=100000)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.timeit()
from timeit import timeit
# benchmark the statement
time_duration = timeit('[i*i for i in range(1000)]', number=100000)
# report the duration
print(f'Took {time_duration} seconds')
```
Running the example, we can see that the benchmark took about 4.087 seconds to complete.
```
Took 4.086719651939347 seconds
```
This could be compared to other methods of creating a list of 1,000 squared numbers, such as using the math.pow() function with the exponent of 2.
For example:
```python
...
[pow(i,2) for i in range(1000)]
```
This requires that we import the math.pow() function in order to perform the benchmark.
We can achieve this via the “setup” argument.
For example:
```python
...
# benchmark the statement
time_duration = timeit('[pow(i,2) for i in range(1000)]', setup='from math import pow', number=100000)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.timeit()
from timeit import timeit
# benchmark the statement
time_duration = timeit('[pow(i,2) for i in range(1000)]', setup='from math import pow', number=100000)
# report the duration
print(f'Took {time_duration} seconds')
```
Running the example, we can see that it takes about 9.394 seconds.
Compared to the above approach, using math.pow() is about 5.307 seconds slower (in this case, when repeated one hundred thousand times, on my system).
Your results will differ, given differences in software and hardware.
```
Took 9.394182911841199 seconds
```
You can learn more about how to benchmark Python with the timeit.timeit() function in the tutorial:
Next, let’s explore how we might use the timeit.repeat() function.
Example of Benchmarking with timeit.repeat()
In this section, we will explore an example of a benchmark using the timeit.repeat() function.
In this example, we will benchmark creating a list of 1,000 squared numbers.
```python
...
[i*i for i in range(1000)]
```
We will execute the statement 10,000 times and repeat the benchmark 3 times.
```python
...
# benchmark the statement
results = repeat('[i*i for i in range(1000)]', repeat=3, number=10000)
```
We will then report all results and the minimum (fastest) benchmark time.
```python
...
# report the durations
print(results)
# report the min duration
print(min(results))
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.repeat()
from timeit import repeat
# benchmark the statement
results = repeat('[i*i for i in range(1000)]', repeat=3, number=10000)
# report the durations
print(results)
# report the min duration
print(min(results))
```
Running the example, we can see that the benchmark took about 0.4 seconds to complete each run.
The fastest time to complete was about 0.409 seconds.
```
[0.4205979760736227, 0.409935096045956, 0.4183474569581449]
0.409935096045956
```
This could be compared to other methods of creating a list of 1,000 squared numbers, such as using the math.pow() function with the exponent of 2.
For example:
```python
...
[pow(i,2) for i in range(1000)]
```
This requires that we import the math.pow() function in order to perform the benchmark.
We can achieve this via the “setup” argument.
For example:
```python
...
# benchmark the statement
results = repeat('[pow(i, 2) for i in range(1000)]', setup='from math import pow', repeat=3, number=10000)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.repeat()
from timeit import repeat
# benchmark the statement
results = repeat('[pow(i, 2) for i in range(1000)]', setup='from math import pow', repeat=3, number=10000)
# report the durations
print(results)
# report the min duration
print(min(results))
```
Running the example, we can see that the benchmark took about 0.9 seconds to complete each run.
The fastest time to complete was about 0.927 seconds.
The results show that it is about 0.517 seconds or 517 milliseconds slower to use the math.pow() function to square the list of 1,000 numbers (in this case, when repeated ten thousand times, on my system).
Your results will differ, given the differences in software and hardware.
```
[0.9272060238290578, 0.9502722688484937, 0.9380616340786219]
0.9272060238290578
```
Next, let’s explore how we might use the timeit.Timer class.
Example of Benchmarking with timeit.Timer
In this section, we will explore an example of a benchmark using the timeit.Timer class.
In this example, we will benchmark creating a list of 1,000 squared numbers.
```python
...
[i*i for i in range(1000)]
```
This statement can be provided to the timeit.Timer class constructor.
```python
...
# create a timer
timer = Timer('[i*i for i in range(1000)]')
```
We can then benchmark the statement 100,000 times and report the result.
```python
...
# benchmark the statement
time_duration = timer.timeit(number=100000)
# report the duration
print(f'Took {time_duration:.3f} seconds')
```
We can then choose to repeat the benchmark 3 times and report all results and the minimum result.
```python
...
# benchmark the statement
results = timer.repeat(repeat=3, number=100000)
# report the durations
print(results)
# report the minimum result
print(min(results))
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of benchmarking a statement with timeit.Timer
from timeit import Timer
# create a timer
timer = Timer('[i*i for i in range(1000)]')
# benchmark the statement
time_duration = timer.timeit(number=100000)
# report the duration
print(f'Took {time_duration:.3f} seconds')
# benchmark the statement
results = timer.repeat(repeat=3, number=100000)
# report the durations
print(results)
# report the minimum result
print(min(results))
```
Running the example first reports the benchmark result of the list creation, executed 100,000 times.
Next, the benchmark is repeated 3 times, with all results reported and the minimum (fastest) time highlighted.
This highlights that the timeit.Timer class provides a convenient way to benchmark the same snippet of code in different ways, if needed.
```
Took 4.233 seconds
[4.139555993955582, 4.124664062168449, 4.120071409037337]
4.120071409037337
```
How to Use the timeit Command Line Interface
The command line or command line interface is a way of interacting with the computer using text commands, as opposed to clicking around on a graphical interface with a mouse.
A Python module can be run as a command on the command line directly via the -m flag, followed by the module name.
-m mod : run library module as a script (terminates option list)
The timeit module can be run directly in this way, for example:
```
python -m timeit [-n N] [-r N] [-u U] [-s S] [-h] [statement ...]
```
The flags must always come first, and the statement that is being benchmarked must always come last, otherwise, you will get an error.
The main command line flags (or switches) to the timeit module are as follows:
- -n N or --number=N: how many times to execute ‘statement’
- -r N or --repeat=N: how many times to repeat the timer (default 5)
- -s S or --setup=S: statement to be executed once initially (default pass)
- -u U or --unit=U: the units for the result, e.g. nsec, usec, msec, or sec.
Other flags are provided, such as the -p or --process flag to change the way that time is measured, the -v or --verbose flag for verbose output, and the -h or --help flag for getting a list of all available flags.
If the -n flag is not provided, the timeit module will attempt to estimate the number of times to run the statement until a minimum time threshold is reached.
If -n is not given, a suitable number of loops is calculated by trying increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the total time is at least 0.2 seconds.
— timeit — Measure execution time of small code snippets
The units for the -u flag can be confusing; here is a guide:
- nsec: Nanoseconds (1,000 nanoseconds = 1 microsecond)
- usec: Microseconds (1,000 microseconds = 1 millisecond)
- msec: Milliseconds (1,000 milliseconds = 1 second)
- sec: Seconds (60 seconds = 1 minute)
The result is a benchmark result with the format:
- [n] loops, best of [r]: [time] [units] per loop
Where:
- [n] is the number of times the statement was executed in each repetition.
- [r] is the number of repeats of n loops.
- [time] is the average time to execute the statement within the fastest repetition.
- [units] is the time units in which the result is reported.
This means if the statement is executed 1,000 times and is repeated 5 times, then the statement is executed 5,000 times in total, and the fastest of the 5 repetitions is used for the reported time.
The reported time is an average from the best repetition:
- time = duration of fastest repetition / number of executions per repetition
This means that the one repetition of the 5 that was the fastest (best) is used, and its total time is divided by the number of executions in that repetition, which was 1,000, to give the expected or average runtime for the statement.
In the output, there are three fields. The loop count, which tells you how many times the statement body was run per timing loop repetition. The repetition count (‘best of 5’) which tells you how many times the timing loop was repeated, and finally the time the statement body took on average within the best repetition of the timing loop. That is, the time the fastest repetition took divided by the loop count.
— timeit — Measure execution time of small code snippets
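For example, with hypothetical repetition totals, the reported per-loop time would be calculated as follows:

```python
# hypothetical totals (in seconds) for 5 repetitions of 10000 loops each
rep_totals = [0.42, 0.41, 0.43, 0.45, 0.44]
loops = 10000
# the reported time is the fastest repetition divided by the loop count
per_loop = min(rep_totals) / loops
print(f'{loops} loops, best of 5: {per_loop * 1e6:.0f} usec per loop')
```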
The timeit command line interface cannot benchmark a Python script directly.
Instead, it is intended to benchmark Python statements that execute in a short duration of time.
You can learn more about how to use the timeit command line interface in the tutorial:
Example of Benchmarking with the timeit Command Line Interface
We can explore how to benchmark a standalone Python code snippet using the timeit command line interface.
In this case, we will benchmark a snippet that creates a list of 1,000 squared integers.
```python
[i*i for i in range(1000)]
```
For example:
```
python -m timeit "[i*i for i in range(1000)]"
```
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the statement was executed 10,000 times and this loop was repeated 5 times, so 50,000 runs of the code.
The estimated time was “40.8 usec”, where usec is a microsecond. This means on the best repetition the statement took about 40.8 microseconds on average to run.
```
10000 loops, best of 5: 40.8 usec per loop
```
Let’s try another standalone version of creating a list of squared numbers.
In this case, using the ** operator.
```python
[i**2 for i in range(1000)]
```
We can benchmark this on the command line with timeit as follows:
```
python -m timeit "[i**2 for i in range(1000)]"
```
Running this command on the command line, we see output from the timeit module.
Your results may differ.
In this case, we can see that the statement was executed 5,000 times and this was repeated 5 times. This means that the statement was executed 25,000 times.
The estimated time was about 53.8 usec; that is, the average run time for the statement in the best repetition was about 53.8 microseconds.
This highlights how we can use the timeit command line interface to benchmark standalone code.
```
5000 loops, best of 5: 53.8 usec per loop
```
Notice that the timeit module automatically chose the number of loops, differing between each benchmark. This may be a problem if we want a fair apples-to-apples comparison.
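One way to get a like-for-like comparison is to fix the loop count for both commands via the -n flag. For example:

```
python -m timeit -n 10000 "[i*i for i in range(1000)]"
python -m timeit -n 10000 "[i**2 for i in range(1000)]"
```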
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to benchmark Python code using the timeit module.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.