Tips When Presenting Benchmark Results

Presenting benchmark results well requires carefully choosing the level of precision and the units of measure.

These are the two main areas that can introduce confusion and unnecessary cognitive load when readers attempt to interpret, analyze, and compare results.

Getting precision and units of measure correct will go a long way to ensuring execution time benchmark results are presented well.

In this tutorial, you will discover helpful tips to consider when presenting execution time benchmark results.

Let’s get started.

Need to Present Benchmark Results

Recording benchmark results is the first step.

Typically, the second step involves presenting the results before a decision can be made to explore changes to the system.

We may need to report benchmark results to many people such as:

A team lead or manager.
Peer developers in the same team.
Project stakeholders.

Presenting raw results can be a problem.

This is typically for two main reasons:

The precision of the benchmark results is often high, leading to confusion.
The units of measure may be missing or limited to the default of seconds, which may not be appropriate.

We can focus on considerations when presenting results in these two areas, namely precision of measure and units of measure.

Let’s take a close look at each in turn.

Tips for Benchmark Measure Precision

The precision of the result refers to the number of decimal places used to present results.

The default for benchmark results is full double-precision floating point, which carries about 15 to 17 significant digits and often prints with 16 or more decimal places.

This can be confusing to managers, fellow developers, and stakeholders alike.

Below are some tips with regard to measurement precision when presenting results.

Tip 01: Don’t Show Too Much Precision

When presenting a measure, don’t include too much precision.

Limit the precision to the main point, such as the level of precision that highlights the main difference between measures.

Precision can be limited in many ways, such as by truncation and rounding.

Precision can also be adjusted by changing the unit of measure so that the main difference between measures appears before the decimal point.
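
For example, a minimal sketch of both approaches is listed below. The timing values are made up for illustration, and three decimal places is an assumed choice, not a rule.

```python
# hypothetical raw durations in seconds (made-up values for illustration)
baseline = 1.2345678901234567
optimized = 1.1987654321098765

# raw values are hard to compare at a glance
print(f'Baseline:  {baseline} seconds')
print(f'Optimized: {optimized} seconds')

# limit precision to 3 decimal places (the format spec rounds)
baseline_text = f'{baseline:.3f}'
optimized_text = f'{optimized:.3f}'
print(f'Baseline:  {baseline_text} seconds')
print(f'Optimized: {optimized_text} seconds')

# change the unit so the digits of interest appear before the decimal point
small = 0.0043210987  # hypothetical duration in seconds
small_text = f'{small * 1000:.3f}'
print(f'Small task: {small_text} milliseconds')
```

Note that the .3f format specification rounds the value. It does not truncate it.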

Tip 02: Don’t Show Too Little Precision

A danger when limiting precision is limiting it too much.

We must give some indication that additional precision is available, meaning we probably should not round results to integer values, at least not without good reason.

A balance must be struck between clearly showing the main focus of the measure, e.g. the difference between different measure values, and the fact that additional precision is available but was limited for presentation reasons.

Tip 03: Be Consistent

We may present many measures within one report and across reports.

Ensure that the presentation of results is consistent, so that all measures are reported using the same level of precision.

This consistency will allow values to be compared directly without additional cognitive load.

This may include padding with trailing zeros so that every value shows the same number of decimal places and the decimal points line up when numbers are right-aligned in a table or column.
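
A consistent presentation might be sketched as follows. The task names and durations are hypothetical, and the column widths of 20 and 8 characters are assumptions chosen to fit this data.

```python
# hypothetical benchmark results in seconds (made-up values)
results = {
    'list comprehension': 0.5,
    'for loop': 12.34567,
    'map()': 3.1,
}

# same precision (3 decimal places) and same width for every value,
# so the decimal points line up in the right-aligned column
lines = [
    f'{name:<20} {seconds:>8.3f} seconds'
    for name, seconds in results.items()
]

for line in lines:
    print(line)
```

Each value is padded with trailing zeros to three decimal places, so 0.5 is shown as 0.500 and can be compared directly with 12.346.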

Tip 04: Truncate Over Round

Rounding is an algorithm that involves replacing a number with an approximated number.

Rounding means replacing a number with an approximate value that has a shorter, simpler, or more explicit representation.
— Rounding, Wikipedia.

There are many algorithms available and variations such as taking the floor and the ceiling.

For example, Python provides the built-in round() function and the math.floor() and math.ceil() functions for rounding floating point values to integers.

Rounding can lead to surprising results as the algorithm propagates the replacement of digits in a right-to-left order. For example, rounding 0.9996 to three decimal places gives 1.000, which changes every digit shown.

Therefore, it is generally preferred to use truncation of additional precision.

This is the direct deletion of the additional precision (e.g. in a report) or the presentation to a limited level of precision (e.g. in formatted output).

Regardless of the method chosen, once chosen, the same method must be used in all cases for consistency.
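
The difference between the two methods can be sketched as follows. The truncate() helper is a hypothetical name, and using the decimal module is one assumed way to drop digits, chosen here to avoid binary floating point surprises.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places=3):
    """Drop all digits after the given number of decimal places."""
    # the quantum is the smallest step kept, e.g. 0.001 for 3 places
    quantum = Decimal(10) ** -places
    # repr() gives the shortest decimal text for the float
    return Decimal(repr(value)).quantize(quantum, rounding=ROUND_DOWN)

duration = 1.2349  # hypothetical duration in seconds

rounded = round(duration, 3)
truncated = truncate(duration, 3)

print(f'Rounded:   {rounded} seconds')
print(f'Truncated: {truncated} seconds')
```

Here rounding reports 1.235 whereas truncation reports 1.234. Neither is wrong, but mixing the two in one report would make values harder to compare.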

Tip 05: Avoid Scientific Notation

Scientific notation refers to number representation that is more compact than a decimal number with full precision.

Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form, since to do so would require writing out an inconveniently long string of digits.
— Scientific notation, Wikipedia.

A typical approach is to represent a decimal number with a base (b), an exponent (n), and a multiple (m).

For example:

m * b^n

In code and program output, the base is 10 and the letter e is used as shorthand for “times ten raised to the power of some exponent”. For example:

m e n

For instance, 1.5e-3 means 1.5 * 10^-3, or 0.0015.

The exponent may be positive for large numbers or negative for very small numbers.

The multiple (m) carries the significant digits of the number.

Scientific notation is helpful when programming, but not helpful when presenting results.

This is because few people understand it well, at least at the time it needs to be understood in a report, and the conversion from the notation to a comparable number adds additional cognitive load.

Always use a decimal notation when presenting results.
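
Python will switch to scientific notation on its own for very small floats, so it can help to format explicitly. A minimal sketch, using a made-up duration:

```python
# hypothetical very short duration in seconds
duration = 0.00001234

# the default string conversion uses scientific notation for small values
default_text = str(duration)
print(f'Default: {default_text} seconds')

# a fixed-point format spec forces decimal notation
decimal_text = f'{duration:.8f}'
print(f'Decimal: {decimal_text} seconds')

# alternatively, change the unit so fewer leading zeros are needed
micro_text = f'{duration * 1_000_000:.2f}'
print(f'Decimal: {micro_text} microseconds')
```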

Tips for Benchmark Measure Units

Units of measure refer to what the measure represents.

The default measure for almost all measurement functions is seconds, although nanosecond versions of most functions do exist.

Seconds may or may not be the best measure to use for a given set of benchmark results.
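
For example, the time module offers time.perf_counter(), which returns float seconds, and time.perf_counter_ns(), which returns integer nanoseconds. The sketch below times an arbitrary placeholder task with each. The task itself is an assumption for illustration.

```python
import time

# time a placeholder task in seconds (float)
start = time.perf_counter()
sum(range(1_000_000))
elapsed_sec = time.perf_counter() - start

# time the same placeholder task in nanoseconds (int)
start_ns = time.perf_counter_ns()
sum(range(1_000_000))
elapsed_ns = time.perf_counter_ns() - start_ns

print(f'Took {elapsed_sec:.3f} seconds')
print(f'Took {elapsed_ns} nanoseconds')
```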

Tip 01: Know the Difference Between Measures

Recall that the common units of time have names at every three orders of magnitude (a factor of 1,000).

There are many, but we don’t need to know them all.

Keep the following scale of measurements in mind:

Order | Name        | Abbreviation
------|-------------|-------------
10^-9 | nanosecond  | ns
10^-6 | microsecond | us
10^-3 | millisecond | ms
10^0  | second      | sec

Note the jumps of 3 in the exponent. This is 3 zeros, or 3 orders of magnitude, meaning we multiply or divide by 1,000 to go from one unit to the next.

For reference:

One second has 1,000 milliseconds.
One millisecond has 1,000 microseconds.
One microsecond has 1,000 nanoseconds.

Use one of these four units for durations under a minute.

Above one minute, use the regular units of time, such as minutes and hours.
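
Converting between these units is a matter of multiplying by 1,000 at each step. A short sketch with made-up durations is listed below. The chosen precision for each unit is an assumption.

```python
# hypothetical duration in seconds
seconds = 0.004321

# multiply by 1,000 for each step down the scale
ms_text = f'{seconds * 1_000:.3f} ms'
us_text = f'{seconds * 1_000_000:.0f} us'
ns_text = f'{seconds * 1_000_000_000:.0f} ns'
print(ms_text)
print(us_text)
print(ns_text)

# above one minute, switch to minutes and seconds
long_seconds = 125.5  # hypothetical duration in seconds
minutes, remainder = divmod(long_seconds, 60)
long_text = f'{int(minutes)} min {remainder:.1f} sec'
print(long_text)
```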

Tip 02: Don’t Use a Measure That Is Too Low In Scale

Select a unit of measure that ensures the focus of the scores is close to the decimal point, e.g. just above or just below.

Probably choosing a measure that moves the focus to above the decimal point is the most helpful.

Choosing a unit of measure that is too low on the scale will mean that the differences of interest will be pushed far into the integers, such as thousands or millions.

This will look strange and add unnecessary cognitive load.

Tip 03: Don’t Use a Measure That Is Too High In Scale

We can be too aggressive and choose a unit of measure that is too high on the scale.

Overcorrecting in this way may mean that the interesting parts of the measure are past the decimal point, perhaps requiring the addition of more precision in the results.

This conflicts with the earlier tip about not showing too much precision in results.
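
One way to avoid a unit that is too low or too high is to pick it from the data. The sketch below uses a hypothetical choose_unit() helper that selects the largest unit in which the smallest duration is at least 1, then applies that single unit to every value. Basing the choice on the smallest value is an assumption, and other rules may suit other data.

```python
# unit names and the factor to convert from seconds, largest unit first
UNITS = [('sec', 1), ('ms', 1_000), ('us', 1_000_000), ('ns', 1_000_000_000)]

def choose_unit(durations):
    """Return (name, factor) so the smallest duration is at least 1."""
    smallest = min(durations)
    for name, factor in UNITS:
        if smallest * factor >= 1:
            return name, factor
    # fall back to the smallest unit available
    return UNITS[-1]

# hypothetical durations in seconds
durations = [0.004321, 0.012345, 0.150000]

name, factor = choose_unit(durations)
lines = [f'{d * factor:>8.3f} {name}' for d in durations]

for line in lines:
    print(line)
```

For these values, milliseconds are selected, so all three results are reported in the same unit with the digits of interest close to the decimal point.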

Tip 04: Default to Seconds

Most Python functions for recording time operate at the level of seconds.

Everyone understands seconds and perhaps few people understand the difference between a nanosecond and a microsecond.

A good default is to stick with seconds and avoid conversions and new units of measure that require explaining.

This may mean sizing the code under study so that it completes in a reasonable number of seconds, where the differences of interest can be captured within 3 digits above and below the decimal point, e.g.

xxx.yyy seconds

Tip 05: Always Include The Units

When presenting a result, always include the units.

Use the full name or the common abbreviation.

For example:

Took 10.123 seconds

If it is a statistical quantity, like an average, clearly state this along with the units.

For example:

Took 10.123 seconds on average
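
Producing such a report line might look like the sketch below. The task() and benchmark() functions are hypothetical names, the workload is a placeholder, and five repeats is an assumed choice.

```python
import statistics
import time

def task():
    """Placeholder workload to benchmark."""
    sum(i * i for i in range(100_000))

def benchmark(func, repeats=5):
    """Time func() several times and return the durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations

durations = benchmark(task)
average = statistics.mean(durations)

# state the statistic, the units, and the number of runs
report = f'Took {average:.3f} seconds on average ({len(durations)} runs)'
print(report)
```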

Takeaways

You now know helpful tips to consider when presenting execution time benchmark results.

Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!

Do you have any additional tips?
I’d love to hear about them!

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Photo by Alvin Balemesa on Unsplash
