Benchmark Fastest Way To Create NumPy Random Numbers

You can benchmark NumPy random number array functions and discover the fastest approaches to use in different circumstances.

Generally, the modern numpy.random.Generator random number generator should be used over the legacy numpy.random.RandomState random number generator, as it was consistently faster in the benchmarks below, taking roughly half the time when generating floats.

When generating random floats, using a type of float32 is faster than float64. When generating random integers, using int16 and int32 can be faster than other types, and perhaps slightly faster again if unsigned. When generating random booleans, generating 0 and 1 integers and storing them in an array with the type numpy.bool_ is the fastest.

In this tutorial, you will discover how to benchmark and discover the fastest way to generate NumPy arrays of random values.

Let’s get started.


Need Fast NumPy Random Numbers

Random numbers are a big part of many NumPy programs.

We need randomness in many programs such as simulations, optimization algorithms, learning algorithms, and more.

Generating random values is typically slow given that the pseudorandom number generator must use a complex mathematical operation. Therefore, we are interested in ways of generating the randomness we require in the fastest way possible.

There are two main randomness APIs in NumPy:

Legacy NumPy random API, e.g. numpy.random.RandomState.
Modern NumPy random API, e.g. numpy.random.Generator.
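As a minimal sketch of how the two APIs look side by side, the snippet below creates a small array of floats in [0,1) with each. The seed of 1 and the array size of 5 are arbitrary choices for illustration. Note that module-level functions such as numpy.random.rand() use a shared RandomState instance behind the scenes.

```python
# contrast the legacy and modern NumPy random APIs
import numpy

# legacy API: a seeded RandomState instance
legacy = numpy.random.RandomState(1)
a = legacy.random_sample(5)

# modern API: a seeded Generator created via default_rng()
rng = numpy.random.default_rng(1)
b = rng.random(5)

# report the generator type and the arrays that were created
print(type(legacy).__name__, a.shape, a.dtype)
print(type(rng).__name__, b.shape, b.dtype)
```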

Which one is faster?

Further, there are several functions that we can use to generate numbers. Which one is the fastest?

We can explore this question from a few angles.

Firstly, we will explore how to create arrays of random floating point values using functions such as numpy.random.rand() and rng.random().

We will then explore how to create arrays of random integer values, with functions such as numpy.random.randint() and rng.integers().

We will then use a mixture of these functions to create arrays of random boolean values, a capability not provided by the NumPy random APIs.


Benchmark NumPy Random Numbers

We can explore the question of how fast the different approaches to creating NumPy random numbers are using benchmarking.

In this case, we will use an approach to creating random number arrays of a modest fixed size, then repeat this process many times to give an estimated time. We can then compare the times to see the relative performance of the approaches tested.

You can use this approach to benchmark your own favorite NumPy array operations.

If you use or extend the NumPy benchmarking approach used in this tutorial, let me know in the comments below. I’d love to see what you come up with.

We could use the time.perf_counter() function directly and develop a helper function to perform the benchmarking and report results.

You can learn more about benchmarking with the time.perf_counter() function in the tutorial:

Benchmark Python with time.perf_counter()
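For reference, a helper of this kind might look like the sketch below. The benchmark() function name, the array size, and the repeat count are hypothetical choices made here for illustration and are not taken from the linked tutorial.

```python
# sketch of a benchmark helper built on time.perf_counter()
import time
import numpy

def benchmark(task, n):
    # hypothetical helper: call task() n times and return the total seconds
    start = time.perf_counter()
    for _ in range(n):
        task()
    return time.perf_counter() - start

# time a small array creation (sizes chosen so the sketch runs quickly)
duration = benchmark(lambda: numpy.random.rand(100000), 20)
print(f'numpy.random.rand() {duration:.3f} seconds')
```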

Instead, in this case, we will use the timeit API, specifically the timeit.timeit() function and specify the string of array code to run and a fixed number of times to run it.

We will also provide the globals argument for any constants defined in our benchmark code, such as array size or shape.

For example:

...

# benchmark a thing

result = timeit.timeit('...', globals=globals(), number=N)

print(f'approach {result:.3f} seconds')

You can learn more about benchmarking with the timeit.timeit() function in the tutorial:

Benchmark Python with timeit.timeit()

The number of runs in each benchmark was tuned to ensure that each snippet was executed in more than one second and less than about 10 seconds.
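The run counts in this tutorial were tuned by hand. One alternative, not used in the benchmarks here, is to let timeit pick a starting point with timeit.Timer.autorange(), which increases the loop count until the total time is at least 0.2 seconds. The sketch below assumes a smaller array size so that it completes quickly, and passes an explicit dictionary as globals rather than globals().

```python
# sketch of letting timeit choose the number of runs
import timeit
import numpy

# smaller than the tutorial arrays so the sketch runs quickly
SHAPE = 100000

# names made available to the timed statement
env = {'numpy': numpy, 'SHAPE': SHAPE}
timer = timeit.Timer('numpy.random.rand(SHAPE)', globals=env)

# autorange() returns the chosen loop count and the total time it took
number, total = timer.autorange()
print(f'{number} runs took {total:.3f} seconds')
```

The reported count could then be scaled up by hand to reach the one to ten second window described above.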

Let’s get started.


Fastest Way to Create 1D NumPy Array of Random Floats

We can explore the fastest way to create a modestly sized NumPy array of random floating point values in [0,1).

In this case, we will create a fixed size 1d array with one million elements (1,000,000) of random floats with the default data type, float64 on most platforms. Each approach will be used to create an array 2,000 times.

The approaches we will compare include the most common NumPy functions for creating a 1d array of random floats, including:

numpy.random.rand()
numpy.random.random_sample()
rng.random()
rng.random(out=A)

The complete example is listed below.

# SuperFastPython.com

# benchmark creating a 1d array of random floats

import numpy

import timeit

# size and shape of the arrays to create

SHAPE = 1000000

# number of times to run each snippet

N = 2000

# numpy.random.rand()

result = timeit.timeit('numpy.random.rand(SHAPE)', globals=globals(), number=N)

print(f'numpy.random.rand() {result:.3f} seconds')

# numpy.random.random_sample()

result = timeit.timeit('numpy.random.random_sample(SHAPE)', globals=globals(), number=N)

print(f'numpy.random.random_sample() {result:.3f} seconds')

# rng.random()

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.random(SHAPE)', globals=globals(), number=N)

print(f'rng.random() {result:.3f} seconds')

# rng.random(out=A)

result = timeit.timeit('A=numpy.empty(SHAPE);rng=numpy.random.default_rng(1);rng.random(out=A)', globals=globals(), number=N)

print(f'rng.random(A) {result:.3f} seconds')

Running the example benchmarks each approach and reports the total execution time.

numpy.random.rand() 11.946 seconds

numpy.random.random_sample() 11.740 seconds

rng.random() 6.538 seconds

rng.random(A) 6.660 seconds

We can restructure the output into a table for comparison.

Approach | Time (sec)

-----------------------------|------------

numpy.random.rand() | 11.946

numpy.random.random_sample() | 11.740

rng.random() | 6.538

rng.random(A) | 6.660

We can see that the two approaches that use the legacy API have a similar execution time of around 12 seconds, whereas the two approaches that use the more modern API have an execution time that is a little more than half the time.

This highlights that we should be using the modern NumPy random number generation API to generate floats if speed is important.

We can also see that it may be slightly faster to let the rng.random() function create and populate the array than to create an empty array ourselves and have the rng.random() function populate it for us.

The difference is small, although re-running the benchmark test shows a similar pattern in performance.

numpy.random.rand() 11.846 seconds

numpy.random.random_sample() 11.833 seconds

rng.random() 6.552 seconds

rng.random(A) 6.760 seconds
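Rather than re-running the whole script by hand, the repetition can be sketched with timeit.repeat(), which runs the same benchmark several times and returns one total per repetition. This is a variation on the approach above, not the method used to produce the reported numbers, and the small sizes are assumptions chosen so that it runs quickly.

```python
# sketch of repeating a benchmark with timeit.repeat()
import timeit
import numpy

# smaller values than the tutorial so the sketch runs quickly
SHAPE = 100000
N = 20

# names made available to the timed statement
env = {'numpy': numpy, 'SHAPE': SHAPE}

# three repetitions of N runs each, giving three total times
times = timeit.repeat('numpy.random.rand(SHAPE)', globals=env, number=N, repeat=3)
print(f'numpy.random.rand() best {min(times):.3f}, worst {max(times):.3f} seconds')
```

Comparing the best and worst totals gives a rough sense of how much a small difference between two approaches can be trusted.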


Fastest Way to Create 2D NumPy Array of Random Floats

We can explore the fastest way to create a modestly sized two-dimensional NumPy array of random floats, e.g. a matrix.

Each array will have the size (1000,1000) and we will run each method 1,000 times.

The 2d nature of the array allows us to explore additional approaches, such as:

Generating a 1d array and reshaping it.
Calling numpy.random.random() to create the 2d array and populate it.

The complete example is listed below.

# SuperFastPython.com

# benchmark creating a 2d array of random floats

import numpy

import timeit

# size and shape of the arrays to create

SHAPE = (1000,1000)

# number of times to run each snippet

N = 1000

# numpy.random.rand()

result = timeit.timeit('numpy.random.rand(SHAPE[0]*SHAPE[1]).reshape(SHAPE)', globals=globals(), number=N)

print(f'numpy.random.rand() {result:.3f} seconds')

# numpy.random.random()

result = timeit.timeit('numpy.random.random(SHAPE)', globals=globals(), number=N)

print(f'numpy.random.random() {result:.3f} seconds')

# numpy.random.random_sample()

result = timeit.timeit('numpy.random.random_sample(SHAPE)', globals=globals(), number=N)

print(f'numpy.random.random_sample() {result:.3f} seconds')

# rng.random()

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.random(SHAPE)', globals=globals(), number=N)

print(f'rng.random() {result:.3f} seconds')

# rng.random(out=A)

result = timeit.timeit('A=numpy.empty(SHAPE);rng=numpy.random.default_rng(1);rng.random(out=A)', globals=globals(), number=N)

print(f'rng.random(A) {result:.3f} seconds')

Running the example benchmarks each approach and reports the total execution time.

numpy.random.rand() 5.894 seconds

numpy.random.random() 5.854 seconds

numpy.random.random_sample() 5.826 seconds

rng.random() 3.248 seconds

rng.random(A) 3.319 seconds

We can restructure the output into a table for comparison.

Approach | Time (sec)

-----------------------------|------------

numpy.random.rand() | 5.894

numpy.random.random() | 5.854

numpy.random.random_sample() | 5.826

rng.random() | 3.248

rng.random(A) | 3.319

Again, we see a clear distinction between the execution time of the legacy API at nearly 6 seconds and the modern API at just over 3 seconds.

It seems all of the functions used in the legacy API have a similar performance of about 5.8 seconds. It is likely that, behind the scenes, each function calls the same internal function for generating the random floating point values.

As with the previous example, the modern random number generator that creates the array for us and populates it is slightly faster than us creating an empty array and having it populated.
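The out argument may be more useful when the same array is reused across many calls. The sketch below is a variation on the benchmark above, under the assumption that the generator and the array can be created once: it moves both into the timeit setup string, which is not included in the timing, so that only the in-place fill is measured. The smaller run count is chosen so that it completes quickly.

```python
# sketch of timing only the in-place fill of a reused array
import timeit
import numpy

SHAPE = (1000, 1000)
# fewer runs than the tutorial so the sketch runs quickly
N = 10

# names made available to the setup and the timed statement
env = {'numpy': numpy, 'SHAPE': SHAPE}

# setup runs once and is excluded from the reported time
setup = 'rng = numpy.random.default_rng(1); A = numpy.empty(SHAPE)'
result = timeit.timeit('rng.random(out=A)', setup=setup, globals=env, number=N)
print(f'rng.random(out=A) reused {result:.3f} seconds')

# confirm that out= overwrites the contents of an existing float64 array
rng = numpy.random.default_rng(1)
A = numpy.zeros(SHAPE)
rng.random(out=A)
print(A.min() >= 0.0, A.max() < 1.0, A.any())
```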


Float Data Types Matter

The data type of the array matters.

We would expect that a larger data type requires more random bits to be generated.

Therefore, we might expect that an array of float32 will be faster to create than an array of random float64 values.

We can explore this with the rng.random() function. We will generate a 1d array with one million elements with float32 and then again with float64 random values and repeat the process 2,000 times.

The complete example is listed below.

# SuperFastPython.com

# benchmark creating a 1d array of random floats

import numpy

import timeit

# size and shape of the arrays to create

SHAPE = 1000000

# number of times to run each snippet

N = 2000

# rng.random(float32)

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.random(SHAPE, dtype=numpy.float32)', globals=globals(), number=N)

print(f'rng.random(float32) {result:.3f} seconds')

# rng.random(float64)

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.random(SHAPE, dtype=numpy.float64)', globals=globals(), number=N)

print(f'rng.random(float64) {result:.3f} seconds')

Running the example benchmarks each approach and reports the total execution time.

rng.random(float32) 5.200 seconds

rng.random(float64) 6.766 seconds

We can restructure the output into a table for comparison.

Approach | Time (sec)

--------------------|------------

rng.random(float32) | 5.200

rng.random(float64) | 6.766

We can see that our expectations were confirmed.

It is faster to create an array of floats with the smaller data type of float32 compared to the larger data type of float64.

Where possible we should use the smallest possible data type when generating floating point values in order to reduce execution time.
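Beyond speed, the smaller type also halves the memory used by the array. A minimal sketch that checks this with the nbytes attribute is below. Keep in mind that float32 carries roughly 7 significant decimal digits, so whether it is precise enough depends on the program.

```python
# sketch comparing the memory footprint of float32 and float64 arrays
import numpy

rng = numpy.random.default_rng(1)

# one million random floats in each type
a32 = rng.random(1000000, dtype=numpy.float32)
a64 = rng.random(1000000, dtype=numpy.float64)

# report the type and the number of bytes used by the array data
print(a32.dtype, a32.nbytes)
print(a64.dtype, a64.nbytes)
```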


Fastest Way to Create 1D NumPy Array of Random Integers

We can explore the fastest way to create a modestly sized NumPy array of random integer values.

In this case, we will create a fixed size 1d array with one million elements (1,000,000) of random integers between 0 and 100 (inclusive) with the default data type, int64 on most platforms. Each approach will be used to create an array 1,000 times.

The approaches we will compare include the most common NumPy functions for creating a 1d array of random integers, including:

numpy.random.randint()
numpy.random.random_integers()
numpy.random.choice()
rng.integers()
rng.choice()

The complete example is listed below.

# SuperFastPython.com

# benchmark creating a 1d array of random integers

import numpy

import timeit

# size and shape of the arrays to create

SHAPE = 1000000

# range of values

LOW, HIGH = 0, 100

# number of times to run each snippet

N = 1000

# numpy.random.randint()

result = timeit.timeit('numpy.random.randint(LOW, HIGH+1, SHAPE)', globals=globals(), number=N)

print(f'numpy.random.randint() {result:.3f} seconds')

# numpy.random.random_integers()

result = timeit.timeit('numpy.random.random_integers(LOW, HIGH, SHAPE)', globals=globals(), number=N)

print(f'numpy.random.random_integers() {result:.3f} seconds')

# numpy.random.choice()

result = timeit.timeit('numpy.random.choice(numpy.arange(HIGH+1), SHAPE)', globals=globals(), number=N)

print(f'numpy.random.choice() {result:.3f} seconds')

# rng.integers()

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.integers(LOW, HIGH+1, SHAPE)', globals=globals(), number=N)

print(f'rng.integers() {result:.3f} seconds')

# rng.choice()

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.choice(numpy.arange(HIGH+1), SHAPE)', globals=globals(), number=N)

print(f'rng.choice() {result:.3f} seconds')

Running the example benchmarks each approach and reports the total execution time.

numpy.random.randint() 7.353 seconds

numpy.random.random_integers() 7.237 seconds

numpy.random.choice() 9.864 seconds

rng.integers() 2.630 seconds

rng.choice() 5.877 seconds

We can restructure the output into a table for comparison.

Approach | Time (sec)

-------------------------------|------------

numpy.random.randint() | 7.353

numpy.random.random_integers() | 7.237

numpy.random.choice() | 9.864

rng.integers() | 2.630

rng.choice() | 5.877

We can see a distinction in execution time between the legacy and modern APIs, as we did when generating random floats.

We can also see that the choice() approach is generally slower than generating random integers directly.

In this case, the fastest approach was rng.integers() and is the preferred approach when generating an array of random integers.
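The upper bound of rng.integers() is exclusive by default, which is why the benchmark passes HIGH+1. As a small sketch, the same inclusive range can also be requested with the endpoint=True argument. The seed is an arbitrary choice for illustration.

```python
# sketch of two ways to request integers in [LOW, HIGH] inclusive
import numpy

LOW, HIGH = 0, 100
rng = numpy.random.default_rng(1)

# exclusive upper bound, so add one to include HIGH
a = rng.integers(LOW, HIGH + 1, 1000000)

# inclusive upper bound via endpoint=True
b = rng.integers(LOW, HIGH, 1000000, endpoint=True)

# report the smallest and largest values drawn by each call
print(a.min(), a.max())
print(b.min(), b.max())
```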

Integer Data Types Matter

The data type of the integer array matters.

We would expect that a larger data type requires more random bits to be generated.

Therefore, we might expect that an array of int32 will be faster to create than an array of random int64 values. Similarly, we may expect int16 to be faster again, and int8 to be the fastest of all.

We can explore this with the rng.integers() function. We will generate a 1d array with one million random integer values between 0 and 100 with each integer type (8, 16, 32, and 64 bits) and repeat the process 2,000 times.

It may also be interesting to contrast the results between signed and unsigned data types. Recall that signed types allow negative values, whereas unsigned types only allow non-negative values and offer a larger range in the positive domain.

The complete example is listed below.

# SuperFastPython.com

# benchmark creating a 1d array of random integers with different types

import numpy

import timeit

# size and shape of the arrays to create

SHAPE = 1000000

# range of values

LOW, HIGH = 0, 100

# number of times to run each snippet

N = 2000

# list of types to compare

types = ['int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', 'uint64']

# benchmark each data type

for t in types:

    result = timeit.timeit(f'rng=numpy.random.default_rng(1);rng.integers(LOW, HIGH+1, SHAPE, dtype=numpy.{t})', globals=globals(), number=N)

    print(f'rng.integers({t}) {result:.3f} seconds')

Running the example benchmarks each approach and reports the total execution time.

rng.integers(int8) 14.848 seconds

rng.integers(uint8) 14.808 seconds

rng.integers(int16) 4.478 seconds

rng.integers(uint16) 4.466 seconds

rng.integers(int32) 4.582 seconds

rng.integers(uint32) 4.568 seconds

rng.integers(int64) 5.132 seconds

rng.integers(uint64) 5.148 seconds

We can restructure the output into a table for comparison.

Approach | Time (sec)

---------------------|------------

rng.integers(int8) | 14.848

rng.integers(uint8) | 14.808

rng.integers(int16) | 4.478

rng.integers(uint16) | 4.466

rng.integers(int32) | 4.582

rng.integers(uint32) | 4.568

rng.integers(int64) | 5.132

rng.integers(uint64) | 5.148

The results are fascinating.

Firstly, we can see that generally generating unsigned integers was slightly faster in most cases (except int64 types).

The expectation is that fewer bits would be faster to generate.

In this case, we can see that int8 types were the slowest to generate.

We can see that there was very little difference between int16 and int32 types and int64 random integers were slower to generate by about half a second.

It suggests that we may want to use an unsigned int16 or int32 type when generating random ints, as long as the type can hold the range required.
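Checking whether a type can hold the required range can be sketched with numpy.iinfo(), which reports the minimum and maximum value of an integer type. The fits() helper below is a hypothetical name introduced here for illustration.

```python
# sketch of checking that an integer type can hold a range of values
import numpy

LOW, HIGH = 0, 100

def fits(dtype, low, high):
    # hypothetical helper: True if every value in [low, high] fits in dtype
    info = numpy.iinfo(numpy.dtype(dtype))
    return info.min <= low and high <= info.max

# report the limits of each candidate type
for name in ['int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32']:
    info = numpy.iinfo(numpy.dtype(name))
    print(f'{name}: [{info.min}, {info.max}] fits={fits(name, LOW, HIGH)}')

# generate with uint16 once we know the range fits
rng = numpy.random.default_rng(1)
values = rng.integers(LOW, HIGH + 1, 1000000, dtype=numpy.uint16)
print(values.dtype, values.min(), values.max())
```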

Fastest Way to Create 1D NumPy Array of Random Booleans

We can explore the fastest way to create a modestly sized NumPy array of random boolean values.

These are values that are either True or False.

In this case, we will create a fixed-size 1d array with one million elements (1,000,000) of random boolean values or integers between 0 and 1 (inclusive). If possible, we will try and set the type to be numpy.bool_. Each approach will be used to create an array 2,000 times.

The numpy.random APIs do not provide a way to create arrays of random booleans directly, therefore we will explore a few approaches that involve generating integers, using choice, and converting floats to booleans, including:

numpy.random.rand()<0.5
numpy.random.choice([True,False])
numpy.random.randint(0,2)
rng.random()<0.5
rng.choice([True,False])
rng.integers(0,1)

The complete example is listed below.

# SuperFastPython.com

# benchmark creating a 1d array of random booleans

import numpy

import timeit

# size and shape of the arrays to create

SHAPE = 1000000

# number of times to run each snippet

N = 2000

# numpy.random.rand()<0.5

result = timeit.timeit('numpy.random.rand(SHAPE)<0.5', globals=globals(), number=N)

print(f'numpy.random.rand()<0.5 {result:.3f} seconds')

# numpy.random.choice([True,False])

result = timeit.timeit('numpy.random.choice([True,False], SHAPE)', globals=globals(), number=N)

print(f'numpy.random.choice([True,False]) {result:.3f} seconds')

# numpy.random.randint(0,2)

result = timeit.timeit('numpy.random.randint(0, 2, SHAPE,numpy.bool_)', globals=globals(), number=N)

print(f'numpy.random.randint(0,2) {result:.3f} seconds')

# rng.random()<0.5

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.random(SHAPE)<0.5', globals=globals(), number=N)

print(f'rng.random()<0.5 {result:.3f} seconds')

# rng.choice([True,False])

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.choice([True,False],SHAPE)', globals=globals(), number=N)

print(f'rng.choice([True,False]) {result:.3f} seconds')

# rng.integers(0,1)

result = timeit.timeit('rng=numpy.random.default_rng(1);rng.integers(0,1,SHAPE,numpy.bool_,True)', globals=globals(), number=N)

print(f'rng.integers(0,1) {result:.3f} seconds')

Running the example benchmarks each approach and reports the total execution time.

numpy.random.rand()<0.5 13.041 seconds

numpy.random.choice([True,False]) 10.334 seconds

numpy.random.randint(0,2) 1.902 seconds

rng.random()<0.5 7.569 seconds

rng.choice([True,False]) 12.157 seconds

rng.integers(0,1) 1.549 seconds

We can restructure the output into a table for comparison.

Approach | Time (sec)

----------------------------------|------------

numpy.random.rand()<0.5 | 13.041

numpy.random.choice([True,False]) | 10.334

numpy.random.randint(0,2) | 1.902

rng.random()<0.5 | 7.569

rng.choice([True,False]) | 12.157

rng.integers(0,1) | 1.549

Generally, we can see that the choice() approach is among the slowest with both the legacy and modern APIs, taking more than 10 seconds in each case.

We can also see that creating an array of booleans from an array of floating point values as a mask is also very inefficient with both APIs.

Generally, the fastest approach was to generate 0 and 1 integers and to store the results in an array with the type numpy.bool_.

Of the two approaches of this type tested, the approach that uses the modern API is about a third of a second faster.
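The positional arguments in the rng.integers() benchmark are easy to misread, so a sketch with keyword arguments is below. The endpoint=True argument matters here: without it the upper bound of 1 is exclusive and every value drawn is 0 (False).

```python
# sketch of generating random booleans with rng.integers()
import numpy

rng = numpy.random.default_rng(1)

# low=0, high=1 with endpoint=True makes 1 (True) a possible value
flags = rng.integers(0, 1, 1000000, dtype=numpy.bool_, endpoint=True)
print(flags.dtype, flags.mean())

# without endpoint=True the upper bound is exclusive, so only False is drawn
zeros = rng.integers(0, 1, 1000000, dtype=numpy.bool_)
print(zeros.any())
```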

Recommendations

The best recommendation is to identify the specific random number array tasks you need in your program, then benchmark them in isolation to discover what has the lowest execution time on your system with your hardware and library versions.

I cannot stress this enough. The numbers above are highly specific and the patterns in performance observed may or may not hold on your specific platform.
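A small harness for comparing your own candidates might look like the sketch below. The two candidate snippets, the array size, and the run count are placeholders chosen for illustration and should be replaced with the operations and sizes that matter in your program.

```python
# sketch of a harness for comparing candidate snippets with timeit
import timeit
import numpy

# placeholder values, smaller than the tutorial so the sketch runs quickly
SHAPE = 100000
N = 20

# names made available to the timed statements
env = {'numpy': numpy, 'SHAPE': SHAPE}

# candidate snippets to compare, keyed by a label
candidates = {
    'numpy.random.rand()': 'numpy.random.rand(SHAPE)',
    'rng.random()': 'rng=numpy.random.default_rng(1);rng.random(SHAPE)',
}

# time each candidate and collect the total seconds
results = {}
for label, stmt in candidates.items():
    results[label] = timeit.timeit(stmt, globals=env, number=N)

# report the candidates from fastest to slowest
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f'{label:<22} | {seconds:.3f}')
```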

That being said, if performance matters, you probably want to:

Use the modern numpy.random API, specifically:
- Use numpy.random.Generator such as rng.random() and rng.integers().
- Don’t use numpy.random.RandomState
Generate float32 random floats, if it has enough precision for your program.
Generate uint16 or uint32 random ints, if they can hold the range required by your program.
Use rng.integers(0, 1, dtype=numpy.bool_, endpoint=True) to generate random booleans.

Don’t rely on assumptions about performance, such as which data types or functions to use.

Always benchmark.

Takeaways

You now know how to benchmark and discover the fastest way to generate NumPy arrays of random values.


Photo by Vincent Ghilione on Unsplash

About Jason Brownlee

Parallel Loops in Python

Reader Interactions

Leave a Reply Cancel reply

Footer

Learn Python Benchmarking Fast (without the frustration)

Learn Python Benchmarking Fast
(without the frustration)