Benchmark Fastest Way To Create NumPy Array
You can benchmark NumPy array creation functions and discover the fastest approaches to use in different circumstances.
Generally, the numpy.empty() function is fastest when an uninitialized array is needed. If an initialized NumPy array is required, then numpy.zeros() and numpy.full() are the fastest.
An important consideration is the data type of the array. Choose a type that fits the data the array is intended to hold and that is fast to create on the system where the code will run. Interestingly, in some cases it can be faster to create arrays with a larger data type than is needed.
In this tutorial, you will discover how to benchmark NumPy array creation functions and discover the fastest approaches you can use in your own programs.
Let's get started.
Need To Create NumPy Arrays Fast
Creating arrays is one of the most common activities when using NumPy.
As such, we need to know how to create arrays fast.
There are many ways to create new NumPy arrays, although perhaps the main distinction between the approaches is whether we:
1. Create an uninitialized (empty) array.
2. Create an array initialized to a given value.
In this case, we will ignore creating arrays from or like other arrays and creating arrays as the outcome of an algorithm.
In other words, the main choice is between creating an uninitialized array and creating an initialized array.
Given that we need to create new arrays all the time in our NumPy programs, which approach or approaches are fastest?
Specific requirements aside, what function should we use when creating new arrays in our programs?
Benchmark NumPy Array Creation
We can explore the question of how fast the different array creation methods are using benchmarking.
In this case, we will use an approach to create an array of a modest fixed size, then repeat this process many times to give an estimated time. We can then compare the times to see the relative performance of the methods tested.
You can use this approach to test your own favorite NumPy array creation methods.
If you use or extend the NumPy benchmarking approach used in this tutorial, let me know in the comments below. I'd love to see what you come up with.
We could use the time.perf_counter() function directly and develop a helper function to perform the benchmarking and report results.
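As a rough illustration of that alternative, a helper built on time.perf_counter() might look like the sketch below. The benchmark() function name, its parameters, and the small array size are assumptions made for this example, not code from this tutorial.

```python
# sketch of a benchmark helper built on time.perf_counter()
import time
import numpy

def benchmark(task, repeats=100):
    """Return the total seconds taken to call task() repeats times."""
    start = time.perf_counter()
    for _ in range(repeats):
        task()
    return time.perf_counter() - start

# example usage with a small array so the sketch runs quickly
duration = benchmark(lambda: numpy.zeros(1000), repeats=1000)
print(f'numpy.zeros() {duration:.6f} seconds')
```

The lambda adds a small function-call overhead to every run, which is one reason a string-based timeit benchmark can be preferable for very fast snippets.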
Instead, in this case, we will use the timeit API, specifically the timeit.timeit() function and specify the string of array creation code to run and a fixed number of times to run it.
We will also provide the globals argument for any constants defined in our benchmark code, such as array size or shape.
For example:
...
# benchmark a thing
result = timeit.timeit('...', globals=globals(), number=N)
print(f'approach {result:.3f} seconds')
The number of runs in each benchmark was tuned to ensure that each snippet was executed in more than one second and less than about 10 seconds.
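The run counts in this tutorial were tuned by hand. As a possible starting point for that tuning, the standard library's timeit.Timer.autorange() method picks a run count automatically, increasing it until the total time is at least 0.2 seconds. The sketch below assumes a small array size so that it finishes quickly.

```python
# sketch: let timeit choose a number of runs automatically
import timeit
import numpy

# small array, chosen for illustration only
SHAPE = 1000
timer = timeit.Timer('numpy.zeros(SHAPE)', globals=globals())
# returns the chosen number of runs and the total time they took
number, elapsed = timer.autorange()
print(f'{number} runs took {elapsed:.3f} seconds')
```

The reported run count can then be scaled up to reach a longer total duration, such as the one to ten seconds targeted here.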
Let's get started.
Fastest Way To Create NumPy 1D Array (uninitialized)
We can explore the fastest way to create a modestly sized uninitialized one-dimensional NumPy array.
In this case, we will define a fixed-size 1d array with one million elements (1,000,000). Each approach will be used to create an array 10,000 times.
The approaches we will compare include the most common NumPy functions for creating a new 1d array of a given size:
- numpy.empty()
- numpy.zeros()
- numpy.ones()
- numpy.arange()
The complete example is listed below.
# SuperFastPython.com
# benchmark creating an empty 1d numpy array
import numpy
import timeit
# size and shape of the arrays to create
SHAPE = 1000000
# number of times to run each snippet
N = 10000
# numpy.empty()
result = timeit.timeit('numpy.empty(SHAPE)', globals=globals(), number=N)
print(f'numpy.empty() {result:.3f} seconds')
# numpy.zeros()
result = timeit.timeit('numpy.zeros(SHAPE)', globals=globals(), number=N)
print(f'numpy.zeros() {result:.3f} seconds')
# numpy.ones()
result = timeit.timeit('numpy.ones(SHAPE)', globals=globals(), number=N)
print(f'numpy.ones() {result:.3f} seconds')
# numpy.arange()
result = timeit.timeit('numpy.arange(SHAPE)', globals=globals(), number=N)
print(f'numpy.arange() {result:.3f} seconds')
Running the example benchmarks each approach and reports the total execution time.
numpy.empty() 0.182 seconds
numpy.zeros() 8.103 seconds
numpy.ones() 9.022 seconds
numpy.arange() 8.528 seconds
We can restructure the output into a table for comparison.
Approach | Time (sec)
---------------|------------
numpy.empty() | 0.182
numpy.zeros() | 8.103
numpy.ones() | 9.022
numpy.arange() | 8.528
We can see that creating an empty array via numpy.empty() was the fastest approach by a massive margin. It was about 44x faster than the next fastest method which was numpy.zeros().
This highlights that if we need an uninitialized 1d array, we should strongly consider using the numpy.empty() function.
Interestingly, numpy.ones() was the slowest method. Surprisingly, there was a difference of almost one second between numpy.zeros() and numpy.ones(), as I would have guessed that internally they perform nearly identical tasks (e.g., create an empty array and assign a value to each element in bulk).
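One way to check whether a gap like the one between numpy.zeros() and numpy.ones() is real or just noise is to repeat each measurement several times and compare the best and worst runs. The sketch below uses timeit.repeat() for this. The smaller array size, run count, and repeat count are assumptions chosen so it runs quickly.

```python
# sketch: repeat each benchmark to see how much the timings vary
import timeit
import numpy

SHAPE = 100000   # smaller than the tutorial's array so the sketch is quick
N = 100          # runs per measurement
REPEATS = 5      # number of measurements per snippet

results = {}
for stmt in ['numpy.zeros(SHAPE)', 'numpy.ones(SHAPE)']:
    # timeit.repeat() returns a list with one total time per measurement
    times = timeit.repeat(stmt, globals=globals(), number=N, repeat=REPEATS)
    results[stmt] = times
    print(f'{stmt} best={min(times):.4f} worst={max(times):.4f} seconds')
```

If the ranges for the two snippets overlap heavily, the difference is probably not meaningful on that system.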
Fastest Way To Create NumPy 2D Array (uninitialized)
We can explore the fastest way to create a modestly sized uninitialized two-dimensional NumPy array.
In this case, we will only compare the two fastest approaches for creating a one-dimensional array in the last section:
- numpy.empty()
- numpy.zeros()
Each array will have the shape (100000, 100000) and we will run each method 100,000 times.
The complete example is listed below.
# SuperFastPython.com
# benchmark creating an empty 2d numpy array (matrix)
import numpy
import timeit
# size and shape of the arrays to create
SHAPE = (100000,100000)
# number of times to run each snippet
N = 100000
# numpy.empty()
result = timeit.timeit('numpy.empty(SHAPE)', globals=globals(), number=N)
print(f'numpy.empty() {result:.3f} seconds')
# numpy.zeros()
result = timeit.timeit('numpy.zeros(SHAPE)', globals=globals(), number=N)
print(f'numpy.zeros() {result:.3f} seconds')
Running the example executes each approach many times and reports the total time in seconds.
numpy.empty() 12.340 seconds
numpy.zeros() 12.081 seconds
We can restructure the results into a table for comparison.
Approach | Time (sec)
---------------|------------
numpy.empty() | 12.340
numpy.zeros() | 12.081
Interestingly, the results suggest that it may be slightly faster to use numpy.zeros() to create a 2d array than numpy.empty().
The difference is small, less than 300 ms in this case. Nevertheless, if real, it is surprising, as we might expect that creating an empty array would be as fast as or faster than creating an array and initializing it.
Repeating the same benchmark a few times, I see a similar pattern, although with different specific benchmark results.
For example:
numpy.empty() 11.917 seconds
numpy.zeros() 11.700 seconds
It may be a real effect and the reason may require further investigation.
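It can help to work out how much memory a shape implies before benchmarking it. The sketch below computes this without allocating anything, assuming the default float64 data type used when no dtype is given. For the shape above the result is about 80 GB, so on many machines the benchmark may fail with a MemoryError or depend on the operating system deferring the real allocation. At roughly 0.12 ms per call, numpy.zeros() cannot be writing 80 GB each time, which may be part of why the two functions score so closely.

```python
# sketch: estimate the memory a shape requires without allocating it
import math
import numpy

SHAPE = (100000, 100000)
# bytes per element for the default data type (float64)
itemsize = numpy.dtype(numpy.float64).itemsize
# total elements multiplied by bytes per element
total_bytes = math.prod(SHAPE) * itemsize
print(f'{total_bytes / 1e9:.0f} GB')
```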
Fastest Way To Create and Initialize NumPy Array
We can explore the fastest way to create an initialized one-dimensional NumPy array.
This may be a more common situation.
We have seen that numpy.empty() is fast and numpy.zeros() is faster than numpy.ones(). Therefore, we will focus our attention on these approaches, as well as the numpy.full() function.
Specifically, we will directly compare the approaches:
- numpy.zeros()
- numpy.full()
- numpy.empty() + slice assign
- numpy.empty() + fill()
- numpy.empty() + numpy.zeros_like()
We may need to initialize an array with an arbitrary value, so that is the focus of the methods explored, with numpy.zeros() provided as a baseline.
Each approach is tested with an array with one million elements (1,000,000) and each test is run 10,000 times.
The complete example is listed below.
# SuperFastPython.com
# benchmark creating and initializing a 1d numpy array
import numpy
import timeit
# size and shape of the arrays to create
SHAPE = 1000000
# number of times to run each snippet
N = 10000
# numpy.zeros()
result = timeit.timeit('numpy.zeros(SHAPE)', globals=globals(), number=N)
print(f'numpy.zeros() {result:.3f} seconds')
# numpy.full()
result = timeit.timeit('numpy.full(SHAPE, 0)', globals=globals(), number=N)
print(f'numpy.full() {result:.3f} seconds')
# a=numpy.empty(SHAPE);a[:]=0
result = timeit.timeit('a=numpy.empty(SHAPE);a[:]=0', globals=globals(), number=N)
print(f'a=numpy.empty(SHAPE);a[:]=0 {result:.3f} seconds')
# a=numpy.empty(SHAPE);a.fill(0)
result = timeit.timeit('a=numpy.empty(SHAPE);a.fill(0)', globals=globals(), number=N)
print(f'a=numpy.empty(SHAPE);a.fill(0) {result:.3f} seconds')
# a=numpy.empty(SHAPE);numpy.zeros_like(a)
result = timeit.timeit('a=numpy.empty(SHAPE);numpy.zeros_like(a)', globals=globals(), number=N)
print(f'a=numpy.empty(SHAPE);numpy.zeros_like(a) {result:.3f} seconds')
Running the example executes each approach many times and reports the total time in seconds.
numpy.zeros() 8.022 seconds
numpy.full() 8.182 seconds
a=numpy.empty(SHAPE);a[:]=0 10.102 seconds
a=numpy.empty(SHAPE);a.fill(0) 10.403 seconds
a=numpy.empty(SHAPE);numpy.zeros_like(a) 11.040 seconds
We can restructure the results into a table for direct comparison.
Approach | Time (sec)
-----------------------------------------|------------
numpy.zeros() | 8.022
numpy.full() | 8.182
a=numpy.empty(SHAPE);a[:]=0 | 10.102
a=numpy.empty(SHAPE);a.fill(0) | 10.403
a=numpy.empty(SHAPE);numpy.zeros_like(a) | 11.040
The results show that numpy.zeros() is the fastest overall, as we might have expected. It likely benefits from requesting memory that is already zeroed (for example, via calloc in the C implementation) rather than allocating memory and then writing to every element.
The next fastest was numpy.full(), which was about 1.9 seconds or more faster than all of the approaches that involved creating an empty array and populating it.
This highlights that we should use the built-in NumPy functions where possible as they are likely faster than our attempts to re-implement them with multiple NumPy statements.
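Before comparing speeds, it may be worth confirming that the approaches really produce the same result. The sketch below checks this on a small array with an arbitrary fill value. The variable names, array size, and fill value are assumptions made for illustration. It also shows that numpy.full() infers its data type from the fill value when no dtype is given, so numpy.full(SHAPE, 0) creates an integer array whereas numpy.zeros(SHAPE) creates a float64 array.

```python
# sketch: confirm the initialization approaches produce the same values
import numpy

SHAPE = 10   # small array, this checks correctness rather than speed
VALUE = 7    # arbitrary fill value chosen for illustration

# built-in function with an explicit data type
full = numpy.full(SHAPE, VALUE, dtype=numpy.float64)
# empty array populated by slice assignment
sliced = numpy.empty(SHAPE)
sliced[:] = VALUE
# empty array populated with fill()
filled = numpy.empty(SHAPE)
filled.fill(VALUE)
print(numpy.array_equal(full, sliced), numpy.array_equal(full, filled))

# without an explicit dtype, numpy.full() takes its type from the fill value
print(numpy.zeros(SHAPE).dtype, numpy.full(SHAPE, 0).dtype)
```

Passing dtype explicitly to each function is one way to keep such a comparison like for like.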
Array DataType Matters
We can explore the data type used when creating a NumPy array.
The data type matters as it defines how much main memory must be allocated for the array.
We might expect that allocating more memory is slower, and therefore that it is faster to choose the smallest data type that can hold the data we intend to store.
For example, if we only need True and False values, such as for an array mask, we should use numpy.bool_. If we only need non-negative integers less than 256, we should use numpy.uint8 (also available as numpy.ubyte), and so on.
I will use the data type aliases that indicate the number of bits used by the type, as this helps with the analysis.
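The bounds of the integer types can also be inspected directly in code. The sketch below uses numpy.iinfo() to report the range of a few sized aliases and numpy.min_scalar_type() to find the smallest type that can hold a given value. The particular types and values shown are chosen for illustration.

```python
# sketch: inspect the bounds of integer data types
import numpy

for dtype in [numpy.uint8, numpy.int8, numpy.uint16, numpy.int16]:
    # iinfo reports the smallest and largest values the type can hold
    info = numpy.iinfo(dtype)
    print(f'{dtype.__name__}: {info.min} to {info.max}')

# smallest data type able to hold a given value
small = numpy.min_scalar_type(255)
larger = numpy.min_scalar_type(256)
print(small, larger)
```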
We can highlight the impact of the data type on array creation, by creating an empty array with each NumPy data type and comparing the execution time.
We will focus on the numerical data types and exclude strings and objects as they are orders of magnitude slower than the numerical types.
The complete example is listed below.
# SuperFastPython.com
# benchmark creating an empty 1d numpy array with different data types
import numpy
import timeit
# size and shape of the arrays to create
SHAPE = 1000000
# number of times to run each snippet
N = 1000000
# data types
TYPES = [
'numpy.int8',
'numpy.int16',
'numpy.int32',
'numpy.int64',
'numpy.uint8',
'numpy.uint16',
'numpy.uint32',
'numpy.uint64',
'numpy.float16',
'numpy.float32',
'numpy.float64',
'numpy.complex64',
'numpy.complex128',
'numpy.bool_']
# create an empty array with each data type
for numpy_type in TYPES:
    result = timeit.timeit(f'numpy.empty(SHAPE, dtype={numpy_type})', globals=globals(), number=N)
    print(f'{numpy_type}: {result:.3f} seconds')
Running the example creates an empty NumPy array with each NumPy data type and reports the overall execution time.
numpy.int8: 0.312 seconds
numpy.int16: 5.161 seconds
numpy.int32: 9.590 seconds
numpy.int64: 1.748 seconds
numpy.uint8: 0.309 seconds
numpy.uint16: 5.201 seconds
numpy.uint32: 9.454 seconds
numpy.uint64: 2.233 seconds
numpy.float16: 5.888 seconds
numpy.float32: 9.723 seconds
numpy.float64: 4.225 seconds
numpy.complex64: 4.074 seconds
numpy.complex128: 0.687 seconds
numpy.bool_: 0.307 seconds
We can restructure the output as a table for direct comparison.
Data Type | Time (sec)
-----------------|------------
numpy.int8 | 0.312
numpy.int16 | 5.161
numpy.int32 | 9.590
numpy.int64 | 1.748
numpy.uint8 | 0.309
numpy.uint16 | 5.201
numpy.uint32 | 9.454
numpy.uint64 | 2.233
numpy.float16 | 5.888
numpy.float32 | 9.723
numpy.float64 | 4.225
numpy.complex64 | 4.074
numpy.complex128 | 0.687
numpy.bool_ | 0.307
The pattern of speed difference will be different on your platform, depending on how different-sized data types are handled by the operating system and/or how NumPy was compiled on your system.
Generally, we might expect that as the number of bits per data type is increased, we see a linear increase in execution time.
This is not the case. Looking at signed integers, the int8 type is the fastest, followed by int64, then int16, with int32 the slowest. A similar pattern is seen with the unsigned integers.
This is surprising and highlights the need to benchmark on your target platform.
It suggests we may get better performance using a larger data type than needed when creating many empty arrays, e.g. int64 when we only need int32 or int16.
We see a similar artifact in floating point arrays, where a float32 array is more than twice as slow to create as a float64 array and about 1.7 times as slow as a float16 array. Again, we may want to use an array with more precision than is required in some situations.
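Any creation-time gain from a larger type has to be weighed against the extra memory it uses. The sketch below reports the size of a one-million-element array for three integer types via the nbytes attribute. The sizes dictionary is a name introduced for this example.

```python
# sketch: compare the memory used by arrays of different integer types
import numpy

SHAPE = 1000000
sizes = {}
for dtype in [numpy.int16, numpy.int32, numpy.int64]:
    a = numpy.empty(SHAPE, dtype=dtype)
    # nbytes is the number of elements multiplied by the bytes per element
    sizes[dtype.__name__] = a.nbytes
    print(f'{dtype.__name__}: {a.nbytes / 1e6:.0f} MB')
```

An int64 array takes four times the memory of an int16 array of the same length, which may matter more than creation time once the array is filled and processed.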
Recommendations
The best recommendation is to identify the array creation tasks you need in your program, then benchmark them in isolation to discover what has the lowest execution time on your system with your hardware and library versions.
I cannot stress this enough. The numbers above are highly specific and the patterns in performance observed may or may not hold on your specific platform.
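Because results depend on the platform, it may be useful to record the environment alongside any benchmark numbers. The sketch below collects a few details using the standard library and NumPy. The environment dictionary and its keys are assumptions made for this example.

```python
# sketch: record the environment that benchmark results were collected on
import platform
import sys
import numpy

environment = {
    'python': sys.version.split()[0],
    'numpy': numpy.__version__,
    'platform': platform.platform(),
    'machine': platform.machine(),
}
for key, value in environment.items():
    print(f'{key}: {value}')
```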
That being said, you probably want to:
- Use numpy.empty() to create uninitialized arrays.
- Use numpy.zeros() to create arrays initialized to 0.
- Use numpy.full() to create arrays initialized to a given value.
- Use a data type that fits the data you require and is fastest to create.
Don't rely on assumptions about performance, such as which data types or functions to use.
Always benchmark.
Takeaways
You now know how to benchmark NumPy array creation functions and discover the fastest approaches you can use in your own programs.