You can benchmark functions and algorithms to sort NumPy arrays to discover the fastest approaches to use.
Generally, it is slightly faster to sort a NumPy array in place rather than to create a sorted copy. This approach should be preferred if the program requirements can support it.
The choice of sorting algorithm used by the sort() function matters. Generally, the default of quicksort is probably the best for randomly ordered numerical data, although it is a good idea to test different algorithms with samples of data to confirm this assumption holds for your specific data.
In this tutorial, you will discover how to benchmark and discover the fastest way to sort NumPy arrays in Python.
Let’s get started.
Need Fast Sorting of NumPy Arrays
Sorting NumPy arrays in our Python program is a common operation.
It seems straightforward, call sort(), and the array is sorted.
But is this the fastest method?
In fact, there are a few ways we can sort NumPy arrays, from in-place to a sorted copy, to an array of indexes of sorted elements.
For example, some functions include:
Additionally, there are different sorting algorithms we can use that have different performance characteristics.
This includes algorithms such as:
- Quicksort
- Heapsort
- Mergesort
- Timsort
In fact, internally, a NumPy sort() function may choose an algorithm based on the properties of the data.
The default is ‘quicksort’. Note that both ‘stable’ and ‘mergesort’ use timsort or radix sort under the covers and, in general, the actual implementation will vary with data type. The ‘mergesort’ option is retained for backwards compatibility.
— numpy.sort
These algorithms have different theoretical capabilities, but what is their performance on your specific array with your specific data?
Given that the algorithm may change based on the data, we cannot know the expected performance profile beforehand.
We must benchmark.
What is the fastest way to sort your NumPy arrays?
Run loops using all CPUs, download your FREE book to learn how.
Benchmark Sort NumPy Arrays
We can explore the question of how fast the different approaches to sort NumPy arrays are using benchmarking.
In this case, we will use an approach to sort an array of random numbers of a modest fixed size, then repeat this process many times to give an estimated time. We can then compare the times to see the relative performance of the approaches tested.
You can use this approach to benchmark your own favorite NumPy array operations.
If you use or extend the NumPy benchmarking approach used in this tutorial, let me know in the comments below. I’d love to see what you come up with.
We could use the time.perf_counter() function directly and develop a helper function to perform the benchmarking and report results.
You can learn more about benchmarking with the time.perf_counter() function in the tutorial:
Instead, in this case, we will use the timeit API.
We cannot use the timeit.timeit() function as if we use the “setup” argument to prepare an array of random data to sort, it will only be prepared once and then sorted again and again.
Instead, we will use the timeit.repeat() function as it allows us to repeat the same benchmark test many times. Each repeat of the test will perform the same setup, allowing us to create a new array of random data to sort for each benchmark.
For example:
1 2 3 4 5 6 7 8 |
... # define string to create data array to sort DATA_STR = "rng=numpy.random.default_rng(1);A=..." # number of times to run each snippet N = 2000 # benchmark a thing result = timeit.repeat('...', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'result {sum(result):.3f} seconds') |
You can learn more about benchmarking with the timeit.repeat() function in the tutorial:
The number of runs in each benchmark was tuned to ensure that each snippet was executed in more than one second and less than about 10 seconds.
Let’s get started.
Fastest Way to Sort 1D NumPy Array of Floats
We can explore the fastest way to sort a modestly sized NumPy array of random floating point values in [0,1).
In this case, we will use a pref-defined “setup” code snippet that creates a random number generator with a given seed and creates a 1D array of 100,000 random floats. We will then execute the given sorting approach 2,000 times and report the total execution time.
The approaches we will compare include the most common NumPy functions for sorting a 1d array, including:
- numpy.sort()
- ndarray.sort()
- array[numpy.argsort()]
Generally, we may expect that sorting in place would be slightly faster as no new array has to be created with sorted values or indexes of sorted values.
The complete example is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# SuperFastPython.com # benchmark sorting a 1d array of random floats import numpy import timeit # define string to create data array to sort DATA_STR = "rng=numpy.random.default_rng(1);A=rng.random(100000)" # number of times to run each snippet N = 2000 # numpy.sort() result = timeit.repeat('numpy.sort(A)', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'numpy.sort() {sum(result):.3f} seconds') # ndarray.sort() result = timeit.repeat('A.sort()', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'ndarray.sort() {sum(result):.3f} seconds') # numpy.argsort() result = timeit.repeat('A[numpy.argsort(A)]', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'numpy.argsort() {sum(result):.3f} seconds') |
Running the example benchmarks each approach and reports the sum execution time.
1 2 3 |
numpy.sort() 10.568 seconds ndarray.sort() 10.457 seconds numpy.argsort() 15.666 seconds |
We can restructure the output into a table for comparison.
1 2 3 4 5 |
Approach | Time (sec) ----------------|------------ numpy.sort() | 10.568 ndarray.sort() | 10.457 numpy.argsort() | 15.666 |
We can see that we expected the ndarray.sort() approach of sorting the array in place was the fastest.
The difference was minor, perhaps 100 milliseconds, or less given random variation. Any difference between ndarray.sort() and numpy.sort() will be due to the creation of a new array required for the result.
Nevertheless, this confirms that in general, we should try to sort data in place if our program can accommodate it.
Next, let’s see if the same expectations and confirmation hold for arrays of integers.
Free Python Benchmarking Course
Get FREE access to my 7-day email course on Python Benchmarking.
Discover benchmarking with the time.perf_counter() function, how to develop a benchmarking helper function and context manager and how to use the timeit API and command line.
Fastest Way to Sort 1D NumPy Array of Integers
We can explore the fastest way to sort a modestly sized NumPy array of random integer values in [0,100].
Again, we will use the “setup” argument to create a random number generator and create an array to be sorted, in this case, an array of 100,000 random integers between 0 and 100 (inclusive).
We will benchmark and compare the same three approaches used for sorting a 1D array.
The complete example is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# SuperFastPython.com # benchmark sorting a 1d array of random integers import numpy import timeit # define string to create data array to sort DATA_STR = "rng=numpy.random.default_rng(1);A=rng.integers(0,100+1,100000)" # number of times to run each snippet N = 2000 # numpy.sort() result = timeit.repeat('numpy.sort(A)', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'numpy.sort() {sum(result):.3f} seconds') # ndarray.sort() result = timeit.repeat('A.sort()', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'ndarray.sort() {sum(result):.3f} seconds') # numpy.argsort() result = timeit.repeat('A[numpy.argsort(A)]', setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'numpy.argsort() {sum(result):.3f} seconds') |
Running the example benchmarks each approach and reports the sum execution time.
1 2 3 |
numpy.sort() 6.108 seconds ndarray.sort() 5.986 seconds numpy.argsort() 7.077 seconds |
We can restructure the output into a table for comparison.
1 2 3 4 5 |
Approach | Time (sec) ----------------|------------ numpy.sort() | 6.108 ndarray.sort() | 5.986 numpy.argsort() | 7.077 |
We can see that the same general pattern holds as we saw with sorting arrays of floating point values.
It is faster to sort the array of integers in place, then to create a copy of the sorted array, and finally to create and use an array of indices to sorted values.
This confirms that when sorting arrays of integers we should sort the array in place, if possible.
Next, let’s explore the effect of different sorting algorithms.
Overwhelmed by the python concurrency APIs?
Find relief, download my FREE Python Concurrency Mind Maps
Fastest Sort Algorithm to Sort 1D NumPy Array of Floats
We can explore the benchmark execution time of choosing different sort algorithms when sorting a 1D array of random floating point values.
The numpy.sort() function offers four algorithms and may choose or change the algorithm used based on the type of data provided.
In this case, we will explore sorting the same array of random floats with each sort algorithm (created anew for each test).
We may expect that the default algorithm of quicksort will perform best for an array of randomly ordered numerical values.
Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions
— Quicksort, Wikipedia.
The complete example is listed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
# SuperFastPython.com # benchmark sort algorithms for a 1d array of random floats import numpy import timeit # define string to create data array to sort DATA_STR = "rng=numpy.random.default_rng(1);A=rng.random(100000)" # number of times to run each snippet N = 3000 # kinds of sort algorithms kinds = ['quicksort', 'mergesort', 'heapsort', 'stable'] # sort array with each algorithm for kind in kinds: # ndarray.sort() result = timeit.repeat(f"A.sort(kind='{kind}')", setup=DATA_STR, globals=globals(), number=1, repeat=N) print(f'ndarray.sort({kind}) {sum(result):.3f} seconds') |
Running the example benchmarks each approach and reports the sum execution time.
1 2 3 4 |
ndarray.sort(quicksort) 15.801 seconds ndarray.sort(mergesort) 19.560 seconds ndarray.sort(heapsort) 26.911 seconds ndarray.sort(stable) 19.600 seconds |
We can restructure the output into a table for comparison.
1 2 3 4 5 6 |
Approach | Time (sec) ------------------------|------------ ndarray.sort(quicksort) | 15.801 ndarray.sort(mergesort) | 19.560 ndarray.sort(heapsort) | 26.911 ndarray.sort(stable) | 19.600 |
The result is interesting, there is a notable difference between the results.
We can see that the default, “quicksort” gave the fastest result in this case.
We can see that “mergesort” and “stable” follow closely, and may be using the same algorithm under the covers given the closeness of the result and the comment in the function definition that they may both be implemented using “radix sort”.
The slowest, in this case, was “heapsort“.
One approach in choosing a sorting algorithm is analytically, based on the data in the array (how sorted it may already be, etc.) and the expected performance of an algorithm for the data.
We should still do this, but we should also benchmark to confirm our analytical analysis (assumptions) was correct.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
References
Takeaways
You now know how to benchmark and discover the fastest way to sort NumPy arrays in Python.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any additional tips?
I’d love to hear about them!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Łukasz Nieścioruk on Unsplash
Do you have any questions?