Last Updated on September 29, 2023
You can use one of many open-source and proprietary BLAS/LAPACK libraries to accelerate vector and matrix operations in NumPy.
In this tutorial, you will discover common BLAS and LAPACK libraries that can be used with NumPy.
Let’s get started.
What is BLAS/LAPACK?
NumPy is an array library in Python.
It makes use of third-party libraries to perform array functions efficiently.
One example is the BLAS and LAPACK specifications used by NumPy to execute vector, matrix, and linear algebra operations.
The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high quality linear algebra software, LAPACK for example.
— BLAS (Basic Linear Algebra Subprograms)
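To make the three levels concrete, here is a rough sketch of how they map onto everyday NumPy operations (the exact BLAS routine called depends on the installed library and the array dtypes):

```python
# A rough mapping of the three BLAS levels onto everyday NumPy calls.
import numpy as np

n = 1_000
x = np.random.rand(n)
y = np.random.rand(n)
A = np.random.rand(n, n)
B = np.random.rand(n, n)

s = np.dot(x, y)   # Level 1: vector-vector operation (e.g. ddot)
v = A @ x          # Level 2: matrix-vector operation (e.g. dgemv)
C = A @ B          # Level 3: matrix-matrix operation (e.g. dgemm)

print(s, v.shape, C.shape)
```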
We may install one of many different libraries that implement the BLAS and LAPACK. Most libraries that implement BLAS also implement LAPACK because LAPACK is built on top of BLAS. Therefore, we often refer to them simply as BLAS libraries.
The benefit of BLAS libraries is that they make specific linear algebra operations more efficient by using advanced algorithms and multithreading.
The NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient low level implementations of standard linear algebra algorithms. Those libraries may be provided by NumPy itself using C versions of a subset of their reference implementations but, when possible, highly optimized libraries that take advantage of specialized processor functionality are preferred.
— Linear algebra (numpy.linalg), NumPy API.
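You can check which BLAS/LAPACK library your installation of NumPy was built against; for example (the output differs by platform, package manager, and NumPy version):

```python
# Report the BLAS/LAPACK libraries that this NumPy build is linked against.
import numpy as np

# Typically lists OpenBLAS, MKL, or Accelerate, along with include/library paths.
np.show_config()
```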
You can learn more about BLAS and LAPACK in the tutorial:
Common BLAS/LAPACK Libraries
BLAS is just a specification.
We must install a library that implements the specification.
Different libraries will have different capabilities and may optimize for different use cases or hardware.
Some libraries may be better suited to your specific hardware than others. Most will detect the type of CPU available and tailor the compilation of the library to take best advantage of it.
Additionally, most libraries offer configuration operations allowing the performance to be tailored, such as configuring the number of threads used.
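For example, most BLAS libraries read an environment variable that caps the number of threads they use, and the third-party threadpoolctl package can inspect or limit the thread pool at runtime. A sketch, assuming threadpoolctl is installed (pip install threadpoolctl); which variable takes effect depends on which BLAS library is present:

```python
# Cap BLAS threads via environment variables (set before importing numpy).
import os
os.environ['OMP_NUM_THREADS'] = '4'       # OpenMP-based builds
os.environ['OPENBLAS_NUM_THREADS'] = '4'  # OpenBLAS
os.environ['MKL_NUM_THREADS'] = '4'       # Intel MKL

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Report the BLAS library in use and its current thread counts.
print(threadpool_info())

# Temporarily restrict BLAS calls to two threads.
with threadpool_limits(limits=2, user_api='blas'):
    a = np.random.rand(2_000, 2_000)
    b = np.random.rand(2_000, 2_000)
    c = a @ b
print(c.shape)
```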
Some of the more common BLAS libraries include:
- OpenBLAS
- ATLAS: Automatically Tuned Linear Algebra Software
- MKL: Math Kernel Library aka Intel MKL.
- Accelerate: aka Apple Accelerate or vecLib.
- BLIS: BLAS-like Library Instantiation Software Framework
You can see a fuller list of BLAS libraries here:
OpenBLAS
OpenBLAS is perhaps the most common BLAS/LAPACK library.
OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types.
— OpenBLAS, Wikipedia.
It is installed by default with NumPy/SciPy on macOS and Linux systems at the time of writing.
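OpenBLAS multithreads large matrix operations by default. As a rough sketch of the difference this makes, the example below times a matrix multiplication with the BLAS restricted to one thread and then with the default thread count (timings will vary with your hardware and installed library; it assumes threadpoolctl is installed):

```python
# Rough timing sketch: single-threaded vs default multithreaded BLAS.
from time import perf_counter
import numpy as np
from threadpoolctl import threadpool_limits  # pip install threadpoolctl

a = np.random.rand(3_000, 3_000)
b = np.random.rand(3_000, 3_000)

# Restrict the BLAS to a single thread.
with threadpool_limits(limits=1, user_api='blas'):
    start = perf_counter()
    _ = a @ b
    print(f'1 thread:    {perf_counter() - start:.3f} s')

# Default: typically one thread per logical CPU.
start = perf_counter()
_ = a @ b
print(f'all threads: {perf_counter() - start:.3f} s')
```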
OpenBLAS is based on the GotoBLAS2 project.
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version
— OpenBLAS: An optimized BLAS library
GotoBLAS2 is the progenitor BLAS library project, developed at the Texas Advanced Computing Center (TACC).
GotoBLAS2 uses new algorithms and memory techniques for optimal performance of the BLAS routines.
— PAST PROJECT: GOTOBLAS2
OpenBLAS was initially developed at the Chinese Academy of Sciences. It is open source and receives contributions from all over the world.
For more information on OpenBLAS see:
Important papers on the library include the following:
- Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012
- AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs, 2013.
Automatically Tuned Linear Algebra Software (ATLAS)
ATLAS is an acronym that stands for Automatically Tuned Linear Algebra Software.
The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.
— ATLAS Homepage
It is a popular BLAS/LAPACK library on macOS and Linux, although more recently it has been superseded by OpenBLAS.
ATLAS detects the hardware platform on which it is being installed and attempts to compile an optimized BLAS/LAPACK library tailored to the capabilities of the platform to maximize performance.
ATLAS typically uses code generators (i.e., programs that write other programs) in order to provide the many different ways of doing a given operation, and has sophisticated search scripts and robust timing mechanisms in order to find the best ways of performing the operation for a given architecture.
— Automated Empirical Optimization of Software and the ATLAS Project, 2001.
Hence the phrase “Automatically Tuned” in the name of the library.
ATLAS is often recommended as a way to automatically generate an optimized BLAS library. While its performance often trails that of specialized libraries written for one specific hardware platform, it is often the first or even only optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS available at Netlib.
— Automatically Tuned Linear Algebra Software, Wikipedia.
For more information on ATLAS see:
Important papers on the library include the following:
- Automated Empirical Optimization of Software and the ATLAS Project, 2001
Math Kernel Library (MKL)
MKL is an acronym that stands for Math Kernel Library.
It is a BLAS/LAPACK library developed by Intel, primarily intended to best take advantage of Intel hardware for linear algebra operations.
The fastest and most-used math library for Intel-based systems
— Intel oneAPI Math Kernel Library Homepage.
It is a proprietary library, not open source.
MKL is the default BLAS/LAPACK library installed with NumPy/SciPy on Windows systems.
As of 2020, Intel’s MKL remains the numeric library installed by default along with many pre-compiled mathematical applications on Windows (such as NumPy, SymPy)
— Math Kernel Library, Wikipedia.
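If your NumPy is linked against MKL (common with Anaconda installs), the optional mkl-service package exposes MKL's runtime controls. A sketch, assuming that package is available (e.g. conda install mkl-service); otherwise the MKL_NUM_THREADS environment variable serves the same purpose:

```python
# Sketch assuming NumPy is linked against Intel MKL and the optional
# mkl-service package is installed (it provides the 'mkl' module).
import mkl
import numpy as np

print(mkl.get_max_threads())  # maximum number of threads MKL will use
mkl.set_num_threads(2)        # restrict MKL to two threads for later calls

a = np.random.rand(2_000, 2_000)
b = np.random.rand(2_000)
x = np.linalg.solve(a, b)     # LAPACK-backed solve, executed by MKL
print(x.shape)
```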
For more information on MKL see:
Apple Accelerate
Accelerate is a mathematical library developed by Apple, optimized for Apple hardware such as Apple silicon (e.g., M1) processors.
Make large-scale mathematical computations and image calculations, optimized for high performance and low energy consumption.
— Apple Accelerate Homepage
It is composed of sub-libraries and includes support for the BLAS and LAPACK interfaces.
Perform common linear algebra operations with Apple’s implementation of the Basic Linear Algebra Subprograms (BLAS).
— Apple Accelerate BLAS API.
BLAS and LAPACK support in Accelerate is provided via the vecLib framework.
This library abstracts the vector processing capability so that code written for it will execute appropriate instructions for the processor available at runtime. For this reason, unless you are writing specialized code that targets a single CPU, you should generally use these functions rather than directly using vector instructions
— Apple vecLib homepage
Like MKL, Accelerate is proprietary and not open source. Also like MKL, Accelerate can offer better performance than more general BLAS libraries when used with the targeted hardware, in this case Apple silicon (e.g., M1) CPUs.
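If NumPy is built against Accelerate on macOS, the usual OpenBLAS/MKL thread variables have no effect; instead, the VECLIB_MAXIMUM_THREADS environment variable is commonly used to cap the thread count. A sketch under that assumption:

```python
# Sketch: cap Accelerate/vecLib threads on macOS (set before importing numpy).
import os
os.environ['VECLIB_MAXIMUM_THREADS'] = '4'

import numpy as np

# Confirm which BLAS/LAPACK build this NumPy is actually using.
np.show_config()
```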
For more information on Accelerate see:
BLAS-like Library Instantiation Software (BLIS)
BLIS is an acronym that stands for BLAS-like Library Instantiation Software.
BLIS is an award-winning portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations.
— BLIS Project on GitHub.
It provides a modern, optimized implementation of the BLAS-style interface (with a compatibility layer for the standard BLAS API), written in recent versions of the C language.
It is our belief that BLIS offers substantial benefits in productivity when compared to conventional approaches to developing BLAS libraries, as well as a much-needed refinement of the BLAS interface, and thus constitutes a major advance in dense linear algebra computation.
— BLIS Project on GitHub.
It was initially developed by The University of Texas at Austin and is an open-source project.
BLIS may be focused more on serving as an educational platform, and it is currently less popular than other libraries, such as OpenBLAS and MKL.
For more information on BLIS see:
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
NumPy APIs
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Free Concurrent NumPy Course
Get FREE access to my 7-day email course on concurrent NumPy.
Discover how to configure the number of BLAS threads, how to execute NumPy tasks faster with thread pools, and how to share arrays super fast.
Takeaways
You now know about common BLAS and LAPACK libraries that can be used with NumPy.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Daniel Abadia on Unsplash