NumPy makes use of BLAS and LAPACK libraries to execute linear algebra functions with vectors and matrices efficiently, allowing NumPy to make the best use of available system hardware.

Modern BLAS library implementations like OpenBLAS allow NumPy to perform vector and matrix operations like SVD, matrix multiplication, and least squares efficiently and seamlessly using multiple CPU cores and CPU-specific instruction sets.

In this tutorial, you will discover the BLAS and LAPACK specifications, the libraries that implement them, and how NumPy uses them for efficient linear algebra functions.

Let’s get started.


## What is BLAS

BLAS is an acronym that stands for Basic Linear Algebra Subprograms.

It refers to an open specification for a library of linear algebra functions.

Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.

— Basic Linear Algebra Subprograms, Wikipedia.

Recall that linear algebra is a field of mathematics concerned with linear equations, typically with scalars (numbers), vectors (arrays of numbers), and matrices (multi-dimensional arrays of numbers).

We typically make use of linear algebra operations when we work with arrays of numbers, such as working with image data or data files. Linear algebra sits at the core of many modern algorithms, such as those used in machine learning and deep learning.

Linear algebra is also used in most sciences and fields of engineering, because it allows modeling many natural phenomena, and computing efficiently with such models. For nonlinear systems, which cannot be modeled with linear algebra, it is often used for dealing with first-order approximations, using the fact that the differential of a multivariate function at a point is the linear map that best approximates the function near that point.

— Linear algebra, Wikipedia.

BLAS is widely used for linear algebra operations in computers, so much so that it has become a de facto standard.

BLAS has a reference implementation, hosted at Netlib, although it is not optimized and is rarely used in practice. Instead, third-party libraries that implement the BLAS specification are installed.

BLAS library implementations are developed for speed and efficiency. They are implemented in a language like C or Fortran and use platform-specific tricks such as CPU-specific instructions and multithreading.

Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions.

— Basic Linear Algebra Subprograms, Wikipedia.


## What Are Common BLAS Functions

BLAS provides a suite of linear algebra functions.

The functions are divided into three levels, organized by capability and complexity.

The levels are:

- **BLAS Level 1**: Scalar, vector, and vector-vector operations.
- **BLAS Level 2**: Matrix-vector operations.
- **BLAS Level 3**: Matrix-matrix operations.

Although a given operation belongs to a single level, our application may draw upon functions from across the levels.

The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations.

— BLAS (Basic Linear Algebra Subprograms)

Examples of functions include:

- Multiply a scalar by a vector or a matrix
- Multiply a vector by a vector (dot product) or by a matrix
- Multiply a matrix by a matrix
- Sum vectors or matrices
- Copy a vector or a matrix
- Calculate the Euclidean norm of a vector
- Sum the absolute values of the elements of a vector

There are multiple versions of each function, one per supported data type, indicated by a prefix on the routine name: s (single precision), d (double precision), c (single-precision complex), and z (double-precision complex). For example, dgemm is double-precision general matrix-matrix multiplication.
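As a rough sketch, the three levels can be illustrated with plain NumPy calls, which are backed by BLAS routines such as AXPY (level 1), GEMV (level 2), and GEMM (level 3); the exact routine dispatched depends on the NumPy build:

```python
import numpy as np

# Level 1: vector-vector operations (e.g. AXPY: y = a*x + y, and the dot product)
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
axpy = 2.0 * x + y          # scalar-vector multiply plus vector add
dot = np.dot(x, y)          # vector dot product

# Level 2: matrix-vector operations (e.g. GEMV: y = A @ x)
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])
mv = A @ x                  # matrix-vector product

# Level 3: matrix-matrix operations (e.g. GEMM: C = A @ B)
B = np.eye(3)
mm = A @ B                  # matrix-matrix product

print(axpy)  # [ 6.  9. 12.]
print(dot)   # 32.0
```

The level-3 matrix-matrix product is where optimized BLAS libraries deliver the largest speedups, because it offers the most opportunity for cache blocking and multithreading.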

## What is LAPACK

The BLAS specification provides the basis for LAPACK.

LAPACK is an acronym that stands for Linear Algebra Package. Like BLAS, LAPACK is a specification that can be implemented efficiently by third-party libraries.

It provides a suite of higher-level linear algebra algorithms that we are more likely to use directly in our programs.

LAPACK (“Linear Algebra Package”) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition.

— LAPACK, Wikipedia.

This includes operations such as:

- Linear equations
- Matrix decompositions (matrix factorization)
- Singular value decomposition (SVD)
- Eigen problems (e.g. eigendecomposition)
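The operations above are all exposed through NumPy's numpy.linalg module. A short sketch (the routine names in the comments are the conventional LAPACK names; the exact driver NumPy dispatches depends on the build):

```python
import numpy as np

# A small symmetric positive-definite system, so every routine below applies.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)      # linear equations (LAPACK gesv)
L = np.linalg.cholesky(A)      # Cholesky factorization (potrf)
U, s, Vt = np.linalg.svd(A)    # singular value decomposition (gesdd)
w, v = np.linalg.eigh(A)       # eigendecomposition of a symmetric matrix (syevd)

# Verify the solve and the factorizations reconstruct A.
assert np.allclose(A @ x, b)
assert np.allclose(L @ L.T, A)
```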

Generally, LAPACK focuses on algorithms for working with matrices. As such, it is mostly built on top of BLAS Level 3, i.e. it makes heavy use of the matrix-matrix functions.

LAPACK routines are written so that as much as possible of the computation is performed by calls to the Basic Linear Algebra Subprograms (BLAS). LAPACK is designed at the outset to exploit the Level 3 BLAS – a set of specifications for Fortran subprograms that do various types of matrix multiplication and the solution of triangular systems with multiple right-hand sides. Because of the coarse granularity of the Level 3 BLAS operations, their use promotes high efficiency on many high-performance computers, particularly if specially coded implementations are provided by the manufacturer.

— LAPACK – Linear Algebra PACKage

Implementations of the BLAS specification almost always also provide implementations of the LAPACK specification. As such BLAS and LAPACK are typically bundled together and described simply as BLAS functions.


## Why Does NumPy Use BLAS and LAPACK

NumPy makes use of BLAS and LAPACK functions.

Recall that NumPy is a Python API for working with arrays.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays

— NumPy, Wikipedia.

Although NumPy does not require a BLAS library to be installed, if one is available on the system or bundled with NumPy, then NumPy will make use of it.

NumPy does not require any external linear algebra libraries to be installed. However, if these are available, NumPy’s setup script can detect them and use them for building.

— Building from source, NumPy.

This allows the operations that NumPy delegates to the BLAS and LAPACK libraries to be very efficient, taking full advantage of operating-system-level and CPU-level optimizations to speed up computation.

As such, installing NumPy almost always results in the installation of a library that implements the BLAS and LAPACK specifications.

Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations.

— NumPy for MATLAB users, NumPy.
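You can check which BLAS/LAPACK libraries your NumPy installation was built against. The output typically names OpenBLAS, MKL, or Accelerate, depending on how NumPy was installed:

```python
import numpy as np

# Print the build configuration, including the BLAS/LAPACK
# libraries that NumPy was linked against.
np.show_config()
```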


## What NumPy and SciPy Functions Use BLAS and LAPACK?

The NumPy API documentation does not provide a list of functions that use BLAS and LAPACK APIs.

Nevertheless, the NumPy API does give some help.

It provides a linear algebra API that lists most, perhaps all, of the NumPy functions that explicitly call the BLAS and LAPACK APIs.

The NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient low level implementations of standard linear algebra algorithms. Those libraries may be provided by NumPy itself using C versions of a subset of their reference implementations but, when possible, highly optimized libraries that take advantage of specialized processor functionality are preferred.

— Linear algebra (numpy.linalg)

### NumPy Functions

The list of NumPy functions that call BLAS and LAPACK APIs is as follows:

#### Matrix and vector products

- numpy.dot()
- numpy.linalg.multi_dot()
- numpy.vdot()
- numpy.inner()
- numpy.outer()
- numpy.matmul()
- numpy.tensordot()
- numpy.einsum()
- numpy.einsum_path()
- numpy.linalg.matrix_power()
- numpy.kron()

#### Decompositions

- numpy.linalg.cholesky()
- numpy.linalg.qr()
- numpy.linalg.svd()

#### Matrix eigenvalues

- numpy.linalg.eig()
- numpy.linalg.eigh()
- numpy.linalg.eigvals()
- numpy.linalg.eigvalsh()

#### Norms and other numbers

- numpy.linalg.norm()
- numpy.linalg.cond()
- numpy.linalg.det()
- numpy.linalg.matrix_rank()
- numpy.linalg.slogdet()
- numpy.trace()

#### Solving equations and inverting matrices

- numpy.linalg.solve()
- numpy.linalg.tensorsolve()
- numpy.linalg.lstsq()
- numpy.linalg.inv()
- numpy.linalg.pinv()
- numpy.linalg.tensorinv()
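A short sketch exercising a few of the functions listed above:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

print(np.linalg.det(A))          # determinant -> 5.0
print(np.linalg.norm(b))         # Euclidean norm of b
print(np.linalg.matrix_rank(A))  # rank -> 2

x = np.linalg.solve(A, b)                     # exact solve
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # least squares, same answer here
assert np.allclose(x, x_ls)

Ainv = np.linalg.inv(A)          # matrix inverse
assert np.allclose(Ainv @ A, np.eye(2))
```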

You can see a full list of these functions in the numpy.linalg API documentation.

### SciPy Functions

Many of these functions are also provided in SciPy via **scipy.linalg**.

This module also provides some additional functions that are not implemented by NumPy.

You can see the full list of functions provided in the **scipy.linalg** module in the SciPy API documentation.
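For example, LU factorization is available in scipy.linalg but has no counterpart in numpy.linalg. This sketch assumes SciPy is installed:

```python
import numpy as np
from scipy.linalg import lu  # LU factorization; not exposed in numpy.linalg

A = np.array([[2.0, 1.0],
              [4.0, 3.0]])

# PLU factorization: A = P @ L @ U, with P a permutation matrix,
# L lower-triangular, and U upper-triangular.
P, L, U = lu(A)
assert np.allclose(P @ L @ U, A)
```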

## What are Common BLAS/LAPACK Implementations

There are many libraries that implement the BLAS specification.

As mentioned, most libraries that implement BLAS also implement the LAPACK specification. This means that only one library needs to be installed in order to use BLAS functions in NumPy.

A BLAS library is typically bundled and installed along with NumPy. This way NumPy can be configured at build time to make the best use of the BLAS library it ships with.

At the time of writing, the most common BLAS/LAPACK library is OpenBLAS. This is because it is fast on many modern platforms and because it is open source.

OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types.

— OpenBLAS, Wikipedia.

Other popular BLAS/LAPACK libraries include:

- MKL, or Math Kernel Library, a proprietary library by Intel.
- ATLAS, or Automatically Tuned Linear Algebra Software, an open-source library.

Wikipedia provides a long list of BLAS libraries in its article on the topic.

## Further Reading

This section provides additional resources that you may find helpful.

**Books**

- Concurrent NumPy in Python, Jason Brownlee, 2023 (**my book**).

**Concurrent NumPy Guides**

- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes

**Concurrent NumPy Documentation**

- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API


**Python Concurrency APIs**

- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks

## Takeaways

You now know about the BLAS and LAPACK specifications, the libraries that implement them, and how NumPy uses them for efficient linear algebra functions.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.

Photo by Fikri Rasyid on Unsplash
