Last Updated on September 29, 2023
You can parallelize numpy tasks with threads in Python because most numpy functions release the global interpreter lock or GIL.
In this tutorial, you will discover that most numpy array functions will release the global interpreter lock.
Let’s get started.
Numpy and the Global Interpreter Lock
NumPy is an array library in Python.
Data from many domains can be represented using arrays of numbers, such as image data and machine learning data.
Arrays can be treated like mathematical vectors and matrices. As such, it is common to perform mathematical operations on numpy arrays.
We may need to perform operations on many numpy arrays (e.g. hundreds or thousands of arrays) or on numpy arrays that are very large (e.g. gigabytes in size). These operations can be performed efficiently using numpy, via vectorization, e.g. operations applied to an entire array or portion of an array at once.
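For example, a vectorized addition applies one operation to entire arrays at once, replacing an explicit Python loop over elements (a minimal illustration; the array sizes are arbitrary):

import numpy

a = numpy.random.rand(1_000_000)
b = numpy.random.rand(1_000_000)

# vectorized: the whole operation runs at once in compiled C code
c = a + b

# the equivalent element-by-element Python loop is far slower
d = numpy.empty_like(a)
for i in range(len(a)):
    d[i] = a[i] + b[i]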
If we are working with many arrays, we may want to perform numpy array operations in parallel using threads.
Generally, Python only allows one thread to run at a time, given the Global Interpreter Lock or GIL. This lock is released in some circumstances, such as when performing a computationally intensive operation in a third-party C library.
Numpy calls C library functions when performing array operations. This raises an important question when it comes to parallelizing Python programs that use numpy:
Does numpy release the GIL when performing operations on arrays?
What is the Global Interpreter Lock (GIL)
The internals of the Python interpreter are not thread-safe.
This means that there can be race conditions between multiple threads within a single Python process, potentially resulting in unexpected behavior and corrupt data.
As such, the Python interpreter makes use of a Global Interpreter Lock, or GIL for short, to make instructions executed by the Python interpreter (called Python bytecodes) thread-safe.
The GIL is a mechanism within the reference Python interpreter, called CPython, although similar locks exist in other interpreted languages, such as Ruby. It is a lock in the sense that it uses a synchronization primitive called a mutual exclusion (mutex) lock to ensure that only one thread of execution can execute Python instructions at a time within a Python process.
In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. The GIL prevents race conditions and ensures thread safety.
— Global Interpreter Lock, Python Wiki.
The effect of the GIL is that whenever a thread within a Python program wants to run, it must acquire the lock before executing. This is not a problem for most Python programs that have a single thread of execution, called the main thread.
It can become a problem in multi-threaded Python programs, such as programs that make use of the threading.Thread class, the multiprocessing.pool.ThreadPool class, or the concurrent.futures.ThreadPoolExecutor class.
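For example, each of these APIs can run a function in a worker thread, roughly as follows (a minimal sketch; the task() function and pool sizes are placeholders):

from threading import Thread
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

def task():
    # placeholder for work performed in a worker thread
    print('task running')

# threading.Thread: create, start, and wait for a single thread
thread = Thread(target=task)
thread.start()
thread.join()

# multiprocessing.pool.ThreadPool: reusable pool of worker threads
with ThreadPool(4) as pool:
    pool.apply(task)

# concurrent.futures.ThreadPoolExecutor: modern thread pool API
with ThreadPoolExecutor(4) as executor:
    executor.submit(task).result()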
The lock is explicitly released and re-acquired periodically by each Python thread. In modern versions of CPython, this happens after a configurable switch interval (5 milliseconds by default), rather than after a fixed number of bytecode instructions as in older versions. This allows other threads within the Python process to run, if present.
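You can check the switch interval used by your interpreter, for example:

import sys

# report the GIL switch interval in seconds (0.005 by default in modern CPython)
print(sys.getswitchinterval())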
The lock is also released in some circumstances, allowing other threads to run.
An important example is when a thread performs an I/O operation, such as reading or writing from an external resource like a file, socket, or device.
The lock is also explicitly released by some third-party Python libraries when performing computationally expensive operations in C-code.
The GIL is a simple and effective solution to thread safety in the Python interpreter, but it has the major downside that threads cannot execute Python bytecode in parallel, meaning full multithreading is not supported by Python.
NumPy Is Not Limited by the GIL
Numpy will release the GIL when performing most operations on arrays.
… Luckily, many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.
— Python Global Interpreter Lock, Python Wiki.
This includes many operations with numpy arrays (demonstrated in the short example after this list), such as:
- Methods on the ndarray objects, like sum()
- Operators such as +, -, *, / and more.
- Math functions such as power(), sqrt() and more.
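For example, each of the operations listed above can be performed as follows (a minimal illustration; the array shapes are arbitrary):

import numpy

a = numpy.random.rand(1000, 1000)
b = numpy.random.rand(1000, 1000)

# ndarray method, e.g. sum()
total = a.sum()

# operators applied to whole arrays
c = a + b
d = a * b

# math functions, e.g. power() and sqrt()
e = numpy.power(a, 2)
f = numpy.sqrt(a)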
The API documentation suggests that, generally, all operations on arrays will release the GIL, except those operating on arrays of Python objects.
The exceptions are few but important: while a thread is waiting for IO […] python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL.
— Parallel Programming with numpy and scipy
How to Find NumPy Functions That Release the GIL
Not all numpy functions will release the GIL.
Numpy must be compiled on your system with support for releasing the GIL. This is controlled by the NPY_ALLOW_THREADS constant, which is defined when numpy is configured and compiled.
If NPY_ALLOW_THREADS is defined during compilation, then as long as no object arrays are involved, the Python Global Interpreter Lock (GIL) is released prior to calling the loops.
— NumPy C code explanations
The good news is that this is almost always performed automatically when configuring, compiling, and installing numpy on modern multi-core systems, running modern operating systems such as Windows, MacOS, and Linux.
The NPY_ALLOW_THREADS constant is used throughout the C code used by numpy for array operations.
In fact, there is a hierarchy of related constants for fine-grained control over when the GIL can and should be released, with additional constants such as WITH_THREADS and NPY_NOSMP.
When calling out to a compiled function that may take time to compute (and does not have side-effects for other threads like updated global variables), the GIL should be released so that other Python threads can run while the time-consuming calculations are performed. This can be accomplished using two groups of macros. Typically, if one macro in a group is used in a code block, all of them must be used in the same code block. Currently, NPY_ALLOW_THREADS is defined to the python-defined WITH_THREADS constant unless the environment variable NPY_NOSMP is set in which case NPY_ALLOW_THREADS is defined to be 0.
— NumPy C code explanations
You can search the numpy C code base for the constants to confirm that the GIL will be released for a given array operation.
Alternatively, you can develop a small program that performs the same operation using one, two, three, or more threads and check whether there is a speed-up for the operation you need to use.
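The sketch below shows one way to structure such a test, timing the same operation with one and then four worker threads (the operation, array sizes, and thread counts are arbitrary choices for illustration; results will vary by system):

# minimal benchmark sketch: run the same numpy operation with 1 and then 4 threads
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter
import numpy

def task(data):
    # computationally intensive numpy operation, expected to release the GIL
    return numpy.sqrt(data)

# create many large arrays of random values
arrays = [numpy.random.rand(2000, 2000) for _ in range(16)]

for n_threads in (1, 4):
    start = perf_counter()
    with ThreadPoolExecutor(n_threads) as executor:
        results = list(executor.map(task, arrays))
    duration = perf_counter() - start
    print(f'{n_threads} thread(s): {duration:.3f} seconds')

If the operation releases the GIL, the multi-threaded run should complete faster than the single-threaded run.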
NumPy Functions Can Run in Parallel with Python Threads
Numpy will release the GIL for most array operations on most systems.
The implication of this is that we can use threads to parallelize our programs that make use of numpy arrays.
This can offer a significant speed-up, especially in programs where we need to perform the same mathematical operation on many arrays.
This also impacts the many open-source libraries that make use of numpy arrays, such as scipy, scikit-learn, pandas, and more.
It also means that we don’t have to use multiprocessing and process-based concurrency to achieve parallelism with numpy.
This is critical, as using process-based parallelism with numpy arrays can be painfully slow in those cases where we need to transmit arrays between processes, up to 4x slower in some cases. This is because each array must be pickled before being transmitted and unpickled at the other end. The cost of this inter-process communication typically overwhelms the benefits of parallel processing with processes.
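For a rough sense of this overhead, you can time how long it takes just to pickle a large array before it could be transmitted (the array size here is an arbitrary choice for illustration):

# rough illustration of the inter-process communication cost: time pickling a large array
import pickle
from time import perf_counter
import numpy

a = numpy.random.rand(4000, 4000)  # roughly 128 megabytes of float64 data
start = perf_counter()
data = pickle.dumps(a)
duration = perf_counter() - start
print(f'pickled {len(data)} bytes in {duration:.3f} seconds')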
For more about the cost of transferring data between processes, see the further reading section below.
Python processes are also significantly slower to start than threads, up to 40x slower in some cases. This too can add unwanted overhead when using process-based concurrency.
Warning About Multithreaded Numpy Tasks
Python will still acquire and release the GIL in tasks executed by Python threads.
This means that we should perform as much of each task as possible within numpy functions, for example by using vectorized operations and the most specific numpy functions available.
For example, compute an SVD with a single call to numpy.linalg.svd() rather than performing the sequence of SVD calculations step by step on the arrays.
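For example, a complete SVD can be computed with a single call that keeps the entire calculation inside numpy:

import numpy

# factorize a matrix with a single numpy call rather than step by step
a = numpy.random.rand(500, 500)
u, s, vt = numpy.linalg.svd(a)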
We may lose some benefits of parallelism in tasks executed by threads that involve a sequence of math functions on a numpy array.
We can make this clear with a small example:
...
# add arrays
b = a + c
# calculate square root
result = numpy.sqrt(b)
In the above case, the GIL would be released for the array addition, then acquired again, then released again for the array square root, then acquired again.
For example:
...
# GIL is currently held
# RELEASE GIL
b = a + c
# ACQUIRE GIL
# RELEASE GIL
result = numpy.sqrt(b)
# ACQUIRE GIL
Each time the GIL is held, other threads that require it cannot run.
Often there is little we can do about this, other than keep it in mind when designing and optimizing our multithreaded tasks.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Concurrent NumPy in Python, Jason Brownlee (my book!)
Guides
- Concurrent NumPy 7-Day Course
- Which NumPy Functions Are Multithreaded
- Numpy Multithreaded Matrix Multiplication (up to 5x faster)
- NumPy vs the Global Interpreter Lock (GIL)
- ThreadPoolExecutor Fill NumPy Array (3x faster)
- Fastest Way To Share NumPy Array Between Processes
Documentation
- Parallel Programming with numpy and scipy, SciPy Cookbook, 2015
- Parallel Programming with numpy and scipy (older archived version)
- Parallel Random Number Generation, NumPy API
NumPy APIs
Concurrency APIs
- threading — Thread-based parallelism
- multiprocessing — Process-based parallelism
- concurrent.futures — Launching parallel tasks
Takeaways
You now know that most numpy array functions will release the global interpreter lock.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Ivan says
Hello, I don’t see that Numpy, installed on modern Ubuntu 22.04, releases GIL.
Take a look at the following paste:
https://pastebin.com/MrDVN6Xb
Try changing the threading function between runt, runt2, and cv_resize.
While runt (numba nogil) and cv_resize really scale, runt2, which implements numpy pairwise multiplication and summation, doesn't scale.
Ivan says
I was wrong, it does scale if I change sum to np.sum.
Jason Brownlee says
Happy to hear that.