Last Updated on September 12, 2022
Thread-based concurrency in Python is constrained by the Global Interpreter Lock, whereas process-based concurrency side-steps the GIL completely.
Why not always use process-based concurrency?
Why ever use thread-based concurrency?
In this tutorial, you will discover how the limitations of process-based concurrency map to the benefits of thread-based concurrency.
Let’s get started.
Thread-Based Concurrency is Limited
Thread-based concurrency in Python is limited.
Only a single thread is able to execute at a time.
This is because of the Global Interpreter Lock, or GIL, which requires that each thread acquire a lock on the interpreter before executing, preventing all other threads from executing at the same time.
This means that although we may have tens, hundreds, or even thousands of concurrent threads in our Python application, only one thread may be executing at any given moment; none of them run in parallel.
In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once …
— threading — Thread-based parallelism
This is true, with some caveats.
- It applies to the Python reference interpreter called CPython, e.g. the version of Python you download from python.org.
- It applies to CPU-bound tasks, e.g. tasks that run as fast as your CPU cores will allow.
These caveats matter, and we’ll take a closer look at them later.
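As a minimal illustration (a quick sketch, not a rigorous benchmark), a CPU-bound task gains nothing from extra threads under the GIL: four threads take roughly as long as running the work four times sequentially.

```python
import time
from threading import Thread

def cpu_task(n):
    # A CPU-bound task: pure computation, no IO, so the GIL is held
    # for the duration of the work.
    return sum(i * i for i in range(n))

def run_threads(num_threads, n):
    # Create, start, and join the given number of threads, returning
    # the total elapsed wall-clock time.
    threads = [Thread(target=cpu_task, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    # With the GIL, more threads do not mean faster CPU-bound work.
    print(f'1 thread:  {run_threads(1, 1_000_000):.2f}s')
    print(f'4 threads: {run_threads(4, 1_000_000):.2f}s')
```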
Nevertheless, the Python API documentation recommends using processes in order to achieve true parallelism.
If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor.
— threading — Thread-based parallelism
Next, let’s take a closer look at process-based concurrency.
Process-Based Concurrency
Process-based concurrency is not limited in the same way as thread-based concurrency.
Both threads and processes can execute concurrently (out of order), but only Python processes are able to execute in parallel (simultaneously), not Python threads (with some caveats).
This means that if we want our Python code to run on all CPU cores and make the best use of our system hardware, we should use process-based concurrency.
The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
— multiprocessing — Process-based parallelism
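As a minimal sketch, here is how the multiprocessing module distributes a CPU-bound task across separate interpreter processes, each with its own GIL:

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work executed in a worker process; each worker is a
    # separate Python interpreter with its own GIL, so tasks can run
    # in parallel across CPU cores.
    return n * n

if __name__ == '__main__':
    # The __main__ guard is required on platforms that use the
    # 'spawn' start method (e.g. Windows and macOS).
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)
```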
If process-based concurrency offers true parallelism in Python, why not always use processes?
Why ever bother with threads?
Why Use Thread-Based Concurrency
Thread-based concurrency has benefits over process-based concurrency.
Firstly, let’s revisit the caveats to the limitation of thread-based concurrency described above.
Only one thread can run at a time within a Python process.
True, except…
When Using Alternate Python Interpreters
This limitation only applies to the CPython interpreter and other Python interpreters that implement the Global Interpreter Lock. Some Python interpreters do not implement the GIL, such as Jython and IronPython.
Using one of these other Python interpreters will give you full parallelism using thread-based concurrency.
Maybe you don’t want to change your Python interpreter. Fair enough.
When Performing IO-Bound Tasks
Recall that the GIL is a lock on the Python interpreter.
Only one thread can hold this lock at a time, meaning that only one thread can run at a time. Threads are mutually exclusive within the Python interpreter.
Except, the lock is released sometimes, allowing other threads to run.
The GIL is released when a thread is performing an IO task, such as interacting with a file, a socket, or an external device.
Common examples include:
- Hard disk drive: Reading, writing, appending, renaming, deleting, etc. files.
- Peripherals: mouse, keyboard, screen, printer, serial, camera, etc.
- Internet: Downloading and uploading files, getting a webpage, querying RSS, etc.
- Database: Select, update, delete, etc. SQL queries.
- Email: Send mail, receive mail, query inbox, etc.
Put bluntly, we can achieve true parallelism with thread-based concurrency when performing IO-bound tasks.
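A minimal sketch of this effect, using time.sleep() as a stand-in for a blocking IO call (sleep releases the GIL just as a blocking read on a file or socket does):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(delay):
    # time.sleep() releases the GIL while waiting, just like a
    # blocking read on a file or socket, so other threads can run.
    time.sleep(delay)
    return delay

if __name__ == '__main__':
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as executor:
        results = list(executor.map(io_task, [0.1] * 10))
    duration = time.perf_counter() - start
    # The ten 0.1-second waits overlap, so the total is close to
    # 0.1 seconds, not 1 second.
    print(f'{duration:.2f}s for {len(results)} tasks')
```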
When Performing Some CPU-Bound Tasks
Recall, CPU-bound tasks are those computational tasks that will run as fast as the CPU will allow.
They do not interact with any device, file or socket like IO-bound tasks.
Importantly, the GIL may also be released by third-party libraries for CPU-bound tasks, such as when performing long-running operations that are known to be safe to execute without holding the interpreter lock.
Luckily, many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.
— Global Interpreter Lock, Python Wiki.
One example is when calculating cryptographic hashes in the hashlib module.
The Python GIL is released to allow other threads to run while hash updates on data larger than 2047 bytes is taking place when using hash algorithms supplied by OpenSSL.
— hashlib — Secure hashes and message digests
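As a minimal sketch of this, the hash calls below can overlap across threads because OpenSSL-backed algorithms release the GIL for data larger than 2047 bytes:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def hash_chunk(data):
    # For data larger than 2047 bytes, hashlib's OpenSSL-backed
    # algorithms release the GIL during the update, so these calls
    # can run in parallel across threads.
    return hashlib.sha256(data).hexdigest()

if __name__ == '__main__':
    # Eight 1 MB chunks of data to hash concurrently.
    chunks = [bytes([i]) * 1_000_000 for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as executor:
        digests = list(executor.map(hash_chunk, chunks))
    print(f'computed {len(digests)} digests')
```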
Another example is in calculating matrix operations using the NumPy library from the SciPy suite.
However, numpy code often releases the GIL while it is calculating, so that simple parallelism can speed up the code.
— Easy multithreading, SciPy Cookbook
Other examples include when compressing data or files (e.g. zlib) and when working with video and image data (e.g. OpenCV).
Calling system I/O functions is the most common use case for releasing the GIL, but it can also be useful before calling long-running computations which don’t need access to Python objects, such as compression or cryptographic functions operating over memory buffers. For example, the standard zlib and hashlib modules release the GIL when compressing or hashing data.
— Releasing the GIL from extension code
Threads vs Processes
Processes and process-based concurrency also have limitations compared to threads.
For example:
- We may have thousands of threads, but perhaps only tens of processes.
- Threads are small and fast, whereas processes are large and slow to create and start.
- Threads can share data quickly and directly, whereas processes must pickle and transmit data to each other.
Let’s take a closer look at each of these concerns.
Number of Threads vs Number of Processes
The number of child processes that can be created is limited.
For example, you may only be able to create 61 processes on Windows.
On Windows, max_workers must be less than or equal to 61. If it is not then ValueError will be raised. If max_workers is None, then the default chosen will be at most 61, even if more processors are available.
— concurrent.futures — Launching parallel tasks
The number of threads that can be created is not limited, in practical terms.
We can easily create hundreds or thousands of threads in our Python programs.
This is often required when working with many files, accessing many URLs online, or handling many concurrent requests.
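As a minimal sketch, creating a thousand threads is cheap and practical, whereas creating a thousand processes typically is not:

```python
from threading import Thread

def task(number):
    # Trivial per-task work; a real task would perform IO, such as
    # fetching a URL or reading a file.
    return number

if __name__ == '__main__':
    # A thousand threads within one process is routine for IO-bound
    # workloads; the same count of processes would exhaust resources.
    threads = [Thread(target=task, args=(i,)) for i in range(1000)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f'completed {len(threads)} threads')
```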
Lightweight vs Heavyweight
Both threads and processes are created and managed by the underlying operating system.
A thread belongs to a process. A process may have thousands of threads.
Threads are a lightweight construct.
- They have a small memory footprint.
- They are fast to allocate and create.
- They are fast to start.
Processes are a heavyweight construct.
- They have a larger memory footprint, e.g. a process is an instance of the Python interpreter.
- They are slow to allocate and create, e.g. fork or spawn start methods are used.
- They are slow to start, e.g. the main thread must be created and started.
This means that creating, starting, and managing thousands of concurrent tasks, such as handling requests in a server, is well suited to thread-based rather than process-based concurrency.
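The difference is easy to measure. The sketch below (a rough illustration, not a careful benchmark) times creating, starting, and joining a batch of workers with each construct; threads typically finish orders of magnitude faster because each process must fork or spawn a new interpreter.

```python
import time
from threading import Thread
from multiprocessing import Process

def work():
    # A no-op task; we are timing worker startup, not the work.
    pass

def time_startup(make_worker, count=50):
    # Create, start, and join `count` workers, returning elapsed time.
    start = time.perf_counter()
    workers = [make_worker(target=work) for _ in range(count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    print(f'threads:   {time_startup(Thread):.3f}s')
    print(f'processes: {time_startup(Process):.3f}s')
```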
Shared Memory vs Inter-Process Communication
Concurrency typically requires sharing data or program state between tasks.
Threads operate in the same process and therefore can share data directly within the process.
This means that threads have true shared memory, which is fast, flexible and easy to use.
Processes are separate instances of the Python interpreter.
They do not have access to each other’s memory; instead, they must communicate with each other using inter-process communication (IPC) mechanisms.
Examples include sharing data over socket communication or using file-based communication.
Both of these approaches require that all data and program state communicated between processes be serialized (pickled).
This imposes limitations on the data that can be shared between processes (it must be pickleable) and a computational overhead to serialize and deserialize all data that is shared.
This means that even if we do use processes for IO-bound tasks, any sharing of data read from an IO source among processes will be significantly slower compared to threads.
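As a minimal sketch, threads can read and write the same Python object directly (guarded by a lock); processes would instead have to pickle the data through a Queue or Pipe:

```python
from threading import Thread, Lock

# A plain dict shared directly between threads via process memory.
shared = {'count': 0}
lock = Lock()

def increment(n):
    # Each thread mutates the same dict in place; a lock prevents
    # lost updates from interleaved read-modify-write operations.
    for _ in range(n):
        with lock:
            shared['count'] += 1

if __name__ == '__main__':
    threads = [Thread(target=increment, args=(10_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # 4 threads x 10,000 increments each.
    print(shared['count'])
```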
Further Reading
This section provides additional resources that you may find helpful.
Python Multiprocessing Books
- Python Multiprocessing Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Multiprocessing API Cheat Sheet
I would also recommend specific chapters in the books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter: 14: Threads and Processes
Guides
- Python Multiprocessing: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know why we don’t always want to use process-based concurrency in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Chris Barbalis on Unsplash
catchmonster says
Well, let’s agree to disagree…
Threads share process heap but use separate stacks and registers. We can play with words as long as we want, but anyone should step into arena and start writing a code instead. Then come back and tell me, yes threads work beautiful in my app, they never corrupt my data (explicit access to data), i do not have circular dependencies and deadlocks, it is trivial to debug my threads, I designed it so that callbacks will work with locks, and performance of my app is 100X time faster.
I would like to see what anyone delivers, not in executive chat or writing, security bias, but real deal, show the code that runs and is robust and self healing among other things…
We are blessed with cPython and frankly i have zero needs to use threads specifically when 3.11 is coming around the corner … and I do work on petabytes of data in cloud. MP, clustering and workflows utilize my vm’s just fine and I am able to process, transform a huge amounts of data just fine …
Jason Brownlee says
Thanks for sharing.
vincentwu says
“i have zero needs to use threads specifically when 3.11 is coming around the corner”
Care to further explain the reason?
Jason Brownlee says
It might be in reference to sub-interpreters now aiming for 3.12: https://peps.python.org/pep-0554/
Does not negate the need for threads though.