Why Not Always Use Processes in Python

June 22, 2022 Python Multiprocessing

Thread-based concurrency is constrained by the Global Interpreter Lock, whereas process-based concurrency side-steps the GIL completely.

Why not always use process-based concurrency?

Why ever use thread-based concurrency?

In this tutorial you will discover how the limitations of process-based concurrency map to the benefits of thread-based concurrency.

Let's get started.

Thread-Based Concurrency is Limited

Thread-based concurrency in Python is limited.

Only a single thread is able to execute at a time.

This is because of the Global Interpreter Lock, or GIL, which requires each thread to acquire a lock on the interpreter before executing, preventing all other threads from executing at the same time.

This means that although we may have tens, hundreds, or even thousands of concurrent threads in our Python application, only one thread may execute at a time.

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once ...
-- threading — Thread-based parallelism

This is true, with some caveats.

It applies to the Python reference interpreter called CPython, e.g. the version of Python you download from python.org.
It applies to CPU-bound tasks, e.g. tasks that run as fast as your CPU cores will allow.

These caveats matter, and we'll take a closer look at them later.

Nevertheless, the Python API documentation recommends using processes in order to achieve true parallelism.

If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor.
-- threading — Thread-based parallelism

Next, let's take a closer look at process-based concurrency.

Process-Based Concurrency

Process-based concurrency is not limited in the same way as thread-based concurrency.

Both threads and processes can execute concurrently (out of order), but only Python processes are able to execute in parallel (simultaneously), not Python threads (with some caveats).

This means that if we want our Python code to run on all CPU cores and make the best use of our system hardware, we should use process-based concurrency.

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
-- multiprocessing — Process-based parallelism
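As a minimal sketch of this idea, the snippet below spreads a CPU-bound function across worker processes using concurrent.futures.ProcessPoolExecutor. The function names count_primes and count_primes_in_processes, and the input sizes, are made up for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits):
    """Run count_primes() for each limit in a pool of worker processes."""
    # each worker is a separate Python interpreter with its own GIL
    with ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))


# the guard is needed when child processes are started with the spawn method
if __name__ == "__main__":
    print(count_primes_in_processes([20_000, 30_000, 40_000, 50_000]))
```

Note that the arguments and return values of count_primes() are pickled and sent between processes behind the scenes, a cost we will come back to.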

If process-based concurrency offers true parallelism in Python, why not always use processes?

Why ever bother with threads?

Why Use Thread-Based Concurrency

Thread-based concurrency has benefits over process-based concurrency.

Firstly, let's revisit the caveats on the limitations described above regarding thread-based concurrency.

Only one thread can run at a time within a Python process.

True, except...

When Using Alternate Python Interpreters

This limitation only applies to the CPython interpreter and other Python interpreters that implement the Global Interpreter Lock. Some Python interpreters do not implement the GIL, such as Jython and IronPython.

Using one of these other Python interpreters will give you full parallelism using thread-based concurrency.
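If you are not sure which interpreter a program is running on, the standard library can report it. The sketch below uses platform.python_implementation(); the helper name interpreter_name is only for illustration.

```python
import platform
import sys


def interpreter_name():
    """Return the name of the running Python implementation."""
    # e.g. "CPython", "PyPy", "Jython" or "IronPython"
    return platform.python_implementation()


if __name__ == "__main__":
    print(f"Running on {interpreter_name()} {sys.version.split()[0]}")
```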

Maybe you don't want to change your Python interpreter. Fair enough.

When Performing IO-Bound Tasks

Recall that the GIL is a lock on the Python interpreter.

Only one thread can hold this lock at a time, meaning that only one thread can run at a time. Threads are mutually exclusive within the Python interpreter.

Except, the lock is released sometimes, allowing other threads to run.

The GIL is released when a thread is performing an IO task, such as interacting with a file, a socket, or an external device.

Common examples include:

Hard disk drive: Reading, writing, appending, renaming, deleting, etc. files.
Peripherals: mouse, keyboard, screen, printer, serial, camera, etc.
Internet: Downloading and uploading files, getting a webpage, querying RSS, etc.
Database: Select, update, delete, etc. SQL queries.
Email: Send mail, receive mail, query inbox, etc.

Put bluntly, we can achieve true parallelism with thread-based concurrency when performing IO-bound tasks.
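The effect can be sketched with a thread pool of blocking tasks. Here time.sleep() stands in for a real blocking IO call (an assumption made to keep the example self-contained), and the names fake_download and run_io_tasks are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(task_id):
    """Stand-in for a blocking IO call, e.g. waiting on a socket."""
    time.sleep(0.2)  # the GIL is released while the thread sleeps
    return task_id


def run_io_tasks(n_tasks):
    """Run n_tasks blocking calls in a thread pool; return results and duration."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        results = list(executor.map(fake_download, range(n_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, duration = run_io_tasks(10)
    print(f"{len(results)} tasks took {duration:.2f} seconds")
```

Because the waits overlap, the total duration should be much closer to the time of one task than to the sum of all tasks.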

You can learn more about blocking calls in the tutorial:

Thread Blocking Call in Python

When Performing Some CPU-Bound Tasks

Recall, CPU-bound tasks are those computational tasks that will run as fast as the CPU will allow.

They do not interact with any device, file or socket like IO-bound tasks.

Importantly, the GIL may be released by third party libraries for CPU-bound tasks, such as when performing operations known to be safe among threads in the Python interpreter.

Luckily, many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL.
-- Global Interpreter Lock, Python Wiki.

One example is when calculating cryptographic hashes in the hashlib module.

The Python GIL is released to allow other threads to run while hash updates on data larger than 2047 bytes is taking place when using hash algorithms supplied by OpenSSL.
-- hashlib — Secure hashes and message digests
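The hashlib case might look something like the sketch below, which hashes several large buffers in a thread pool. The helper names sha256_hex and hash_buffers and the buffer sizes are assumptions for illustration, and any speed-up depends on your Python build and hardware.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data):
    """Hash a bytes buffer; large inputs may be hashed with the GIL released."""
    return hashlib.sha256(data).hexdigest()


def hash_buffers(buffers):
    """Hash each buffer in a worker thread, preserving the input order."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(sha256_hex, buffers))


if __name__ == "__main__":
    # four 16 MB buffers, far above the 2047-byte threshold quoted above
    buffers = [bytes([i]) * (16 * 1024 * 1024) for i in range(4)]
    for digest in hash_buffers(buffers):
        print(digest)
```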

Another example is in calculating matrix operations using the NumPy library from the SciPy suite.

However, numpy code often releases the GIL while it is calculating, so that simple parallelism can speed up the code.
-- Easy multithreading, SciPy Cookbook

Other examples include when compressing data or files (e.g. zlib) and when working with video and image data (e.g. OpenCV).

Calling system I/O functions is the most common use case for releasing the GIL, but it can also be useful before calling long-running computations which don’t need access to Python objects, such as compression or cryptographic functions operating over memory buffers. For example, the standard zlib and hashlib modules release the GIL when compressing or hashing data.
-- Releasing the GIL from extension code

Threads vs Processes

Processes and process-based concurrency also have limitations compared to threads.

For example:

We may have thousands of threads, but perhaps only tens of processes.
Threads are small and fast, whereas processes are large and slow to create and start.
Threads can share data quickly and directly, whereas processes must pickle and transmit data to each other.

Let's take a closer look at each of these concerns.

Number of Threads vs Number of Processes

The number of child processes that can be created is limited.

For example, a process pool on Windows may be limited to 61 worker processes.

On Windows, max_workers must be less than or equal to 61. If it is not then ValueError will be raised. If max_workers is None, then the default chosen will be at most 61, even if more processors are available.
-- concurrent.futures — Launching parallel tasks
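One way to stay within this limit is to clamp the worker count before creating a process pool. This is only a sketch: the helper choose_process_workers and the constant WINDOWS_MAX_WORKERS are hypothetical names, and the value 61 is taken from the documentation quoted above.

```python
import os
import sys

# limit quoted from the concurrent.futures documentation
WINDOWS_MAX_WORKERS = 61


def choose_process_workers(requested):
    """Clamp a requested worker count to what the platform permits."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if sys.platform == "win32":
        return min(requested, WINDOWS_MAX_WORKERS)
    return requested


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to 1
    print(choose_process_workers(os.cpu_count() or 1))
```

The result could then be passed as the max_workers argument of ProcessPoolExecutor.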

The number of threads that can be created is not limited, in practical terms.

We can easily create hundreds or thousands of threads in our Python programs.

This is often required when working with many files, when accessing many URLs or files online, or when handling many concurrent requests.
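To get a feel for this, the sketch below starts a few hundred threads that are all alive at the same time. The function name run_many_threads and the thread count are arbitrary choices for illustration.

```python
import threading


def run_many_threads(n_threads):
    """Start n_threads threads that all wait on one event; return how many ran."""
    release = threading.Event()
    lock = threading.Lock()
    finished = []

    def wait_then_record(index):
        release.wait()  # block until every thread has been started
        with lock:
            finished.append(index)

    threads = [threading.Thread(target=wait_then_record, args=(i,))
               for i in range(n_threads)]
    for thread in threads:
        thread.start()
    # every thread has been started and none can finish before the event is set
    release.set()
    for thread in threads:
        thread.join()
    return len(finished)


if __name__ == "__main__":
    print(run_many_threads(200))
```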

Lightweight vs Heavyweight

Both threads and processes are created and managed by the underlying operating system.

A thread belongs to a process. A process may have thousands of threads.

Threads are a lightweight construct.

They have a small memory footprint.
They are fast to allocate and create.
They are fast to start.

Processes are a heavyweight construct.

They have a larger memory footprint, e.g. a process is an instance of the Python interpreter.
They are slow to allocate and create, e.g. fork or spawn start methods are used.
They are slow to start, e.g. the main thread must be created and started.

This means that creating, starting, and managing thousands of concurrent tasks, such as requests in a server, is well suited to threads and not to process-based concurrency.
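A rough way to see the difference is to time how long it takes to start and join a single thread versus a single process. This is a sketch only: the helper names are hypothetical, and the exact numbers depend on the platform and on the start method used for processes.

```python
import multiprocessing
import threading
import time


def do_nothing():
    """Empty task so that only creation and start-up cost is measured."""


def time_start_and_join(worker):
    """Return the seconds taken to start and join a thread or process."""
    start = time.perf_counter()
    worker.start()
    worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    thread_time = time_start_and_join(threading.Thread(target=do_nothing))
    process_time = time_start_and_join(multiprocessing.Process(target=do_nothing))
    print(f"thread:  {thread_time:.6f} seconds")
    print(f"process: {process_time:.6f} seconds")
```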

Shared Memory vs Inter-Process Communication

Concurrency typically requires sharing data or program state between tasks.

Threads operate in the same process and therefore can share data directly within the process.

This means that threads have true shared memory, which is fast, flexible and easy to use.
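For instance, several threads can update the same dictionary directly, with a lock to keep the updates consistent. The function name count_words_in_threads and the sample data below are made up for illustration.

```python
import threading


def count_words_in_threads(lines):
    """Each thread adds its word count to one shared dict, guarded by a lock."""
    totals = {"words": 0}      # shared by all threads, no copying needed
    lock = threading.Lock()    # protects the read-modify-write below

    def count(line):
        n_words = len(line.split())
        with lock:
            totals["words"] += n_words

    threads = [threading.Thread(target=count, args=(line,)) for line in lines]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return totals["words"]


if __name__ == "__main__":
    lines = ["the quick brown fox", "jumps over", "the lazy dog"]
    print(count_words_in_threads(lines))
```

No serialization is involved here: every thread reads and writes the very same dict object.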

Processes are separate instances of the Python interpreter.

They do not have direct access to each other's memory. Instead, they must communicate with each other using inter-process communication (IPC) mechanisms that simulate shared memory.

Examples include sharing data over socket communication or using file-based communication.

Both of these approaches require that all data and program state communicated between processes be serialized (pickled).

This imposes limitations on the data that can be shared between processes (it must be pickleable) and adds a computational overhead to serialize and deserialize all data that is shared.
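Both limitations can be seen with the pickle module directly, without starting any processes. In this sketch the helpers is_picklable and round_trip are hypothetical names used only for illustration.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


def round_trip(obj):
    """Serialize and deserialize obj, as happens when it crosses processes."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    data = {"values": [1, 2, 3]}
    copy = round_trip(data)
    print(copy == data, copy is data)      # equal content, but a separate object
    print(is_picklable(data))              # plain containers can be pickled
    print(is_picklable(threading.Lock()))  # locks cannot be pickled
    print(is_picklable(lambda x: x + 1))   # neither can lambda functions
```

The round trip produces a copy, so a change made to the data in one process is not seen by another unless it is sent again.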

This means that even if we do use processes for IO-bound tasks, any sharing of data read from an IO source among processes will be significantly slower compared to threads.

Takeaways

You now know why we don't always want to use process-based concurrency in Python.
