Last Updated on September 12, 2022
It is important to follow best practices when using the multiprocessing.Process class in Python.
Best practices allow you to side-step the most common errors and bugs when using processes for concurrent tasks in your programs.
In this tutorial you will discover best practices for using Python processes.
Let’s get started.
Multiprocessing Best Practices
The multiprocessing.Process class is a flexible and powerful tool for executing tasks concurrently in child processes.
Once you know how the multiprocessing.Process class works, it is important to review some best practices to consider when bringing child processes into your Python programs.
To keep things simple, there are five best practices when creating new child processes. They are:
- Use Context Managers
- Use Timeouts When Waiting
- Use Main Module Idiom
- Use Shared ctypes
- Use Pipes and Queues
Let’s get started with the first practice, which is to use the context manager.
Tip 1: Use Context Managers
Acquire and release locks using a context manager, wherever possible.
Locks can be acquired manually via a call to acquire() at the beginning of the critical section followed by a call to release() at the end of the critical section.
For example:
...
# acquire the lock manually
lock.acquire()
# critical section
...
# release the lock
lock.release()
This approach should be avoided wherever possible.
Traditionally, it was recommended to always acquire and release a lock in a try-finally structure.
The lock is acquired, the critical section is executed in the try block, and the lock is always released in the finally block.
For example:
...
# acquire the lock
lock.acquire()
try:
    # critical section
    ...
finally:
    # always release the lock
    lock.release()
This has since been superseded by the context manager interface, which achieves the same thing with less code.
For example:
...
# acquire the lock
with lock:
    # critical section
    ...
The benefit of the context manager is that the lock is always released as soon as the block is exited, regardless of how it is exited, e.g. normally, via a return statement, or via a raised exception.
This applies to a number of synchronization primitives, such as:
- Acquiring a mutex lock via the multiprocessing.Lock class.
- Acquiring a reentrant mutex lock via the multiprocessing.RLock class.
- Acquiring a semaphore via the multiprocessing.Semaphore class.
- Acquiring a condition via the multiprocessing.Condition class.
The context manager interface is also supported on other multiprocessing utilities, such as:
- Opening a connection via the multiprocessing.connection.Connection class.
- Creating a manager via the multiprocessing.Manager class.
- Creating a process pool via the multiprocessing.pool.Pool class.
- Creating a listener via the multiprocessing.connection.Listener class.
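Tying this together, below is a minimal sketch (the task() function and the number of child processes are illustrative only) in which child processes acquire a shared lock via the context manager:

from multiprocessing import Process, Lock

# hypothetical task that protects a critical section with a lock
def task(lock, number):
    # acquire the lock via the context manager
    with lock:
        # critical section, e.g. report a message
        print(f'Process {number} has the lock')
    # the lock is released automatically here, even on error

# protect the entry point
if __name__ == '__main__':
    # create a shared lock
    lock = Lock()
    # create and configure five child processes
    processes = [Process(target=task, args=(lock, i)) for i in range(5)]
    # start the child processes
    for process in processes:
        process.start()
    # wait for the child processes to finish
    for process in processes:
        process.join()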
Tip 2: Use Timeouts When Waiting
Always use a timeout when waiting on a blocking call.
Many calls made on synchronization primitives may block.
For example:
- Waiting to acquire a multiprocessing.Lock via acquire().
- Waiting for a process to terminate via join().
- Waiting to be notified on a multiprocessing.Condition via wait().
And more.
Most blocking calls on concurrency primitives take a “timeout” argument. Calls such as acquire() and wait() return True if the call completed successfully and False if the timeout elapsed, whereas join() always returns None, so you must check is_alive() afterwards to determine the outcome.
Avoid making a blocking call without a timeout, wherever possible.
For example:
...
# acquire the lock
if not lock.acquire(timeout=2*60):
    # handle failure case
    ...
This allows the waiting process to give up after a fixed time limit and then attempt to rectify the situation, e.g. report an error, force termination, and so on.
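For example, below is a minimal sketch (the task() function and the timeout values are illustrative only) that uses timeouts both when acquiring a lock and when joining a child process:

import time
from multiprocessing import Process, Lock

# hypothetical task that holds the lock briefly
def task(lock):
    with lock:
        # simulate work while holding the lock
        time.sleep(1)

if __name__ == '__main__':
    lock = Lock()
    process = Process(target=task, args=(lock,))
    process.start()
    # wait to acquire the lock, but give up after 2 minutes
    if not lock.acquire(timeout=2*60):
        print('Unable to acquire the lock, taking corrective action...')
    else:
        # critical section
        lock.release()
    # wait for the process to terminate, but give up after 10 seconds
    process.join(timeout=10)
    if process.is_alive():
        # handle the failure case, e.g. force termination
        process.terminate()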
Tip 3: Use Main Module Idiom
A Python program that uses multiprocessing should protect the entry point of the program.
This can be achieved by using an if-statement to check that the entry point is the top-level environment.
For example:
...
# check for top-level environment
if __name__ == '__main__':
    # application entry point
    ...
This will help to avoid a RuntimeError when creating child processes using the ‘spawn’ start method, which is the default on Windows and macOS.
You can learn more about protecting the entry point when using multiprocessing in the tutorial:
Additionally, it is a good practice to add freeze support as the first line of a Python program that uses multiprocessing.
Freezing a Python program bundles the code, together with a Python interpreter, into a standalone executable for packaging and distribution.
When a program is frozen in order to be distributed, some features of Python are not included or disabled by default.
This is for performance and/or security reasons.
One feature that does not work by default in a frozen Python program is multiprocessing.
That is, we cannot create new Python processes via multiprocessing.Process instances in a program that has been frozen for distribution.
Creating a process in a frozen application results in a RuntimeError.
We can add support for multiprocessing in our program when freezing code via the multiprocessing.freeze_support() function.
For example:
...
# enable support for multiprocessing
multiprocessing.freeze_support()
This will have no effect on programs that are not frozen.
You can learn more about adding freeze support in the tutorial:
Protecting the entry point and adding freeze support together are referred to as the “main module” idiom when using multiprocessing.
Using this idiom is a best practice when using multiprocessing.
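Putting both parts together, below is a minimal sketch of the main module idiom (the task() function is illustrative only):

from multiprocessing import Process, freeze_support

# hypothetical task executed in a child process
def task():
    print('Hello from a child process')

# protect the entry point
if __name__ == '__main__':
    # enable support for frozen executables
    freeze_support()
    # create, start, and wait on a child process
    process = Process(target=task)
    process.start()
    process.join()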
If the entry point is not protected, creating a child process may fail with an error like the following:

An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Tip 4: Use Shared ctypes
Processes do not share memory by default.
Instead, data must be exchanged between processes, e.g. via sockets or files, or memory must be explicitly shared.
If you need to share simple data variables or arrays of variables between processes, this can be achieved using shared ctypes.
Shared ctypes provide a mechanism to share data between processes in a process-safe manner.
You can share ctypes among processes using the multiprocessing.Value and multiprocessing.Array classes.
- The multiprocessing.Value class is used to share a ctype of a given type among multiple processes.
- The multiprocessing.Array class is used to share an array of ctypes of a given type among multiple processes.
Shared ctypes provide a simple and easy-to-use way of sharing data between processes.
For example, a shared ctype value can be defined in a parent process, then shared with multiple child processes. All child processes and the parent process can then safely read and modify the data within the shared value.
This can be useful in a number of use cases, such as:
- A counter shared among multiple processes.
- Returning data from a child process to a parent process.
- Sharing results of computation among processes.
Shared ctypes can only be shared among processes on a single system. Sharing data between processes running on different machines requires a different mechanism, such as sockets.
The multiprocessing.Value class will create a shared ctype with a specified data type and initial value.
For example:
...
# create a value
value = multiprocessing.Value(...)
The first argument defines the data type for the value; it may be a string type code or a Python ctypes class. The second argument is an optional initial value.
For example, we can define a signed integer type with the ‘i’ type code and an initial value of zero as follows:
...
# create an integer value
variable = multiprocessing.Value('i', 0)
Once defined, the value can then be shared and used within multiple processes, such as between a parent and a child process.
Internally, the multiprocessing.Value makes use of a multiprocessing.RLock that ensures access to and modification of the data inside the class is mutually exclusive, i.e. process-safe.
This means that only one process at a time can access or change the data within the multiprocessing.Value object.
The data within the multiprocessing.Value object can be accessed via the “value” attribute.
For example:
...
# get the data
data = variable.value
The data within the multiprocessing.Value can be changed via the same “value” attribute.
For example:
...
# change the data
variable.value = 100
You can learn more about using shared ctypes in the tutorial:
Tip 5: Use Pipes and Queues
Processes can share messages with each other directly using pipes or queues.
These are process-safe data structures that allow processes to send or receive pickleable Python objects.
In multiprocessing, a pipe is a connection between two processes in Python.
Python provides a simple pipe via the multiprocessing.Pipe() function.
A pipe can be created by calling multiprocessing.Pipe(), which returns a pair of multiprocessing.connection.Connection objects.
For example:
...
# create a pipe
conn1, conn2 = multiprocessing.Pipe()
Objects can be shared between processes using the Pipe.
The Connection.send() function can be used to send objects from one process to another.
The objects sent must be pickleable.
For example:
...
# send an object
conn2.send('Hello world')
The Connection.recv() function can be used to receive objects in one process sent by another.
The objects received will be automatically un-pickled.
For example:
...
# receive an object
data = conn1.recv()
You can learn more about multiprocessing pipes in the tutorial:
Python provides a process-safe queue in the multiprocessing.Queue class.
A queue is a data structure on which items can be added by a call to put() and from which items can be retrieved by a call to get().
The multiprocessing.Queue provides a first-in, first-out (FIFO) queue, which means that items are retrieved in the order they were added: the first items added to the queue will be the first items retrieved. This is opposed to other queue types, such as last-in, first-out (LIFO) and priority queues.
The multiprocessing.Queue can be used by first creating an instance of the class. This will create an unbounded queue by default, that is, a queue with no size limit.
For example:
...
# create an unbounded queue
queue = multiprocessing.Queue()
Items can be added to the queue via a call to put(), for example:
...
# add an item to the queue
queue.put(item)
Items can be retrieved from the queue by calls to get().
For example:
...
# get an item from the queue
item = queue.get()
You can learn more about multiprocessing queues in the tutorial:
Further Reading
This section provides additional resources that you may find helpful.
Python Multiprocessing Books
- Python Multiprocessing Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Multiprocessing API Cheat Sheet
I would also recommend specific chapters in the books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know some best practices when creating new child processes in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Ilse Orsel on Unsplash