3 Multiprocessing Common Errors

June 24, 2022 Python Multiprocessing

You may encounter one of several common errors when using the multiprocessing.Process class in Python.

These errors are typically easy to identify and often involve a quick fix.

In this tutorial, you will discover the common errors when creating child processes in Python and how to fix each in turn.

Let's get started.

Common Multiprocessing Errors

The multiprocessing module and multiprocessing.Process class provide a flexible and powerful approach to concurrency using child processes.

When you are getting started with multiprocessing in Python, you may encounter one of many common errors.

These errors are typically caused by bugs introduced when copying and pasting code, or by a slight misunderstanding of how new child processes work.

We will take a closer look at three of the more common errors made when creating new child processes: a RuntimeError when starting new processes, print() not working in child processes, and adding attributes to classes that extend Process.

Do you have an error using the multiprocessing module?
Let me know in the comments so I can recommend a fix and add the case to this tutorial.

Error 1: RuntimeError Starting New Processes

It is common to get a RuntimeError when starting a new Process in Python.

The content of the error often looks as follows:

    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

This will happen on Windows and macOS, where the default start method is 'spawn'. It may also happen when you configure your program to use the 'spawn' start method on other platforms.

This is a common error and is easy to fix.

The fix involves checking whether the code is running in the top-level environment and only then attempting to start a new process.

This is a best practice.

The idiom for this fix, as stated in the message of the RuntimeError, is to use an if-statement and check if the name of the module is equal to the string '__main__'.

For example:

...
# check for top-level environment
if __name__ == '__main__':
    # create and start child processes here
    ...

This is called “protecting the entry point” of the program.

Recall that __name__ is a variable that refers to the name of the module executing the current code.

Also, recall that '__main__' is the name of the top-level environment used to execute a Python program.

Using an if-statement to check if the module is the top-level environment and only starting child processes within that block will resolve the RuntimeError.

It means that if the Python file is imported, then the code protected by the if-statement will not run. It will only run when the Python file is run directly, i.e. when it is the top-level environment.

The if-statement idiom is required, even if the entry point of the program calls a function that itself starts a child process.
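
As a minimal sketch of this last point, the example below starts a child process from inside a helper function and only calls that helper from the protected entry point. The names task() and main() are hypothetical and chosen only for illustration.

```python
from multiprocessing import Process

# function executed in the child process (hypothetical example task)
def task():
    print('Hello from the child process', flush=True)

# helper that creates and starts the child process, then waits for it
def main():
    process = Process(target=task)
    process.start()
    process.join()
    # an exit code of zero means the child finished normally
    return process.exitcode

# protect the entry point: main() is only called when run as a script
if __name__ == '__main__':
    main()
```

If the call to main() were moved outside the if-statement, a child started with the 'spawn' start method would import this module, reach the call again, and raise the RuntimeError shown above.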

Error 2: print() Does Not Work In Child Processes

Printing to standard output (stdout) with the built-in print() function may not work properly from child processes.

For example, you may print output messages for the user or debug messages from a child process and they may never appear, or may only appear when the child process is terminated.

For example:

...
# report a message from a child process
print('Hello from the child process')

This is a very common situation and the cause is well understood and easy to work around.

The print() function is a built-in function for displaying messages on standard output or stdout.

When you call print() from a child process created using the 'spawn' start method, the message may not appear when you expect it to.

This is because the messages are block buffered by default and the buffer is not flushed after every message. This is unlike the main process, which is interactive and will flush messages after each line, i.e. it is line buffered.

Instead, the buffered messages are only flushed occasionally, such as when the child process terminates and the buffer is garbage collected.

We can flush stdout automatically with each call to print().

This can be achieved by setting the 'flush' argument to True.

For example:

...
# report a message from a child process
print('Hello from the child process', flush=True)

An alternative approach is to call the flush() method on the sys.stdout object directly.

For example:

# the sys module provides access to stdout
import sys
...
# report a message from a child process
print('Hello from the child process')
# flush output
sys.stdout.flush()
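
To put the snippets above in context, here is a minimal sketch of a complete program in which the child process reports a message with flush=True. The function name task() is hypothetical and used only for illustration.

```python
from multiprocessing import Process

# function executed in the child process (hypothetical example task)
def task():
    # flush=True writes the message out immediately instead of leaving it in the buffer
    print('Hello from the child process', flush=True)

# protect the entry point
if __name__ == '__main__':
    # create, start, and wait for the child process
    process = Process(target=task)
    process.start()
    process.join()
```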

The problem with the print() function only occurs when using the 'spawn' start method.

You can change the start method to 'fork', which will cause print() to work as expected.

Note, the 'fork' start method is not supported on Windows at the time of writing.

You can set the start method via the multiprocessing.set_start_method() function.

For example:

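# import the function used to configure the start method
from multiprocessing import set_start_method
...
# set the start method to fork
set_start_method('fork')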
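
Because 'fork' is not available on every platform, one option is to check for it before setting it. The sketch below assumes a hypothetical helper named choose_start_method() that falls back to 'spawn' when the preferred method is missing; it uses multiprocessing.get_all_start_methods() to list what the current platform supports.

```python
from multiprocessing import Process, get_all_start_methods, set_start_method

# function executed in the child process (hypothetical example task)
def task():
    print('Hello from the child process', flush=True)

# hypothetical helper: use the preferred start method if supported, else 'spawn'
def choose_start_method(preferred='fork'):
    if preferred in get_all_start_methods():
        return preferred
    return 'spawn'

# protect the entry point
if __name__ == '__main__':
    # the start method should be set at most once, before creating any processes
    set_start_method(choose_start_method())
    process = Process(target=task)
    process.start()
    process.join()
```

Setting the start method inside the protected entry point keeps the call from being repeated when the module is imported by a spawned child.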

Error 3: Adding Attributes to Classes that Extend Process

Python provides the ability to create and manage new processes via the multiprocessing.Process class.

We can extend this class and override the run() function in order to run code in a new child process.

Extending the multiprocessing.Process and adding attributes that are shared among multiple processes will fail with an error.

For example, suppose we define a class that extends multiprocessing.Process and sets an attribute on the instance from within the run() method, which executes in the new child process. That attribute will not be accessible in other processes, such as the parent process.

This is the case even if both parent and child processes share access to the “same” object.

This is because instance variables are not shared among processes by default. Instead, instance variables added to a multiprocessing.Process object are private to the process that added them.

Each process operates on its own copy of the object and any changes made to that copy are local to that process only, by default.

If you set instance attributes in the child process and try to access them in the parent process or another process, you will get an error.

For example:

Traceback (most recent call last):
  ...
AttributeError: 'CustomProcess' object has no attribute 'data'

This error occurred because the child process operates on a copy of the class instance that is different from the copy of the class instance used in the parent process.
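
A minimal sketch of the situation is shown below. The class name CustomProcess and the attribute name data match the traceback above, but the body of run() is an assumption made for illustration. The example checks for the attribute with hasattr() rather than accessing it directly, so it reports the problem instead of raising the AttributeError.

```python
from multiprocessing import Process

# custom process class that sets an attribute from the child process
class CustomProcess(Process):
    def run(self):
        # this assignment only changes the child process's copy of the object
        self.data = 99

# protect the entry point
if __name__ == '__main__':
    process = CustomProcess()
    process.start()
    process.join()
    # the parent's copy never received the attribute, so this prints False
    print(hasattr(process, 'data'))
```

Accessing process.data directly in the parent at this point would raise the AttributeError.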

Instance variable attributes can be shared between processes via the multiprocessing.Value and multiprocessing.Array classes.

These classes explicitly define data attributes designed to be shared between processes in a process-safe manner.

Shared variables mean that changes made in one process are always propagated and made available to other processes.

An instance of the multiprocessing.Value can be defined in the constructor of a custom class as a shared instance variable.

The constructor of the multiprocessing.Value class requires that we specify the data type and an initial value.

The data type can be specified using a ctypes type or a typecode.

Typecodes are familiar and easy to use, for example 'i' for a signed integer or 'f' for a single-precision floating-point value.
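
As a brief sketch, the two ways of specifying the data type look like this. The variable names are hypothetical, and the multiprocessing.Array example takes a typecode plus a sequence of initial values.

```python
import ctypes
from multiprocessing import Array, Value

# shared signed integer specified with a typecode
counter = Value('i', 0)
# shared single-precision float specified with a typecode
ratio = Value('f', 0.5)
# shared boolean specified with a ctypes type
flag = Value(ctypes.c_bool, True)
# shared array of signed integers with three initial values
samples = Array('i', [1, 2, 3])

# report the current contents of each shared variable
print(counter.value, ratio.value, flag.value, samples[:])
```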

For example, we can define a multiprocessing.Value shared memory variable that holds a signed integer and is initialized to the value zero.

# import the module that provides the shared Value class
import multiprocessing
...
# initialize an integer shared variable
data = multiprocessing.Value('i', 0)

This can be initialized in the constructor of the class that extends the multiprocessing.Process class.

We can change the value of the shared data variable via the “value” attribute.

For example:

...
# change the value of the shared variable
data.value = 100

We can access the value of the shared data variable via the same “value” attribute.

For example:

...
# access the shared variable
value = data.value

The propagation of changes to the shared variable and the mutual exclusion locking around each individual read or write of the “value” attribute are performed automatically behind the scenes.
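
Putting these pieces together, a minimal sketch of a custom process class with a shared instance variable might look as follows. The class name CustomProcess, the attribute name data, and the value 99 are assumptions chosen for illustration.

```python
from multiprocessing import Process, Value

# custom process class with a shared instance variable
class CustomProcess(Process):
    def __init__(self):
        # initialize the parent class before adding attributes
        super().__init__()
        # shared signed integer, initialized to zero
        self.data = Value('i', 0)

    def run(self):
        # this change is made in the child and is visible to the parent
        self.data.value = 99

# protect the entry point
if __name__ == '__main__':
    process = CustomProcess()
    process.start()
    process.join()
    # read the value set by the child process, prints 99
    print(process.data.value)
```

Note that a compound update such as data.value += 1 involves a separate read and write and is not atomic. If several processes update the same variable this way, hold the lock explicitly, for example with the context manager returned by data.get_lock().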

Takeaways

You now know about the common errors when using multiprocessing in Python.


