7 Multiprocessing Pool Common Errors in Python
You may encounter one of a number of common errors when using the multiprocessing.Pool in Python. These errors are typically easy to identify and quick to fix.
In this tutorial you will discover the common errors when using multiprocessing pools in Python and how to fix each in turn.
Let's get started.
Common Errors When Using Multiprocessing Pool
There are a number of common errors when using the multiprocessing.Pool.
These errors are typically made because of bugs introduced by copy-and-pasting code, or from a slight misunderstanding in how the multiprocessing.Pool works.
We will take a closer look at some of the more common errors made when using the multiprocessing.Pool, such as:
- Forgetting __main__
- Using a Function Call in apply_async()
- Using a Function Call in map()
- Incorrect Function Signature for map()
- Incorrect Function Signature for Callbacks
- Arguments or Shared Data that Does Not Pickle
- Not Flushing print() Statements
Do you have an error using the multiprocessing.Pool?
Let me know in the comments so I can recommend a fix and add the case to this tutorial.
Error 1: Forgetting __main__
By far the biggest error when using the multiprocessing Pool is forgetting to protect the entry point, i.e. failing to check for the __main__ module.
Recall that when using processes in Python, such as the Process class or the multiprocessing.Pool class, we must include a check for the top-level environment. This is specifically the case when using the 'spawn' start method, the default on Windows and macOS, but it is good practice regardless.
We can check for the top-level environment by checking if the module name variable __name__ is equal to the string '__main__'.
This indicates that the code is running at the top-level code environment, rather than being imported by a program or script.
For example:
# entry point
if __name__ == '__main__':
    # ...
Forgetting to protect the entry point will result in an error that can be quite confusing.
A complete example of using the multiprocessing.Pool without a check for the __main__ module is listed below.
# SuperFastPython.com
# example of not having a check for the main top-level environment
from time import sleep
from multiprocessing import Pool

# custom task that will sleep for a moment
def task(value):
    # block for a moment
    sleep(1)
    return value

# start the process pool
with Pool() as pool:
    # submit all tasks
    for result in pool.map(task, range(5)):
        print(result)
Running this example will fail with a RuntimeError.
Traceback (most recent call last):
...
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
The error message does include information about the need to protect the entry point of the program, but it also mentions freeze_support(), which can be confusing for beginners.
This error can be fixed by protecting the entry point of the program with an if-statement:
if __name__ == '__main__':
    # ...
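For completeness, here is the example from above with the entry point protected. It should print the values 0 through 4 without error (the sleep is shortened here so it runs quickly).

```python
# example with a check for the main top-level environment
from time import sleep
from multiprocessing import Pool

# custom task that will sleep for a moment
def task(value):
    # block for a moment
    sleep(0.1)
    return value

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # submit all tasks and report results
        for result in pool.map(task, range(5)):
            print(result)
```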
Error 2: Using a Function Call in apply_async()
A common error is to call your function when using the apply_async() function.
For example:
...
# issue the task
result = pool.apply_async(task())
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling apply_async() with a function call
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task():
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue the task (note the erroneous function call)
        result = pool.apply_async(task())
        # get the result
        value = result.get()
        print(value)
Running this example will fail with an error.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
...
TypeError: 'str' object is not callable
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
...
TypeError: 'str' object is not callable
You can fix the error by updating the call to apply_async() to take the name of your function, rather than a call to the function.
For example:
...
# issue the task
result = pool.apply_async(task)
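If you called the function because you needed to pass arguments, note that apply_async() takes arguments separately as a tuple via its 'args' parameter. A minimal sketch, using a hypothetical one-argument task() that doubles its input:

```python
from multiprocessing import Pool

# hypothetical task that takes one argument
def task(value):
    return value * 2

if __name__ == '__main__':
    with Pool() as pool:
        # pass the function name and its arguments as a tuple
        result = pool.apply_async(task, args=(21,))
        print(result.get())
```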
Error 3: Using a Function Call in map()
A common error is to call your function when using the map() function.
For example:
...
# issue all tasks
for result in pool.map(task(), range(5)):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map with a function call
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue all tasks
        for result in pool.map(task(), range(5)):
            print(result)
Running the example results in a TypeError.
Traceback (most recent call last):
...
for result in pool.map(task(), range(5)):
TypeError: task() missing 1 required positional argument: 'value'
This error can be fixed by changing the call to map() to pass the name of the target task function instead of a call to the function.
...
# issue all tasks
for result in pool.map(task, range(5)):
    print(result)
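A related stumbling block: if you were calling the function in order to supply an extra argument, you can pre-bind arguments with functools.partial instead of calling the function. A sketch, using a hypothetical two-argument task() that multiplies its inputs:

```python
from functools import partial
from multiprocessing import Pool

# hypothetical task that takes two arguments
def task(multiplier, value):
    return multiplier * value

if __name__ == '__main__':
    with Pool() as pool:
        # bind the first argument without calling the function
        results = pool.map(partial(task, 10), range(5))
        print(results)
```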
Error 4: Incorrect Function Signature for map()
Another common error when using map() is to provide no second argument to the function, e.g. the iterable.
For example:
...
# issue all tasks
for result in pool.map(task):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map without an iterable
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue all tasks
        for result in pool.map(task):
            print(result)
Running the example does not issue any tasks to the process pool, as there is no iterable for the map() function to iterate over. Instead, it fails with a TypeError.
Traceback (most recent call last):
...
TypeError: map() missing 1 required positional argument: 'iterable'
The fix involves providing an iterable in the call to map() along with your function name.
...
# issue all tasks
for result in pool.map(task, range(5)):
    print(result)
Error 5: Incorrect Function Signature for Callbacks
Another common error is to forget to include the result in the signature for the callback function when issuing tasks asynchronously.
For example:
# result callback function
def handler():
    print(f'Callback got: {result}', flush=True)
A complete example with this error is listed below.
# SuperFastPython.com
# example of a callback function for apply_async()
from time import sleep
from multiprocessing.pool import Pool

# result callback function
def handler():
    print(f'Callback got: {result}', flush=True)

# custom function executed in another process
def task():
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issue tasks to the process pool
        result = pool.apply_async(task, callback=handler)
        # get the result
        value = result.get()
        print(value)
Running this example will result in an error when the callback is called by the process pool.
This will break the process pool and the program will have to be killed manually with a Control-C.
Exception in thread Thread-3:
Traceback (most recent call last):
...
TypeError: handler() takes 0 positional arguments but 1 was given
Fixing this error involves updating the signature of your callback function to include the result from the task.
# result callback function
def handler(result):
    print(f'Callback got: {result}', flush=True)
This error can also happen with the error callback, e.g. forgetting to take the error as an argument in the error callback function.
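A sketch of the corrected signatures for both callbacks: the result callback receives the task's return value, and the error callback receives the raised exception.

```python
from multiprocessing import Pool

# result callback function, takes the task's return value
def handler(result):
    print(f'Callback got: {result}', flush=True)

# error callback function, takes the raised exception
def error_handler(error):
    print(f'Error callback got: {error}', flush=True)

# custom function executed in another process
def task():
    return 'all done'

if __name__ == '__main__':
    with Pool() as pool:
        # register both callbacks when issuing the task
        result = pool.apply_async(task, callback=handler,
                                  error_callback=error_handler)
        print(result.get())
```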
Error 6: Arguments or Shared Data that Does Not Pickle
A common error is attempting to share data between processes that cannot be serialized.
Python has a built-in object serialization mechanism called pickle, in which objects are pickled (serialized) and unpickled (deserialized).
When sharing data between processes, the data will be pickled automatically.
This includes arguments passed to target task functions, data returned from target task functions, and data accessed directly, such as global variables.
If data that is shared between processes cannot be automatically pickled, a PicklingError will be raised.
Most normal Python objects can be pickled.
Examples of objects that cannot pickle are those that might have an open connection, such as to a file, database, server or similar.
We can demonstrate this with an example below that attempts to pass a file handle as an argument to a target task function.
# SuperFastPython.com
# example of an argument that does not pickle
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(file):
    # write to the file
    file.write('hi there')
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # open the file
    with open('tmp.txt', 'w') as file:
        # start the process pool
        with Pool() as pool:
            # issue the task
            result = pool.apply_async(task, (file,))
            # get the result
            value = result.get()
            print(value)
Running the example, we can see that it fails with an error indicating that the argument cannot be pickled for transmission to the worker process.
Traceback (most recent call last):
...
TypeError: cannot pickle '_io.TextIOWrapper' object
This is a contrived example, but it is indicative of cases where you cannot pass active objects, such as open files or connections, to child processes because they cannot be pickled.
In general, if you experience this error and you are attempting to pass around a connection or open file, perhaps try to open the connection within the task or use threads instead of processes.
If you experience this type of error with custom data types that are being passed around, you may need to implement code to manually serialize and deserialize your types. I recommend reading the documentation for the pickle module.
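As one sketch of the "open the connection within the task" fix, the example above could pass the file path (a plain string, which pickles fine) and open the file inside the task:

```python
from multiprocessing import Pool

# open the file inside the task; the path argument is a picklable str
def task(path):
    with open(path, 'w') as file:
        file.write('hi there')
    return 'all done'

if __name__ == '__main__':
    with Pool() as pool:
        # pass the path, not the open file handle
        result = pool.apply_async(task, ('tmp.txt',))
        print(result.get())
```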
Error 7: Not Flushing print() Statements
A common error is to not flush standard output (stdout) when calling the built-in print() function from target task functions.
By default, the built-in print() function in Python does not flush output.
The standard output stream (stdout) will flush automatically in the main process, typically when the internal buffer is full or a newline is detected. This means you see your print statements reported almost immediately after the print() function is called in code.
There is a problem when calling the print() function from spawned or forked processes because standard out will buffer output by default.
This means if you call print() from target task functions in the multiprocessing.Pool, you probably will not see the print statements on standard out until the worker processes are closed.
This will be confusing because it will look like your program is not working correctly, e.g. buggy.
The example below demonstrates this with a target task function that will call print() to report some status.
# SuperFastPython.com
# example of not flushing output when calling print() from tasks in new processes
from time import sleep
from random import random
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(random())
    # report a message
    print(f'Done: {value}')

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # submit all tasks
        pool.map(task, range(5))
    # report that all tasks are done
    print('All done!')
Running the example will wait until all tasks in the process pool have completed before printing all messages on standard out.
Done: 0
Done: 1
Done: 2
Done: 3
Done: 4
All done!
This can be fixed by updating all calls to print() in target task functions to flush output after each call.
This can be achieved by setting the "flush" argument to True, for example:
...
# report a message
print(f'Done: {value}', flush=True)
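Alternatively, you might flush the stream explicitly after printing by calling sys.stdout.flush(). A sketch, using a hypothetical report() helper:

```python
import sys

# report a message, then flush stdout explicitly
def report(value):
    print(f'Done: {value}')
    sys.stdout.flush()
```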
More Errors
There are other, less common errors that you may encounter; this section lists some of them.
Error Sharing Pool With Workers
You may get an error when you attempt to share a process pool with the child workers directly.
pool objects cannot be passed between processes or pickled
This can be fixed using a multiprocessing.Manager.
Error Where Tasks Fail Silently
When issuing tasks to the process pool, they may fail silently and not give any indication of what happened.
A quick fix is to add an error callback function to report any errors, in the case where you are issuing tasks asynchronously.
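A sketch of this quick fix, with a task that deliberately raises an exception; without the error_callback, the failure would be invisible unless get() were called on the result:

```python
from multiprocessing import Pool

# task that fails deliberately
def task():
    raise RuntimeError('something went wrong')

# error callback function, reports the raised exception
def on_error(error):
    print(f'Task failed: {error}', flush=True)

if __name__ == '__main__':
    with Pool() as pool:
        # issue the task asynchronously with an error callback
        pool.apply_async(task, error_callback=on_error)
        # close the pool and wait for the task to finish
        pool.close()
        pool.join()
```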
Error When Sharing Synchronization Primitives
You may get an error when attempting to share and use a synchronization primitive in the process pool.
This includes:
- Lock
- RLock
- Semaphore
- Barrier
- Condition
- Event
Typically a RuntimeError is raised with an error that may look like the following:
Condition objects should only be shared between processes through inheritance
Or:
Semaphore objects should only be shared between processes through inheritance
Or:
Lock objects should only be shared between processes through inheritance
This error occurs because these synchronization primitive objects cannot be pickled and shared with child worker processes directly.
Instead, you must use a multiprocessing.Manager to create a centralized version of the primitive and share the proxy object that is returned.
You can learn more about how to safely share synchronization primitive objects in the process pool in the tutorials:
- Use a Lock in the Multiprocessing Pool
- Use a Semaphore in the Multiprocessing Pool
- Use an Event in the Multiprocessing Pool
- Use a Condition Variable in the Multiprocessing Pool
- Use a Barrier in the Process Pool
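A sketch of the Manager-based fix for a lock: the manager process hosts the real Lock and hands out a picklable proxy object that can be passed to pool workers.

```python
from multiprocessing import Manager, Pool

# task that acquires the shared lock proxy
def task(lock, value):
    with lock:
        print(f'Task {value} has the lock', flush=True)

if __name__ == '__main__':
    # the manager hosts the lock and returns a picklable proxy
    with Manager() as manager:
        lock = manager.Lock()
        with Pool() as pool:
            # pass the lock proxy to each task
            pool.starmap(task, [(lock, i) for i in range(5)])
```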
Error When Joining the Process Pool
You may get an error when you attempt to join the process pool by calling join().
The error may look as follows:
ValueError: Pool is still running
This error occurs because you attempt to join the process pool while it is still running.
You can fix this error by first closing the pool by calling close() or terminate().
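A sketch of the correct close-then-join ordering:

```python
from multiprocessing import Pool

# custom task
def task(value):
    return value * 2

if __name__ == '__main__':
    # create the process pool
    pool = Pool()
    # issue tasks asynchronously
    result = pool.map_async(task, range(5))
    # close the pool so no further tasks can be issued
    pool.close()
    # wait for all issued tasks to complete
    pool.join()
    print(result.get())
```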
Error When Issuing Tasks
You may get an error when issuing tasks to the process pool.
The error may look as follows:
ValueError: Pool not running
This error occurs because you have closed the process pool and then attempted to issue tasks for it to execute.
The pool cannot execute tasks once it is no longer running.
You must issue all tasks before closing the pool, or else start a new pool.
Takeaways
You now know about the common errors when using the multiprocessing.Pool in Python.
If you enjoyed this tutorial, you will love my book: Python Multiprocessing Pool Jump-Start. It covers everything you need to master the topic with hands-on examples and clear explanations.