Last Updated on September 12, 2022
You may encounter a number of common errors when using the multiprocessing.Pool in Python.
These errors are usually easy to identify and often involve a quick fix.
In this tutorial you will discover the common errors when using multiprocessing pools in Python and how to fix each in turn.
Let’s get started.
Common Errors When Using Multiprocessing Pool
There are a number of common errors when using the multiprocessing.Pool.
These errors are typically made because of bugs introduced by copy-and-pasting code, or from a slight misunderstanding of how the multiprocessing.Pool works.
We will take a closer look at some of the more common errors made when using the multiprocessing.Pool, such as:
- Forgetting __main__
- Using a Function Call in apply_async()
- Using a Function Call in map()
- Incorrect Function Signature for map()
- Incorrect Function Signature for Callbacks
- Arguments or Shared Data that Does Not Pickle
- Not Flushing print() Statements
Do you have an error using the multiprocessing.Pool?
Let me know in the comments so I can recommend a fix and add the case to this tutorial.
Error 1: Forgetting __main__
By far the biggest error when using the multiprocessing.Pool is forgetting to protect the entry point of the program, i.e. failing to check for the __main__ module.
Recall that when using processes in Python, such as the Process class or the multiprocessing.Pool class, we must include a check for the top-level environment. This is specifically the case when using the 'spawn' start method, the default on Windows and macOS, but it is good practice anyway.
We can check for the top-level environment by checking whether the module name variable __name__ is equal to the string '__main__'.
This indicates that the code is running at the top-level code environment, rather than being imported by a program or script.
For example:
# entry point
if __name__ == '__main__':
    # ...
Forgetting this check will result in an error that can be quite confusing.
A complete example of using the multiprocessing.Pool without a check for the __main__ module is listed below.
# SuperFastPython.com
# example of not having a check for the main top-level environment
from time import sleep
from multiprocessing import Pool

# custom task that will sleep for a variable amount of time
def task(value):
    # block for a moment
    sleep(1)
    return value

# start the process pool
with Pool() as pool:
    # submit all tasks
    for result in pool.map(task, range(5)):
        print(result)
Running this example will fail with a RuntimeError.
Traceback (most recent call last):
...
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
The error message does point to the need to protect the entry point of the program, but it also mentions freeze_support(), which can be confusing for beginners.
This error can be fixed by protecting the entry point of the program with an if-statement:
if __name__ == '__main__':
    # ...
Error 2: Using a Function Call in apply_async()
A common error is to call your function when using the apply_async() function.
For example:
...
# issue the task
result = pool.apply_async(task())
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling apply_async with a function call
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task():
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue the task
        result = pool.apply_async(task())
        # get the result
        value = result.get()
        print(value)
Running this example will fail with an error.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
...
TypeError: 'str' object is not callable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
TypeError: 'str' object is not callable
You can fix the error by updating the call to apply_async() to take the name of your function and a tuple of any arguments, instead of calling the function itself.
For example:
...
# issue the task
result = pool.apply_async(task)
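If your task function takes arguments, pass them separately as a tuple via the second parameter of apply_async(), rather than calling the function. A minimal sketch (the argument value 42 is just illustrative):

# sketch: passing the function name and its arguments to apply_async()
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(1)
    return f'all done: {value}'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # pass the function itself, plus a tuple of arguments
        result = pool.apply_async(task, (42,))
        # get the result
        print(result.get())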
Error 3: Using a Function Call in map()
A common error is to call your function when using the map() function.
For example:
...
# issue all tasks
for result in pool.map(task(), range(5)):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map with a function call
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue all tasks
        for result in pool.map(task(), range(5)):
            print(result)
Running the example results in a TypeError.
Traceback (most recent call last):
...
    for result in pool.map(task(), range(5)):
TypeError: task() missing 1 required positional argument: 'value'
This error can be fixed by changing the call to map() to pass the name of the target task function instead of a call to the function.
...
# issue all tasks
for result in pool.map(task, range(5)):
    print(result)
Error 4: Incorrect Function Signature for map()
Another common error when using map() is to forget to provide the second argument to the function, i.e. the iterable.
For example:
...
# issue all tasks
for result in pool.map(task):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map without an iterable
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue all tasks
        for result in pool.map(task):
            print(result)
Running the example does not issue any tasks to the process pool, as there is no iterable for the map() function to iterate over. Instead, it fails immediately with a TypeError.
Traceback (most recent call last):
...
TypeError: map() missing 1 required positional argument: 'iterable'
The fix involves providing an iterable in the call to map() along with your function name.
...
# issue all tasks
for result in pool.map(task, range(5)):
    print(result)
Error 5: Incorrect Function Signature for Callbacks
Another common error is to forget to include the result in the signature for the callback function when issuing tasks asynchronously.
For example:
# result callback function
def handler():
    print(f'Callback got: {result}', flush=True)
A complete example with this error is listed below.
# SuperFastPython.com
# example of a callback function for apply_async()
from time import sleep
from multiprocessing.pool import Pool

# result callback function
def handler():
    print(f'Callback got: {result}', flush=True)

# custom function executed in another process
def task():
    # block for a moment
    sleep(1)
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issue tasks to the process pool
        result = pool.apply_async(task, callback=handler)
        # get the result
        value = result.get()
        print(value)
Running this example will result in an error when the callback is called by the process pool.
This will break the process pool, and the program will have to be killed manually with Control-C.
Exception in thread Thread-3:
Traceback (most recent call last):
...
TypeError: handler() takes 0 positional arguments but 1 was given
Fixing this error involves updating the signature of your callback function to include the result from the task.
# result callback function
def handler(result):
    print(f'Callback got: {result}', flush=True)
The same error can also happen with the error callback, if you forget to take the error as an argument in the error callback function.
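For example, a minimal sketch of an error callback with the correct signature (the handler name is just illustrative); the pool calls it with the exception raised by the task:

# error callback function, taking the raised exception as its one argument
def error_handler(error):
    print(f'Error callback got: {error}', flush=True)

...
# register the error callback when issuing the task
result = pool.apply_async(task, error_callback=error_handler)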
Error 6: Arguments or Shared Data that Does Not Pickle
A common error is sharing data between processes that cannot be serialized.
Python has a built-in object serialization process called pickle; objects are "pickled" when serialized and "unpickled" when deserialized.
When sharing data between processes, the data will be pickled automatically.
This includes arguments passed to target task functions, data returned from target task functions, and data accessed directly, such as global variables.
If data that is shared between processes cannot be automatically pickled, an error such as a PicklingError or TypeError will be raised.
Most normal Python objects can be pickled.
Examples of objects that cannot be pickled are those that hold an open connection, such as to a file, database, server, or similar.
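If you are unsure whether an object will pickle, you can test it directly with the pickle module before handing it to the pool. A minimal sketch, using an open file handle as the unpicklable object:

# sketch: checking whether an object can be pickled
import pickle

# attempt to serialize an open file handle
try:
    pickle.dumps(open('tmp.txt', 'w'))
except TypeError as e:
    # reports: cannot pickle '_io.TextIOWrapper' object
    print(e)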
We can demonstrate this with an example below that attempts to pass a file handle as an argument to a target task function.
# SuperFastPython.com
# example of an argument that does not pickle
from multiprocessing import Pool

# custom function executed in another process
def task(file):
    # write to the file
    file.write('hi there')
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # open the file
    with open('tmp.txt', 'w') as file:
        # start the process pool
        with Pool() as pool:
            # issue the task, passing the open file as an argument
            result = pool.apply_async(task, (file,))
            # get the result
            value = result.get()
            print(value)
Running the example, we can see that it fails with an error indicating that the argument cannot be pickled for transmission to the worker process.
Traceback (most recent call last):
...
TypeError: cannot pickle '_io.TextIOWrapper' object
This is a contrived example, but it is indicative of cases where active objects cannot be passed to child processes because they cannot be pickled.
In general, if you experience this error and you are attempting to pass around a connection or open file, perhaps try to open the connection within the task or use threads instead of processes.
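For example, a minimal sketch of the file example reworked so that the picklable filename is passed as the argument, and the file is opened inside the task:

# sketch: open the file within the task instead of passing the handle
from multiprocessing import Pool

# custom function executed in another process
def task(filename):
    # open the connection inside the child process
    with open(filename, 'w') as file:
        file.write('hi there')
    return 'all done'

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # pass the filename, which pickles fine
        result = pool.apply_async(task, ('tmp.txt',))
        print(result.get())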
If you experience this type of error with custom data types that are being passed around, you may need to implement code to manually serialize and deserialize your types. I recommend reading the documentation for the pickle module.
Error 7: Not Flushing print() Statements
A common error is to not flush standard output (stdout) when calling the built-in print() function from target task functions.
By default, the built-in print() function in Python does not flush output.
The standard output stream (stdout) will flush automatically in the main process, often when the internal buffer is full or a newline is detected. This means you see your print statements reported almost immediately after the print() function is called in code.
There is a problem when calling the print() function from spawned or forked processes because standard out will buffer output by default.
This means if you call print() from target task functions in the multiprocessing.Pool, you probably will not see the print statements on standard out until the worker processes are closed.
This will be confusing because it will look like your program is not working correctly, e.g. buggy.
The example below demonstrates this with a target task function that will call print() to report some status.
# SuperFastPython.com
# example of not flushing output when calling print() from tasks in new processes
from time import sleep
from random import random
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(random())
    # report a message
    print(f'Done: {value}')

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # submit all tasks
        pool.map(task, range(5))
Running the example will wait until all tasks in the process pool have completed before printing all messages on standard out.
Done: 0
Done: 1
Done: 2
Done: 3
Done: 4
This can be fixed by updating all calls to the print() function made from target task functions to flush output after each call.
This can be achieved by setting the “flush” argument to True, for example:
...
# report a message
print(f'Done: {value}', flush=True)
More Errors
There are other, less common errors that you may encounter; this section lists some of them.
Error Sharing Pool With Workers
You may get an error when you attempt to share a process pool with the child workers directly.
pool objects cannot be passed between processes or pickled
This can be fixed using a multiprocessing.Manager.
Error Where Tasks Fail Silently
When issuing tasks to the process pool, they may fail silently and not give any indication of what happened.
A quick fix is to add an error callback function to report any errors, in the case where you are issuing tasks asynchronously.
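For example, a minimal sketch of a failing asynchronous task whose exception is reported via an error callback (the task and handler names are illustrative):

# sketch: reporting silent task failures with an error callback
from multiprocessing import Pool

# custom function that fails in the worker process
def task():
    raise Exception('Something bad happened!')

# error callback function, called with the raised exception
def error_handler(error):
    print(f'Task failed: {error}', flush=True)

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    with Pool() as pool:
        # issue the task with an error callback
        pool.apply_async(task, error_callback=error_handler)
        # close the pool and wait for the task to complete
        pool.close()
        pool.join()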
Error When Sharing Synchronization Primitives
You may get an error when attempting to share and use a synchronization primitive in the process pool.
This includes:
- Lock
- RLock
- Semaphore
- Barrier
- Condition
- Event
Typically, a RuntimeError is raised with a message that may look like one of the following:
Condition objects should only be shared between processes through inheritance
Or:
Semaphore objects should only be shared between processes through inheritance
Or:
Lock objects should only be shared between processes through inheritance
This error occurs because these synchronization primitive objects cannot be pickled and shared with child worker processes directly.
Instead, you must use a multiprocessing.Manager to create a centralized version of the primitive and share the proxy object that is returned.
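For example, a minimal sketch of sharing a lock with pool workers via a Manager (a Lock is shown, but the same pattern applies to the other primitives):

# sketch: sharing a lock with pool workers via a manager
from multiprocessing import Manager, Pool

# custom function that uses the shared lock proxy
def task(lock, value):
    # acquire the lock before reporting a message
    with lock:
        print(value, flush=True)

# protect the entry point
if __name__ == '__main__':
    # create the manager, which hosts the centralized lock
    with Manager() as manager:
        # create the lock proxy, which can be pickled
        lock = manager.Lock()
        # start the process pool
        with Pool() as pool:
            # issue tasks, passing the lock proxy as an argument
            pool.starmap(task, [(lock, i) for i in range(5)])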
You can learn more about how to safely share synchronization primitive objects in the process pool in the tutorials:
- Use a Lock in the Multiprocessing Pool
- Use a Semaphore in the Multiprocessing Pool
- Use an Event in the Multiprocessing Pool
- Use a Condition Variable in the Multiprocessing Pool
- Use a Barrier in the Process Pool
Error When Joining the Process Pool
You may get an error when you attempt to join the process pool by calling join().
The error may look as follows:
ValueError: Pool is still running
This error occurs because you attempt to join the process pool while it is still running.
You can fix this error by first closing the pool by calling close() or terminate().
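For example, a minimal sketch of shutting down and joining the pool in the correct order:

# sketch: close the pool before joining it
from time import sleep
from multiprocessing import Pool

# custom function executed in another process
def task(value):
    # block for a moment
    sleep(1)
    return value

# protect the entry point
if __name__ == '__main__':
    # start the process pool
    pool = Pool()
    # issue tasks asynchronously
    result = pool.map_async(task, range(5))
    # signal that no further tasks will be issued
    pool.close()
    # now it is safe to join and wait for the workers to finish
    pool.join()
    print(result.get())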
Error When Issuing Tasks
You may get an error when issuing tasks to the process pool.
The error may look as follows:
ValueError: Pool not running
This error occurs because you have closed the process pool and then attempted to issue tasks to it.
The pool cannot execute tasks if it is not running.
You must either start a new pool or issue all of your tasks before closing the pool.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know about the common errors when using the multiprocessing.Pool in Python.
Do you have any questions?
Ask your question in the comments below and I will do my best to answer.