Last Updated on September 12, 2022
You may encounter one of a number of common errors when using the ProcessPoolExecutor in Python.
These errors are typically easy to identify and quick to fix.
In this tutorial, you will discover the common errors when using process pools in Python and how to fix each in turn.
Let’s get started.
Common Errors When Using ProcessPoolExecutor
There are a number of common errors when using the ProcessPoolExecutor.
These errors are typically made because of bugs introduced by copy-and-pasting code, or from a slight misunderstanding of how the ProcessPoolExecutor works.
We will take a closer look at some of the more common errors made when using the ProcessPoolExecutor, such as:
- Forgetting __main__
- Using a Function Call in submit()
- Using a Function Call in map()
- Incorrect Function Signature for map()
- Incorrect Function Signature for Future Callbacks
- Arguments or Shared Data that Does Not Pickle
- Not Flushing print() Statements
Do you have an error using the ProcessPoolExecutor?
Let me know in the comments so I can recommend a fix and add the case to this tutorial.
Error 1: Forgetting __main__
By far the biggest error when using the ProcessPoolExecutor is forgetting to check for the __main__ module.
Recall that when using processes in Python, such as the Process class or the ProcessPoolExecutor, we must include a check for the top-level environment.
This is achieved by checking if the module name __name__ is equal to the string '__main__'.
This indicates that the code is running at the top-level code environment, rather than being imported by a program or script.
For example:
# entry point
if __name__ == '__main__':
    # ...
Forgetting this check will result in an error that can be quite confusing.
A complete example of using the ProcessPoolExecutor without a check for the __main__ module is listed below.
# SuperFastPython.com
# example of not having a check for the main top-level environment
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# start the process pool
with ProcessPoolExecutor() as executor:
    # submit all tasks
    for result in executor.map(task, range(5)):
        print(result)
Running this example will fail with a RuntimeError.
Traceback (most recent call last):
...
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
...
UserWarning: resource_tracker: There appear to be 5 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
The error message does include information about the need to protect the entry point of the program, but it also mentions freeze_support() and reports a BrokenProcessPool, which can be confusing for beginners.
This error can be fixed by protecting the entry point of the program with an if-statement:
if __name__ == '__main__':
    # ...
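For reference, here is one way to write the first example with the entry point protected; this version runs to completion and prints the five task results.

# SuperFastPython.com
# example with a check for the main top-level environment
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# protect the entry point of the program
if __name__ == '__main__':
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit all tasks
        for result in executor.map(task, range(5)):
            print(result)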
Error 2: Using a Function Call in submit()
A common error is to call your function when using the submit() function.
For example:
...
# submit the task
future = executor.submit(task())
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling submit with a function call
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task():
    # sleep for less than a second
    sleep(random())
    return 'all done'

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit the task
        future = executor.submit(task())
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()
Running this example will fail with an error.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
...
TypeError: 'str' object is not callable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
TypeError: 'str' object is not callable
You can fix the error by updating the call to submit() to take the name of your function, followed by any arguments, instead of calling the function in the call to submit().
For example:
...
# submit the task
future = executor.submit(task)
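If your task function takes arguments, pass them to submit() after the function name. A minimal runnable sketch, using a hypothetical one-argument task() rather than the task() from the example above:

# sketch: submitting a task that takes an argument
from concurrent.futures import ProcessPoolExecutor

# hypothetical task that takes one argument
def task(value):
    return value * 2

# protect the entry point of the program
if __name__ == '__main__':
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # pass the function name first, then its arguments
        future = executor.submit(task, 21)
        # get the result, in this case 42
        print(future.result())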
Error 3: Using a Function Call in map()
A common error is to call your function when using the map() function.
For example:
...
# submit all tasks
for result in executor.map(task(), range(5)):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map with a function call
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit all tasks
        for result in executor.map(task(), range(5)):
            print(result)

if __name__ == '__main__':
    main()
Running the example results in a TypeError.
Traceback (most recent call last):
...
TypeError: task() missing 1 required positional argument: 'value'
This error can be fixed by changing the call to map() to pass the name of the target task function instead of a call to the function.
...
# submit all tasks
for result in executor.map(task, range(5)):
    print(result)
Error 4: Incorrect Function Signature for map()
Another common error when using map() is to forget to provide the second argument, i.e. the iterable of values for the task function.
For example:
...
# submit all tasks
for result in executor.map(task):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map without an iterable
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit all tasks
        for result in executor.map(task):
            print(result)

if __name__ == '__main__':
    main()
Running the example does not issue any tasks to the process pool, as there is no iterable for the map() function to iterate over.
In this case, no output is displayed.
The fix involves providing an iterable in the call to map() along with your function name.
...
# submit all tasks
for result in executor.map(task, range(5)):
    print(result)
Error 5: Incorrect Function Signature for Future Callbacks
Another common error is to forget to include the Future in the signature for the callback function registered with a Future object.
For example:
...
# callback function to call when a task is completed
def custom_callback():
    print('Custom callback was called')
A complete example with this error is listed below.
# SuperFastPython.com
# example of the wrong signature on the callback function
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# callback function to call when a task is completed
def custom_callback():
    print('Custom callback was called', flush=True)

# mock task that will sleep for a moment
def work():
    sleep(1)
    return 'Task is done'

# entry point
def main():
    # create a process pool
    with ProcessPoolExecutor() as executor:
        # execute the task
        future = executor.submit(work)
        # add the custom callback
        future.add_done_callback(custom_callback)
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()
Running this example will result in an error when the callback is called by the process pool.
Task is done
exception calling callback for <Future at 0x10a05f190 state=finished returned str>
Traceback (most recent call last):
...
TypeError: custom_callback() takes 0 positional arguments but 1 was given
Fixing this error involves updating the signature of your callback function to include the Future object.
...
# callback function to call when a task is completed
def custom_callback(future):
    print('Custom callback was called')
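Putting it together, below is a sketch of the complete example with the corrected callback signature; as an addition not in the original listing, the callback also reads the task result from the Future it receives.

# SuperFastPython.com
# sketch of the corrected signature on the callback function
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# callback function that receives the completed Future
def custom_callback(future):
    # the result is available because the task has completed
    print(f'Custom callback was called: {future.result()}', flush=True)

# mock task that will sleep for a moment
def work():
    sleep(1)
    return 'Task is done'

# entry point
def main():
    # create a process pool
    with ProcessPoolExecutor() as executor:
        # execute the task
        future = executor.submit(work)
        # add the custom callback
        future.add_done_callback(custom_callback)
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()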
Error 6: Arguments or Shared Data that Does Not Pickle
A common error is sharing data between processes that cannot be serialized.
Specifically, the error occurs when an argument, return value, or shared object cannot be serialized for transmission between processes.
Python has a built-in object serialization mechanism called pickle: objects are 'pickled' when serialized and 'unpickled' when deserialized.
When sharing data between processes, the data will be pickled automatically.
This includes arguments passed to target task functions, data returned from target task functions, and data accessed directly, such as global variables.
If data that is shared between processes cannot be automatically pickled, an error such as a PicklingError or TypeError will be raised.
Most normal Python objects can be pickled.
Examples of objects that cannot be pickled are those that hold an open connection, such as to a file, database, server, or similar.
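If you are unsure whether an object will pickle, you can test it directly with the pickle module before handing it to the process pool. A minimal sketch; the tmp.txt file name is just for illustration:

# sketch: check whether an object can be pickled before sharing it
import pickle

# most built-in types pickle without issue
pickle.dumps({'a': 1})
print('dict pickles fine')

# objects holding open connections do not, e.g. a file handle
try:
    pickle.dumps(open('tmp.txt', 'w'))
except TypeError as e:
    print(e)  # cannot pickle '_io.TextIOWrapper' object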
We can demonstrate this with an example below that attempts to pass a file handle as an argument to a target task function.
# SuperFastPython.com
# example of an argument that does not pickle
from concurrent.futures import ProcessPoolExecutor

# write to a file
def work(file):
    file.write('hi there')
    return "All done!"

# entry point
def main():
    # open the file
    with open('tmp.txt', 'w') as file:
        # start the process pool
        with ProcessPoolExecutor() as executor:
            # submit the task
            future = executor.submit(work, file)
            # get the result
            result = future.result()
            print(result)

if __name__ == '__main__':
    main()
Running the example, we can see that it fails with an error indicating that the argument cannot be pickled for transmission to the worker process.
concurrent.futures.process._RemoteTraceback:
"""
...
TypeError: cannot pickle '_io.TextIOWrapper' object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
TypeError: cannot pickle '_io.TextIOWrapper' object
This was a contrived example, where the file could be opened within the target task function or the ProcessPoolExecutor could be changed to a ThreadPoolExecutor.
In general, if you experience this error and you are attempting to pass around a connection or open file, perhaps try to open the connection within the task or use threads instead of processes.
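For example, here is a sketch of the first workaround, reworking the listing above so that only the picklable file path crosses the process boundary and the file is opened inside the task:

# SuperFastPython.com
# sketch: open the file inside the task so only the path is pickled
from concurrent.futures import ProcessPoolExecutor

# write to a file, given its path
def work(filepath):
    with open(filepath, 'w') as file:
        file.write('hi there')
    return "All done!"

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit the task with a picklable string argument
        future = executor.submit(work, 'tmp.txt')
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()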
If you experience this type of error with custom data types that are being passed around, you may need to implement code to manually serialize and deserialize your types. I recommend reading the documentation for the pickle module.
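As a sketch of what manual serialization support can look like, the hypothetical class below implements the __getstate__() and __setstate__() methods from the pickle protocol to drop and re-open an unpicklable file handle:

# sketch: custom pickling for a type that holds an unpicklable handle
import pickle

class Logger:
    def __init__(self, filepath):
        self.filepath = filepath
        # open file handles cannot be pickled
        self.handle = open(filepath, 'a')

    def __getstate__(self):
        # copy the instance state, omitting the unpicklable handle
        state = self.__dict__.copy()
        del state['handle']
        return state

    def __setstate__(self, state):
        # restore the state and re-open the handle
        self.__dict__.update(state)
        self.handle = open(self.filepath, 'a')

# the instance can now be pickled and unpickled
logger = pickle.loads(pickle.dumps(Logger('tmp.txt')))
print(logger.handle)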
Error 7: Not Flushing print() Statements
A common error is to not flush standard output (stdout) when calling the built-in print() function from target task functions.
By default, the built-in print() function in Python does not flush output.
The standard output stream (stdout) will flush automatically in the main process, typically when the internal buffer is full or a newline is detected. This means you see your print messages almost immediately after the print() function is called in code.
There is a problem when calling the print() function from spawned or forked processes because standard out will buffer output by default.
This means if you call print() from target task functions in the ProcessPoolExecutor, you probably will not see the print statements on standard out until the worker processes are closed.
This will be confusing because it will look like your program is not working correctly, i.e. that it is buggy.
The example below demonstrates this with a target task function that will call print() to report some status.
# SuperFastPython.com
# example of not flushing output when calling print() from tasks in new processes
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a moment
def task(value):
    sleep(value)
    print(f'Done: {value}')

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        executor.map(task, range(5))
    print('All done!')

if __name__ == '__main__':
    main()
Running the example will wait until all tasks in the process pool have completed before printing all messages on standard out.
Done: 0
Done: 1
Done: 2
Done: 3
Done: 4
All done!
This can be fixed by updating all calls to the print() function in target task functions to flush output after each call.
This can be achieved by setting the “flush” argument to True, for example:
...
# report progress and flush the stream
print(f'Done: {value}', flush=True)
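For reference, the complete example updated with this fix is listed below; the 'Done' messages now appear as each task finishes rather than all at once when the pool shuts down.

# SuperFastPython.com
# example of flushing output when calling print() from tasks in new processes
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a moment
def task(value):
    sleep(value)
    # report progress and flush the stream
    print(f'Done: {value}', flush=True)

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        executor.map(task, range(5))
    print('All done!')

if __name__ == '__main__':
    main()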
Further Reading
This section provides additional resources that you may find helpful.
Books
- ProcessPoolExecutor Jump-Start, Jason Brownlee (my book!)
- Concurrent Futures API Interview Questions
- ProcessPoolExecutor PDF Cheat Sheet
I also recommend specific chapters from the following books:
- Effective Python, Brett Slatkin, 2019.
  - See Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
  - See Chapter 14: Threads and Processes
Guides
- Python ProcessPoolExecutor: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python Pool: The Complete Guide
Takeaways
You now know about the common errors when using the ProcessPoolExecutor in Python.
Do you have any questions?
Ask your question in the comments below and I will do my best to answer.
Michael Gardner says
Error 8: You are trying to do multiprocessing in a notebook environment like Databricks.
I had a multiprocessing application running on Databricks and it would work for about 30 minutes before all the pools would break. I'm not sure of the technical reasons, but when I deployed it directly to an EC2 instance on AWS it worked without issues.
Jason Brownlee says
Ouch!
Thank you for sharing Michael.
Vincent says
Wow, Dr. Brownlee’s new avatar looks great! Sunny and wise!
Jason Brownlee says
Thanks!