Last Updated on September 12, 2022
You may encounter one of a number of common errors when using the ProcessPoolExecutor in Python.
These errors are typically easy to identify and quick to fix.
In this tutorial, you will discover the common errors when using process pools in Python and how to fix each in turn.
Let’s get started.
Common Errors When Using ProcessPoolExecutor
There are a number of common errors when using the ProcessPoolExecutor.
These errors are typically made because of bugs introduced by copy-and-pasting code, or from a slight misunderstanding of how the ProcessPoolExecutor works.
We will take a closer look at some of the more common errors made when using the ProcessPoolExecutor, such as:
- Forgetting __main__
- Using a Function Call in submit()
- Using a Function Call in map()
- Incorrect Function Signature for map()
- Incorrect Function Signature for Future Callbacks
- Arguments or Shared Data that Does Not Pickle
- Not Flushing print() Statements
Do you have an error using the ProcessPoolExecutor?
Let me know in the comments so I can recommend a fix and add the case to this tutorial.
Error 1: Forgetting __main__
By far the biggest error when using the ProcessPoolExecutor is forgetting to check for the __main__ module.
Recall that when using processes in Python, such as the Process class or the ProcessPoolExecutor, we must include a check for the top-level environment.
This is achieved by checking if the module name __name__ is equal to the string '__main__'.
This indicates that the code is running at the top-level code environment, rather than being imported by a program or script.
For example:
# entry point
if __name__ == '__main__':
    # ...
Forgetting this check will result in an error that can be quite confusing.
A complete example of using the ProcessPoolExecutor without a check for the __main__ module is listed below.
# SuperFastPython.com
# example of not having a check for the main top-level environment
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# start the process pool
with ProcessPoolExecutor() as executor:
    # submit all tasks
    for result in executor.map(task, range(5)):
        print(result)
Running this example will fail with a RuntimeError.
Traceback (most recent call last):
...
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
...
UserWarning: resource_tracker: There appear to be 5 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
The error message does include information about the need to protect the entry point of the program, but it also mentions freeze_support() and reports a BrokenProcessPool, which can be confusing for beginners.
This error can be fixed by protecting the entry point of the program with an if-statement:
if __name__ == '__main__':
    # ...
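For reference, here is one way to write the first example with the entry point protected; this version runs to completion and prints the five task results.

# SuperFastPython.com
# example with a check for the main top-level environment
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# protect the entry point of the program
if __name__ == '__main__':
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit all tasks
        for result in executor.map(task, range(5)):
            print(result)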
Error 2: Using a Function Call in submit()
A common error is to call your function when using the submit() function.
For example:
...
# submit the task
future = executor.submit(task())
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling submit with a function call
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task():
    # sleep for less than a second
    sleep(random())
    return 'all done'

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit the task
        future = executor.submit(task())
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()
Running this example will fail with an error.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
...
TypeError: 'str' object is not callable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
TypeError: 'str' object is not callable
You can fix the error by updating the call to submit() to take the name of your function, followed by any arguments, instead of calling the function in the call to submit().
For example:
...
# submit the task
future = executor.submit(task)
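If your task function takes arguments, pass them to submit() after the function name. A minimal runnable sketch, using a hypothetical one-argument task() rather than the task() from the example above:

# sketch: submitting a task that takes an argument
from concurrent.futures import ProcessPoolExecutor

# hypothetical task that takes one argument
def task(value):
    return value * 2

# protect the entry point of the program
if __name__ == '__main__':
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # pass the function name first, then its arguments
        future = executor.submit(task, 21)
        # get the result, in this case 42
        print(future.result())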
Error 3: Using a Function Call in map()
A common error is to call your function when using the map() function.
For example:
...
# submit all tasks
for result in executor.map(task(), range(5)):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map with a function call
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit all tasks
        for result in executor.map(task(), range(5)):
            print(result)

if __name__ == '__main__':
    main()
Running the example results in a TypeError.
Traceback (most recent call last):
...
TypeError: task() missing 1 required positional argument: 'value'
This error can be fixed by changing the call to map() to pass the name of the target task function instead of a call to the function.
...
# submit all tasks
for result in executor.map(task, range(5)):
    print(result)
Error 4: Incorrect Function Signature for map()
Another common error when using map() is to forget to provide the second argument, i.e. the iterable of values for the task function.
For example:
...
# submit all tasks
for result in executor.map(task):
    print(result)
A complete example with this error is listed below.
# SuperFastPython.com
# example of calling map without an iterable
from time import sleep
from random import random
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a variable amount of time
def task(value):
    # sleep for less than a second
    sleep(random())
    return value

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit all tasks
        for result in executor.map(task):
            print(result)

if __name__ == '__main__':
    main()
Running the example does not issue any tasks to the process pool, as there is no iterable for the map() function to iterate over.
In this case, no output is displayed.
The fix involves providing an iterable in the call to map() along with your function name.
...
# submit all tasks
for result in executor.map(task, range(5)):
    print(result)
Error 5: Incorrect Function Signature for Future Callbacks
Another common error is to forget to include the Future in the signature for the callback function registered with a Future object.
For example:
...
# callback function to call when a task is completed
def custom_callback():
    print('Custom callback was called')
A complete example with this error is listed below.
# SuperFastPython.com
# example of the wrong signature on the callback function
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# callback function to call when a task is completed
def custom_callback():
    print('Custom callback was called', flush=True)

# mock task that will sleep for a moment
def work():
    sleep(1)
    return 'Task is done'

# entry point
def main():
    # create a process pool
    with ProcessPoolExecutor() as executor:
        # execute the task
        future = executor.submit(work)
        # add the custom callback
        future.add_done_callback(custom_callback)
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()
Running this example will result in an error when the callback is called by the process pool.
Task is done
exception calling callback for <Future at 0x10a05f190 state=finished returned str>
Traceback (most recent call last):
...
TypeError: custom_callback() takes 0 positional arguments but 1 was given
Fixing this error involves updating the signature of your callback function to include the Future object.
...
# callback function to call when a task is completed
def custom_callback(future):
    print('Custom callback was called')
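Putting it together, below is a sketch of the complete example with the corrected callback signature; as an addition not in the original listing, the callback also reads the task result from the Future it receives.

# SuperFastPython.com
# sketch of the corrected signature on the callback function
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# callback function that receives the completed Future
def custom_callback(future):
    # the result is available because the task has completed
    print(f'Custom callback was called: {future.result()}', flush=True)

# mock task that will sleep for a moment
def work():
    sleep(1)
    return 'Task is done'

# entry point
def main():
    # create a process pool
    with ProcessPoolExecutor() as executor:
        # execute the task
        future = executor.submit(work)
        # add the custom callback
        future.add_done_callback(custom_callback)
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()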
Error 6: Arguments or Shared Data that Does Not Pickle
A common error is sharing data between processes that cannot be serialized.
Specifically, the error occurs when an argument, return value, or shared object cannot be serialized for transmission between processes.
Python has a built-in object serialization mechanism called pickle: objects are 'pickled' when serialized and 'unpickled' when deserialized.
When sharing data between processes, the data will be pickled automatically.
This includes arguments passed to target task functions, data returned from target task functions, and data accessed directly, such as global variables.
If data that is shared between processes cannot be automatically pickled, an error such as a PicklingError or TypeError will be raised.
Most normal Python objects can be pickled.
Examples of objects that cannot be pickled are those that hold an open connection, such as to a file, database, server, or similar.
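If you are unsure whether an object will pickle, you can test it directly with the pickle module before handing it to the process pool. A minimal sketch; the tmp.txt file name is just for illustration:

# sketch: check whether an object can be pickled before sharing it
import pickle

# most built-in types pickle without issue
pickle.dumps({'a': 1})
print('dict pickles fine')

# objects holding open connections do not, e.g. a file handle
try:
    pickle.dumps(open('tmp.txt', 'w'))
except TypeError as e:
    print(e)  # cannot pickle '_io.TextIOWrapper' object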
We can demonstrate this with an example below that attempts to pass a file handle as an argument to a target task function.
# SuperFastPython.com
# example of an argument that does not pickle
from concurrent.futures import ProcessPoolExecutor

# write to a file
def work(file):
    file.write('hi there')
    return "All done!"

# entry point
def main():
    # open the file
    with open('tmp.txt', 'w') as file:
        # start the process pool
        with ProcessPoolExecutor() as executor:
            # submit the task
            future = executor.submit(work, file)
            # get the result
            result = future.result()
            print(result)

if __name__ == '__main__':
    main()
Running the example, we can see that it fails with an error indicating that the argument cannot be pickled for transmission to the worker process.
concurrent.futures.process._RemoteTraceback:
"""
...
TypeError: cannot pickle '_io.TextIOWrapper' object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
TypeError: cannot pickle '_io.TextIOWrapper' object
This was a contrived example, where the file could be opened within the target task function or the ProcessPoolExecutor could be changed to a ThreadPoolExecutor.
In general, if you experience this error and you are attempting to pass around a connection or open file, perhaps try to open the connection within the task or use threads instead of processes.
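For example, here is a sketch of the first workaround, reworking the listing above so that only the picklable file path crosses the process boundary and the file is opened inside the task:

# SuperFastPython.com
# sketch: open the file inside the task so only the path is pickled
from concurrent.futures import ProcessPoolExecutor

# write to a file, given its path
def work(filepath):
    with open(filepath, 'w') as file:
        file.write('hi there')
    return "All done!"

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        # submit the task with a picklable string argument
        future = executor.submit(work, 'tmp.txt')
        # get the result
        result = future.result()
        print(result)

if __name__ == '__main__':
    main()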
If you experience this type of error with custom data types that are being passed around, you may need to implement code to manually serialize and deserialize your types. I recommend reading the documentation for the pickle module.
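As a sketch of what manual serialization support can look like, the hypothetical class below implements the __getstate__() and __setstate__() methods from the pickle protocol to drop and re-open an unpicklable file handle:

# sketch: custom pickling for a type that holds an unpicklable handle
import pickle

class Logger:
    def __init__(self, filepath):
        self.filepath = filepath
        # open file handles cannot be pickled
        self.handle = open(filepath, 'a')

    def __getstate__(self):
        # copy the instance state, omitting the unpicklable handle
        state = self.__dict__.copy()
        del state['handle']
        return state

    def __setstate__(self, state):
        # restore the state and re-open the handle
        self.__dict__.update(state)
        self.handle = open(self.filepath, 'a')

# the instance can now be pickled and unpickled
logger = pickle.loads(pickle.dumps(Logger('tmp.txt')))
print(logger.handle)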
Error 7: Not Flushing print() Statements
A common error is to not flush standard output (stdout) when calling the built-in print() function from target task functions.
By default, the built-in print() function in Python does not flush output.
The standard output stream (stdout) will flush automatically in the main process, typically when the internal buffer is full or a newline is detected. This means you see your print messages almost immediately after the print() function is called in code.
There is a problem when calling the print() function from spawned or forked processes because standard out will buffer output by default.
This means if you call print() from target task functions in the ProcessPoolExecutor, you probably will not see the print statements on standard out until the worker processes are closed.
This will be confusing because it will look like your program is not working correctly, i.e. that it is buggy.
The example below demonstrates this with a target task function that will call print() to report some status.
# SuperFastPython.com
# example of not flushing output when calling print() from tasks in new processes
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a moment
def task(value):
    sleep(value)
    print(f'Done: {value}')

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        executor.map(task, range(5))
    print('All done!')

if __name__ == '__main__':
    main()
Running the example will wait until all tasks in the process pool have completed before printing all messages on standard out.
Done: 0
Done: 1
Done: 2
Done: 3
Done: 4
All done!
This can be fixed by updating all calls to the print() function in target task functions to flush output after each call.
This can be achieved by setting the “flush” argument to True, for example:
...
# report progress and flush the stream
print(f'Done: {value}', flush=True)
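For reference, the complete example updated with this fix is listed below; the 'Done' messages now appear as each task finishes rather than all at once when the pool shuts down.

# SuperFastPython.com
# example of flushing output when calling print() from tasks in new processes
from time import sleep
from concurrent.futures import ProcessPoolExecutor

# custom task that will sleep for a moment
def task(value):
    sleep(value)
    # report progress and flush the stream
    print(f'Done: {value}', flush=True)

# entry point
def main():
    # start the process pool
    with ProcessPoolExecutor() as executor:
        executor.map(task, range(5))
    print('All done!')

if __name__ == '__main__':
    main()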
Further Reading
This section provides additional resources that you may find helpful.
Books
- ProcessPoolExecutor Jump-Start, Jason Brownlee (my book!)
- Concurrent Futures API Interview Questions
- ProcessPoolExecutor PDF Cheat Sheet
I also recommend specific chapters from the following books:
- Effective Python, Brett Slatkin, 2019.
  - See Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
  - See Chapter 14: Threads and Processes
Guides
- Python ProcessPoolExecutor: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python Pool: The Complete Guide
Takeaways
You now know about the common errors when using the ProcessPoolExecutor in Python.
Do you have any questions?
Ask your question in the comments below and I will do my best to answer.
Michael Gardner says
Error 8: You are trying to do multiprocessing in a notebook environment like Databricks.
I had a multiprocessing application running on Databricks and it would work for about 30 minutes before all the pools would break. I'm not sure of the technical reasons, but when I deployed it directly to an EC2 instance on AWS it worked without issues.
Jason Brownlee says
Ouch!
Thank you for sharing Michael.
Vincent says
Wow, Dr. Brownlee’s new avatar looks great! Sunny and wise!
Jason Brownlee says
Thanks!