Multiprocessing Pool.map_async() in Python
You can call a function for each item in an iterable in parallel and asynchronously via the Pool.map_async() function.
In this tutorial you will discover how to use the map_async() function for the process pool in Python.
Let's get started.
Need an Asynchronous Version of map()
The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.
A process pool can be configured when it is created, which will prepare the child workers.
A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
-- multiprocessing — Process-based parallelism
The built-in map() function allows you to apply a function to each item in an iterable.
The Python process pool provides an asynchronous parallel version of the built-in map() function via the map_async() function.
How can we use the parallel version of map() with the process pool?
How to Use Pool.map_async()
The process pool provides a parallel and asynchronous map function via the Pool.map_async() function.
Recall that the built-in map() function will apply a given function to each item in a given iterable.
Return an iterator that applies function to every item of iterable, yielding the results.
-- Built-in Functions
It yields one result per call to the given target function, each called with one item from the given iterable. It is common to call map() and iterate over the results in a for-loop.
For example:
...
# iterates results from map
for result in map(task, items):
    # ...
The multiprocessing.pool.Pool process pool provides an asynchronous version of the map() function, via map_async(), where the target function is called for each item in the provided iterable in parallel and the call returns immediately.
The map_async() function does not block while the function is applied to each item in the iterable. Instead, it returns an AsyncResult object from which the results may be accessed.
A variant of the map() method which returns an AsyncResult object.
-- multiprocessing — Process-based parallelism
For example:
...
# apply the function
result = pool.map_async(task, items)
Each item in the iterable is taken as a separate task in the process pool.
Like the built-in map() function, the results will be in the order of the provided iterable. This means that results are returned in the same order that tasks were issued, even though tasks may complete in a different order.
Unlike the built-in map() function, the map_async() function only takes one iterable as an argument. This means that the target function executed in the process can only take a single argument.
The iterable of items that is passed is traversed immediately in order to issue all tasks to the process pool. Therefore, if the iterable is very long, it may result in many tasks waiting in memory to execute.
It is possible to split up the items in the iterable evenly among the worker processes.
For example, if we had a process pool with 4 child worker processes and an iterable with 40 items, we can split up the items into 4 chunks of 10 items, with one chunk allocated to each worker process.
The effect is less overhead in transmitting tasks to worker processes and collecting results.
This can be achieved via the "chunksize" argument to map_async().
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
-- multiprocessing — Process-based parallelism
For example:
...
# apply the function with a chunksize
result = pool.map_async(task, items, chunksize=10)
Because map_async() does not block, it allows the caller to continue and retrieve the result when needed.
The result is an iterable of return values, which can be accessed via the get() function on the AsyncResult object. This call will block until all issued tasks are complete.
For example:
...
# iterate over return values
for value in result.get():
    # ...
A callback function can be called automatically if the tasks complete successfully, e.g. without raising an error or exception.
The callback function must take one argument: the iterable of return values from the target function.
If callback is specified then it should be a callable which accepts a single argument. When the result becomes ready callback is applied to it, that is unless the call failed ...
-- multiprocessing — Process-based parallelism
The function is specified via the "callback" argument to the map_async() function.
For example:
# callback function
def custom_callback(result):
    print(f'Got result: {result}')
...
# issue tasks asynchronously to the process pool with a callback
result = pool.map_async(task, items, callback=custom_callback)
Similarly, an error callback function can be specified via the "error_callback" argument that is called only when an unexpected error or exception is raised.
If error_callback is specified then it should be a callable which accepts a single argument. If the target function fails, then the error_callback is called with the exception instance.
-- multiprocessing — Process-based parallelism
The error callback function must take one argument, which is the instance of the error or exception that was raised.
For example:
# error callback function
def custom_error_callback(error):
    print(f'Got error: {error}')
...
# issue tasks asynchronously to the process pool with an error callback
result = pool.map_async(task, items, error_callback=custom_error_callback)
Difference Between map_async() vs map()
How does the Pool.map_async() function compare to the Pool.map() for issuing tasks to the process pool?
Both the map_async() and map() functions may be used to issue tasks that call a function for all items in an iterable via the process pool.
The following summarizes the key differences between these two functions:
- The map_async() function does not block, whereas the map() function does block.
- The map_async() function returns an AsyncResult, whereas the map() function returns an iterable of return values from the target function.
- The map_async() function can execute callback functions on return values and errors, whereas the map() function does not support callback functions.
The map_async() function should be used for issuing target task functions to the process pool where the caller cannot or must not block while the task is executing.
The map() function should be used for issuing target task functions to the process pool where the caller can or must block until all function calls are complete.
You can learn more about the Pool.map() function in the tutorial:
Now that we know how to use the map_async() function to execute tasks in the process pool, let's look at some worked examples.
Example of map_async()
We can explore how to use the parallel and asynchronous version of map_async() on the process pool.
In this example, we can define a target task function that takes an integer as an argument, generates a random number, reports the value, then returns the value that was generated. We can then call this function for each integer between 0 and 9 using the process pool map_async().
This will call the function for each integer in parallel using as many cores as are available in the system. We will not block waiting for the results; instead, the calls will be issued asynchronously.
Firstly, we can define the target task function.
The function takes an argument, generates a random number between 0 and 1, reports the integer and generated number. It then blocks for a fraction of a second to simulate computational effort, then returns the number that was generated.
The task() function below implements this.
# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value
We can then create and configure a process pool.
We will use the context manager interface to ensure the pool is shut down automatically once we are finished with it.
If you are new to the context manager interface of the process pool, you can learn more in the tutorial:
...
# create and configure the process pool
with Pool() as pool:
    # ...
We can then call the map_async() function on the process pool to apply our task() function to each value in the range between 0 and 9.
This will return immediately with an AsyncResult object.
...
# issues tasks to process pool
result = pool.map_async(task, range(10))
We will then iterate over the iterable of results accessible via the get() function on the AsyncResult object.
This can be achieved via a for-loop.
...
# iterate results
for value in result.get():
    print(f'Got result: {value}', flush=True)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of parallel map_async() with the process pool
from random import random
from time import sleep
from multiprocessing.pool import Pool

# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issues tasks to process pool
        result = pool.map_async(task, range(10))
        # iterate results
        for value in result.get():
            print(f'Got result: {value}', flush=True)
    # process pool is closed automatically
Running the example first creates the process pool with a default configuration.
It will have one child worker process for each logical CPU in your system.
The map_async() function is then called for the range.
This issues ten calls to the task() function, one for each integer between 0 and 9.
An AsyncResult object is returned immediately, and the main process is free to continue on.
Each call to the task function generates a random number between 0 and 1, reports a message, blocks, then returns a value.
The main process then accesses the iterable of return values. It iterates over the values returned from the calls to the task() function and reports the generated values, matching those generated in each child process.
Importantly, all task() function calls are issued and executed before the iterator of results is made available. The caller cannot iterate over the results as they complete.
Task 0 executing with 0.9591677011241552
Task 1 executing with 0.6751309695183123
Task 2 executing with 0.964733986535503
Task 3 executing with 0.6507350936292685
Task 4 executing with 0.11814202984563771
Task 5 executing with 0.22476993835549486
Task 6 executing with 0.4726927932611594
Task 7 executing with 0.240816718684706
Task 8 executing with 0.9659612477576981
Task 9 executing with 0.34282905711762146
Got result: 0.9591677011241552
Got result: 0.6751309695183123
Got result: 0.964733986535503
Got result: 0.6507350936292685
Got result: 0.11814202984563771
Got result: 0.22476993835549486
Got result: 0.4726927932611594
Got result: 0.240816718684706
Got result: 0.9659612477576981
Got result: 0.34282905711762146
Next, let's look at an example of calling map_async() for a function with no return value.
Example of map_async with No Return Value
We can explore using the map_async() function to call a function for each item in an iterable that does not have a return value.
This means that we are not interested in the iterable of results returned by the call to map_async() and instead are only interested that all issued tasks get executed.
This can be achieved by updating the previous example so that the task() function does not return a value.
The updated task() function with this change is listed below.
# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
Then, in the main process, we can call map_async() with our task() function and the range, as before.
...
# issues tasks to process pool
result = pool.map_async(task, range(10))
Importantly, because the call to map_async() does not block, we need to wait for the issued tasks to complete.
If we do not wait for the tasks to complete, we will exit the context manager for the process pool which will terminate the worker processes and stop the tasks.
This can be achieved by calling the wait() function on the AsyncResult object.
...
# wait for tasks to complete
result.wait()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of parallel map_async() with the process pool and no return values
from random import random
from time import sleep
from multiprocessing.pool import Pool

# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issues tasks to process pool
        result = pool.map_async(task, range(10))
        # wait for tasks to complete
        result.wait()
    # process pool is closed automatically
Running the example first creates the process pool with a default configuration.
The map_async() function is then called for the range. This issues ten calls to the task() function, one for each integer between 0 and 9.
An AsyncResult object is returned immediately, and the main process is free to continue on.
Each call to the task function generates a random number between 0 and 1, reports a message, then blocks.
The main process waits on the AsyncResult object, blocking until all calls to the task() function are complete.
The tasks finish, the wait() function returns, and the main process is free to carry on.
Task 0 executing with 0.09767265038765405
Task 1 executing with 0.8423646220732373
Task 2 executing with 0.2683579169891991
Task 3 executing with 0.30235156848007594
Task 4 executing with 0.18873223064507716
Task 5 executing with 0.9179039847914507
Task 6 executing with 0.321696880586686
Task 7 executing with 0.034738358929647495
Task 8 executing with 0.6029598871921921
Task 9 executing with 0.4702093267820888
Next, let's look at issuing many tasks over multiple calls, then wait for all tasks in the process pool to complete.
Example of map_async And Wait For All Tasks To Complete
We can explore how to issue many tasks to the process pool via map_async(), and wait for all issued tasks to complete.
This could be achieved by calling map_async() multiple times, and calling the wait() function on the AsyncResult object after each call.
An alternate approach is to call map_async() multiple times, then wait on the process pool itself for all issued tasks to complete.
In this example, we can update the previous example to call map_async() twice and ignore the AsyncResult objects that are returned.
...
# issues tasks to process pool
_ = pool.map_async(task, range(10))
# issues tasks to process pool
_ = pool.map_async(task, range(11, 20))
We can then explicitly close the process pool to prevent additional tasks being submitted to the pool, then call the join() function to wait for all issued tasks to complete.
...
# close the process pool
pool.close()
# wait for all tasks to complete and processes to close
pool.join()
You can learn more about joining the process pool in the tutorial:
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of parallel map_async() with the process pool and wait for tasks to complete
from random import random
from time import sleep
from multiprocessing.pool import Pool

# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issues tasks to process pool
        _ = pool.map_async(task, range(10))
        # issues tasks to process pool
        _ = pool.map_async(task, range(11, 20))
        # close the process pool
        pool.close()
        # wait for all tasks to complete and processes to close
        pool.join()
    # process pool is closed automatically
# process pool is closed automatically
Running the example first creates the process pool with a default configuration.
The map_async() function is then called for the range. This issues ten calls to the task() function, one for each integer between 0 and 9.
The map_async() function is then called again for a different range, issuing nine more calls to the task() function, one for each integer between 11 and 19.
Both calls return immediately, and the AsyncResult objects returned are ignored.
Each call to the task function generates a random number between 0 and 1, reports a message, then blocks.
The main process then closes the process pool, preventing any additional tasks from being issued. It then joins the process pool, blocking until all issued tasks are completed.
The tasks finish, the join() function returns, and the main process is free to carry on.
Task 0 executing with 0.4083967477943128
Task 1 executing with 0.22881213774806186
Task 2 executing with 0.20280677456216334
Task 3 executing with 0.7878974399978778
Task 4 executing with 0.8230291308920257
Task 5 executing with 0.25418293474337994
Task 6 executing with 0.8801653057256363
Task 7 executing with 0.7412447008853492
Task 8 executing with 0.9861948754251607
Task 9 executing with 0.27439361977553045
Task 11 executing with 0.5391443240653332
Task 12 executing with 0.8818032799778482
Task 13 executing with 0.7377979960652975
Task 14 executing with 0.7219104014326214
Task 15 executing with 0.7521592119464652
Task 16 executing with 0.03226842416981113
Task 17 executing with 0.32102570279532916
Task 18 executing with 0.29505154731167293
Task 19 executing with 0.06776011173046881
Next, let's look at issuing tasks and handling the return values with a callback function.
Example of map_async() with a Callback Function
We can issue tasks to the process pool that return values and specify a callback function to handle the returned values.
This can be achieved via the "callback" argument.
In this example, we can update the above example so that the task() function generates a value and returns it. We can then define a function to handle the return value, in this case to simply report the value.
Firstly, we can define the function to call to handle the value returned from the function.
The custom_callback() below implements this, taking one argument which is the value returned from the target task function.
# custom callback function
def custom_callback(result):
    print(f'Callback got values: {result}')
Next, we can update the task function to generate a random value between 0 and 1. It then reports the value, sleeps, then returns the value that was generated.
The updated task() function with these changes is listed below.
# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value
Finally, we can update the call to map_async() to specify the callback function via the "callback" argument, giving the name of our custom function.
...
# issues tasks to process pool
_ = pool.map_async(task, range(10), callback=custom_callback)
The main process will then close the process pool and wait for all issued tasks to complete.
...
# close the process pool
pool.close()
# wait for all tasks to complete and processes to close
pool.join()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of parallel map_async() with the process pool and a callback function
from random import random
from time import sleep
from multiprocessing.pool import Pool

# custom callback function
def custom_callback(result):
    print(f'Callback got values: {result}')

# task executed in a worker process
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issues tasks to process pool
        _ = pool.map_async(task, range(10), callback=custom_callback)
        # close the process pool
        pool.close()
        # wait for all tasks to complete and processes to close
        pool.join()
Running the example first creates and configures the process pool.
The map_async() function is then called with the range and a callback function. This issues ten calls to the task() function, one for each integer between 0 and 9.
The main process then closes the process pool and blocks until all tasks complete and all processes in the process pool close.
Each call to the task function generates a random number between 0 and 1, reports a message, then blocks.
When all task() functions are finished, the callback function is executed, provided with the iterable of return values.
The iterable is printed directly, showing all return values at once.
Finally, the worker processes are closed and the main process continues on.
Task 0 executing with 1.325437282428954e-05
Task 1 executing with 0.2532468322408108
Task 2 executing with 0.6155702856421394
Task 3 executing with 0.5189590798977227
Task 4 executing with 0.8194167220934903
Task 5 executing with 0.09276265176527032
Task 6 executing with 0.7559874840206455
Task 7 executing with 0.008764281619561776
Task 8 executing with 0.15341138119434683
Task 9 executing with 0.5864779606678762
Callback got values: [1.325437282428954e-05, 0.2532468322408108, 0.6155702856421394, 0.5189590798977227, 0.8194167220934903, 0.09276265176527032, 0.7559874840206455, 0.008764281619561776, 0.15341138119434683, 0.5864779606678762]
Next, let's look at an example of issuing tasks to the pool with an error callback function.
Example of map_async() with an Error Callback Function
We can issue tasks to the process pool that may raise an unhandled exception and specify an error callback function to handle the exception.
This can be achieved via the "error_callback" argument.
In this example, we can update the above example so that the task() function raises an exception for one specific argument. We can then define a function to handle the raised exception, in this case to simply report the details of the exception.
Firstly, we can define the function to call to handle an exception raised by the function.
The custom_error_callback() below implements this, taking one argument which is the exception raised by the target task function.
# custom error callback function
def custom_error_callback(error):
    print(f'Got an error: {error}')
Next, we can update the task() function so that it raises an exception conditionally for one argument.
# task executed in a worker process
def task(identifier):
    # conditionally raise an error
    if identifier == 5:
        raise Exception('Something bad happened')
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value
Finally, we can update the call to map_async() to specify the error callback function via the "error_callback" argument, giving the name of our custom function.
...
# issues tasks to process pool
_ = pool.map_async(task, range(10), error_callback=custom_error_callback)
The main process then closes the process pool and waits for all tasks to complete.
...
# close the process pool
pool.close()
# wait for all tasks to complete and processes to close
pool.join()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of parallel map_async() with the process pool and an error callback function
from random import random
from time import sleep
from multiprocessing.pool import Pool

# custom error callback function
def custom_error_callback(error):
    print(f'Got an error: {error}')

# task executed in a worker process
def task(identifier):
    # conditionally raise an error
    if identifier == 5:
        raise Exception('Something bad happened')
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issues tasks to process pool
        _ = pool.map_async(task, range(10), error_callback=custom_error_callback)
        # close the process pool
        pool.close()
        # wait for all tasks to complete and processes to close
        pool.join()
Running the example first creates and configures the process pool.
The map_async() function is then called for the range and an error callback function. This issues ten calls to the task() function, one for each integer between 0 and 9.
An AsyncResult object is returned immediately and is ignored, and the main process is free to continue on.
The main process then closes the process pool and blocks waiting for all issued tasks to complete.
Each call to the task function generates a random number between 0 and 1, reports a message, then blocks. The task that receives the value of 5 as an argument raises an exception.
All tasks finish executing and the exception callback function is called with the raised exception, reporting a message.
All worker processes are closed and the main process is free to continue on.
Task 0 executing with 0.03150934197494848
Task 1 executing with 0.643593562605176
Task 2 executing with 0.9853263736176907
Task 3 executing with 0.9485085989714329
Task 4 executing with 0.4509036578947475
Task 6 executing with 0.7543363777396663
Task 7 executing with 0.5002508026375327
Task 8 executing with 0.25425999107623964
Task 9 executing with 0.19878042544880747
Got an error: Something bad happened
Next, let's look at an example of issuing tasks to the pool and accessing the results, where one task may raise an exception.
Example of map_async() with Exception
It is possible for the target function to raise an exception.
If this occurs in just one call to the target function, it will prevent all return values from being accessed.
We can demonstrate this with a worked example.
Firstly, we can update the task() function to conditionally raise an exception, in this case if the argument to the function is equal to five.
...
# conditionally raise an error
if identifier == 5:
    raise Exception('Something bad happened')
The updated task() function with this change is listed below.
# task executed in a worker process
def task(identifier):
    # conditionally raise an error
    if identifier == 5:
        raise Exception('Something bad happened')
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value
We can then call map_async() to call the task() function for each value in the range from 0 to 9.
...
# issues tasks to process pool
result = pool.map_async(task, range(10))
We can then get the iterable of return values.
...
values = result.get()
This may raise an exception if one of the calls failed, so we must protect the call with a try-except pattern.
...
# get the return values
try:
    values = result.get()
except Exception as e:
    print(f'Failed with: {e}')
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of parallel map_async() with the process pool and handle an exception
from random import random
from time import sleep
from multiprocessing.pool import Pool

# task executed in a worker process
def task(identifier):
    # conditionally raise an error
    if identifier == 5:
        raise Exception('Something bad happened')
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}', flush=True)
    # block for a moment
    sleep(value)
    # return the generated value
    return value

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issues tasks to process pool
        result = pool.map_async(task, range(10))
        # get the return values
        try:
            values = result.get()
        except Exception as e:
            print(f'Failed with: {e}')
Running the example first creates and configures the process pool.
The map_async() function is then called for the range. This issues ten calls to the task() function, one for each integer between 0 and 9.
An AsyncResult object is returned immediately, and the main process is free to continue on.
The main process then blocks, attempting to get the iterable of return values from the AsyncResult object.
Each call to the task function generates a random number between 0 and 1, reports a message, then blocks. The task that receives the value of 5 as an argument raises an exception.
The exception is then re-raised in the main process, handled and a message is reported.
The iterable of return values cannot be accessed, and no return values are reported.
The process pool is then closed automatically by the context manager.
This highlights that a failure in one task can prevent the results from all tasks from being accessed.
Task 0 executing with 0.1344488772185256
Task 1 executing with 0.35031704628177684
Task 2 executing with 0.07798015322560803
Task 3 executing with 0.6928989297769291
Task 4 executing with 0.7039899747497713
Task 6 executing with 0.6661100987500574
Task 7 executing with 0.5095552401022645
Task 8 executing with 0.8919789867356606
Task 9 executing with 0.45701520294789855
Failed with: Something bad happened
Takeaways
You now know how to use the map_async() function for the process pool in Python.
If you enjoyed this tutorial, you will love my book: Python Multiprocessing Pool Jump-Start. It covers everything you need to master the topic with hands-on examples and clear explanations.