ThreadPool apply() vs map() vs imap() vs starmap()

September 27, 2022 Python ThreadPool

The ThreadPool provides many ways to issue tasks but no clear guidance on how to choose the best way to issue tasks for your application.

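In this tutorial you will discover how to issue tasks to the ThreadPool and how to choose among the available methods for your specific use case.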

Let's get started.

How to Issue Tasks to the ThreadPool

The ThreadPool class provides a pool of threads that allows tasks to be issued and executed concurrently.

The pool provides 8 ways to issue tasks to workers in the ThreadPool.

They are:

  1. apply()
  2. apply_async()
  3. map()
  4. map_async()
  5. imap()
  6. imap_unordered()
  7. starmap()
  8. starmap_async()

The ThreadPool extends the Pool class and the methods for issuing tasks are defined on the Pool class.

Let's take a brief look at each approach in turn.

How to Use apply()

We can issue one-off tasks to the ThreadPool using the apply() method.

The apply() method takes the function to be executed by a worker thread, along with optional arguments for it. The call blocks until the worker thread has executed the function, and then returns the function's return value.

For example:

...
# issue a task to the thread pool
pool.apply(task)

The apply() method is the ThreadPool version of the built-in apply() function, which was deprecated in Python 2 and removed in Python 3.
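To make this concrete, here is a minimal, self-contained sketch of a blocking call to apply(). The task() function, its arguments, and the pool size of 4 are assumptions made up for illustration only.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function, used only for illustration
def task(x, y=1):
    return x * y

# create the pool and issue a single, blocking call
with ThreadPool(4) as pool:
    # args and kwds are forwarded to the target function
    value = pool.apply(task, args=(3,), kwds={"y": 5})

print(value)  # 15
```

The return value of the target function is returned directly by apply(), so there is no result object to manage.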

In summary, the apply() method issues a single task, supports a target function with any number of arguments, blocks until the task is complete, and does not support callbacks.

How to Use apply_async()

We can issue asynchronous one-off tasks to the ThreadPool using the apply_async() method.

Asynchronous means that the call to the ThreadPool does not block, allowing the caller that issued the task to carry on.

The apply_async() method takes the function to execute in a worker thread and returns immediately with an AsyncResult object for the task.

It supports a callback function for the result and an error callback function if an error is raised.

For example:

...
# issue a task asynchronously to the thread pool
result = pool.apply_async(task)

Later, the status of the issued task may be checked and its result retrieved.

For example:

...
# get the result from the issued task
value = result.get()
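The two snippets above can be combined into one runnable sketch. The task() function and the pool size are assumptions for illustration; wait(), ready(), successful() and get() are methods of the AsyncResult object.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function, used only for illustration
def task(x):
    return x * 2

with ThreadPool(2) as pool:
    # returns immediately with an AsyncResult
    result = pool.apply_async(task, args=(21,))
    # the caller is free to do other work here
    result.wait()                  # block until the task is done
    done = result.ready()          # True once the task has finished
    ok = result.successful()       # True if no exception was raised
    value = result.get(timeout=5)  # retrieve the return value

print(done, ok, value)  # True True 42
```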

In summary, the apply_async() method issues a single task, supports a target function with any number of arguments, does not block, returns an AsyncResult, and supports result and error callbacks.

How to Use map()

The ThreadPool provides a concurrent version of the built-in map() function for issuing tasks.

The map() method takes the name of a target function and an iterable. A task is created to call the target function for each item in the provided iterable. It returns an iterable over the return values from each call to the target function.

The iterable is first traversed and all tasks are issued at once. A chunksize can be specified to split the tasks into groups which may be sent to each worker thread to be executed in batch.

For example:

...
# iterates return values from the issued tasks
for result in pool.map(task, items):
	# ...

The map() method is a concurrent version of the built-in map() function.
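As a minimal sketch, the following complete program squares a few numbers with map(). The task() function and the input range are assumptions for illustration. Note that the ThreadPool map() returns a list, unlike the built-in map(), which returns a lazy iterator.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function, used only for illustration
def task(x):
    return x * x

items = range(5)
with ThreadPool(4) as pool:
    # blocks until every task is complete, results are in input order
    results = pool.map(task, items)

print(results)  # [0, 1, 4, 9, 16]
```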

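In summary, the map() method issues multiple tasks all at once, supports a target function with a single argument, blocks until all tasks are complete, returns results in the order tasks were issued, supports chunksize, and does not support callbacks.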

How to Use map_async()

The ThreadPool provides an asynchronous version of the map() method for issuing tasks called map_async().

The map_async() method takes the name of a target function and an iterable. A task is created to call the target function for each item in the provided iterable. It does not block and returns immediately with an AsyncResult that may be used to access the results.

The iterable is first traversed and all tasks are issued at once. A chunksize can be specified to split the tasks into groups which may be sent to each worker thread to be executed in batch. It supports a callback function for the result and an error callback function if an error is raised.

For example:

...
# issue tasks to the thread pool asynchronously
result = pool.map_async(task, items)

Later the status of the tasks can be checked and the return values from each call to the target function may be iterated.

For example:

...
# iterate over return values from the issued tasks
for value in result.get():
	# ...
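Putting the two steps together, a small runnable sketch might look as follows. The task() function, the input range, and the chunksize of 2 are assumptions chosen for illustration.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function, used only for illustration
def task(x):
    return x + 1

with ThreadPool(4) as pool:
    # returns immediately with a single AsyncResult for all tasks
    result = pool.map_async(task, range(5), chunksize=2)
    # the caller can carry on with other work here
    result.wait()
    # get() returns a list of return values in input order
    values = result.get()

print(values)  # [1, 2, 3, 4, 5]
```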

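In summary, the map_async() method issues multiple tasks all at once, supports a target function with a single argument, does not block, returns an AsyncResult, returns results in the order tasks were issued, supports chunksize, and supports result and error callbacks.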

How to Use imap()

We can issue tasks to the ThreadPool one-by-one via the imap() method.

The imap() method takes the name of a target function and an iterable. A task is created to call the target function for each item in the provided iterable.

It returns an iterable over the return values from each call to the target function. The iterable will yield return values as tasks are completed, in the order that tasks were issued.

The imap() function is lazy in that it traverses the provided iterable and issues tasks to the ThreadPool one by one as space becomes available in the ThreadPool. A chunksize can be specified to split the tasks into groups which may be sent to each worker thread to be executed in batch.

For example:

...
# iterates results as tasks are completed in order
for result in pool.imap(task, items):
	# ...

The imap() method is a concurrent version of the itertools.imap() function, which existed in Python 2 and was removed in Python 3.
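The ordering guarantee can be seen in a small sketch. The task() function and its sleep times are assumptions for illustration: earlier items deliberately take longer, yet imap() still yields results in the order the tasks were issued.

```python
import time
from multiprocessing.pool import ThreadPool

# hypothetical task: earlier items sleep longer than later ones
def task(x):
    time.sleep(0.1 * (3 - x))
    return x

with ThreadPool(3) as pool:
    ordered = []
    # the loop blocks on each result in turn, in issue order
    for value in pool.imap(task, range(3)):
        ordered.append(value)

print(ordered)  # [0, 1, 2]
```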

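In summary, the imap() method issues multiple tasks one by one, supports a target function with a single argument, returns immediately with an iterable that yields results in the order tasks were issued, supports chunksize, and does not support callbacks.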

How to Use imap_unordered()

We can issue tasks to the ThreadPool one-by-one via the imap_unordered() method.

The imap_unordered() method takes the name of a target function and an iterable. A task is created to call the target function for each item in the provided iterable.

It returns an iterable over the return values from each call to the target function. The iterable will yield return values as tasks are completed, in the order that tasks were completed, not the order they were issued.

The imap_unordered() function is lazy in that it traverses the provided iterable and issues tasks to the ThreadPool one by one as space becomes available in the ThreadPool. A chunksize can be specified to split the tasks into groups which may be sent to each worker thread to be executed in batch.

For example:

...
# iterates results as tasks are completed, in the order they are completed
for result in pool.imap_unordered(task, items):
	# ...
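Here is a minimal runnable sketch of the same idea. The task() function and its sleep times are assumptions for illustration. Because later items finish sooner, the results will typically arrive in a different order from the input, although the exact order depends on thread scheduling and is not guaranteed.

```python
import time
from multiprocessing.pool import ThreadPool

# hypothetical task: earlier items sleep longer than later ones
def task(x):
    time.sleep(0.1 * (3 - x))
    return x

with ThreadPool(3) as pool:
    unordered = []
    # results are yielded as soon as each task finishes
    for value in pool.imap_unordered(task, range(3)):
        unordered.append(value)

# same values as the input, but in completion order
print(unordered)
```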

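In summary, the imap_unordered() method issues multiple tasks one by one, supports a target function with a single argument, returns immediately with an iterable that yields results in the order tasks are completed, supports chunksize, and does not support callbacks.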

How to Use starmap()

We can issue multiple tasks to the ThreadPool using the starmap() method.

The starmap() method takes the name of a target function and an iterable. A task is created to call the target function for each item in the provided iterable. Each item in the iterable may itself be an iterable, allowing multiple arguments to be provided to the target function.

It returns an iterable over the return values from each call to the target function. The iterable is first traversed and all tasks are issued at once. A chunksize can be specified to split the tasks into groups which may be sent to each worker thread to be executed in batch.

For example:

...
# iterates return values from the issued tasks
for result in pool.starmap(task, items):
	# ...

The starmap() method is a concurrent version of the itertools.starmap() function.
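As a short sketch, each tuple in the iterable below is unpacked into the arguments of the target function. The task() function and the list of tuples are assumptions for illustration.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function that takes two arguments
def task(a, b):
    return a + b

# each item is a tuple of arguments for one call
items = [(1, 2), (3, 4), (5, 6)]
with ThreadPool(3) as pool:
    # blocks until all calls are complete
    results = pool.starmap(task, items)

print(results)  # [3, 7, 11]
```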

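In summary, the starmap() method issues multiple tasks all at once, supports a target function with multiple arguments, blocks until all tasks are complete, returns results in the order tasks were issued, supports chunksize, and does not support callbacks.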

How to Use starmap_async()

We can issue multiple tasks asynchronously to the ThreadPool using the starmap_async() function.

The starmap_async() function takes the name of a target function and an iterable. A task is created to call the target function for each item in the provided iterable. Each item in the iterable may itself be an iterable, allowing multiple arguments to be provided to the target function.

It does not block and returns immediately with an AsyncResult that may be used to access the results.

The iterable is first traversed and all tasks are issued at once. A chunksize can be specified to split the tasks into groups which may be sent to each worker thread to be executed in batch. It supports a callback function for the result and an error callback function if an error is raised.

For example:

...
# issue tasks to the thread pool asynchronously
result = pool.starmap_async(task, items)

Later the status of the tasks can be checked and the return values from each call to the target function may be iterated.

For example:

...
# iterate over return values from the issued tasks
for value in result.get():
	# ...
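The two steps can be combined into one runnable sketch. The power() function and its argument tuples are assumptions for illustration.

```python
from multiprocessing.pool import ThreadPool

# hypothetical task function that takes two arguments
def power(base, exponent):
    return base ** exponent

items = [(2, 3), (3, 2), (10, 0)]
with ThreadPool(3) as pool:
    # returns immediately with a single AsyncResult for all tasks
    result = pool.starmap_async(power, items)
    # the caller can carry on with other work here
    values = result.get()  # blocks until all tasks are complete

print(values)  # [8, 9, 1]
```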

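In summary, the starmap_async() method issues multiple tasks all at once, supports a target function with multiple arguments, does not block, returns an AsyncResult, returns results in the order tasks were issued, supports chunksize, and supports result and error callbacks.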

How To Choose The Method

There are many methods for issuing tasks to the ThreadPool. How do you choose?

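Some properties we may consider when comparing functions used to issue tasks to the ThreadPool include: whether one task or multiple tasks are issued, whether the call blocks, whether tasks are issued lazily, how many arguments the target function may take, whether results are ordered, and whether callbacks are supported.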

The table below summarizes each of these properties and whether they are supported by each call to the ThreadPool.

A YES (green) cell in the table does not mean "good". It means that the function call has a given property which may or may not be useful or required for your specific use case.

Figure: How to Issue Tasks to the ThreadPool

Let's take a look at each one of these considerations in turn.

One Task vs Multiple Tasks

An important consideration is whether you have one task to issue to the ThreadPool or multiple tasks.

A single task may be issued to the ThreadPool as a call to a target function via the apply() or apply_async() function.

Multiple calls may be issued to the ThreadPool by specifying a target function and an iterable of arguments for each call to the target function. This can be achieved with map(), map_async(), imap(), imap_unordered(), starmap(), and starmap_async().

Issue Single Task: apply(), apply_async()

Issue Multiple Tasks: map(), map_async(), imap(), imap_unordered(), starmap(), starmap_async()

Blocking vs Non-Blocking

Another important consideration when issuing tasks is whether the function used to issue the task blocks until the tasks are complete or not.

Recall that a blocking call does not return until the call is complete. This means the caller cannot perform any actions until all tasks are issued and finished.

Blocking calls to the ThreadPool include apply(), map(), and starmap().

A non-blocking call returns immediately and provides a hook or mechanism to check the status of the tasks and get the results later. The caller can issue tasks and carry on with the program.

Non-blocking calls to the ThreadPool include apply_async(), map_async(), and starmap_async().

The imap() and imap_unordered() are interesting. They return immediately, so they are technically non-blocking calls. The iterable that is returned will yield return values as tasks are completed. This means traversing the iterable will block.

Blocking Calls: apply(), map(), starmap()

Non-blocking Calls: apply_async(), map_async(), imap(), imap_unordered(), starmap_async()

Lazy vs Non-Lazy

Those calls that issue multiple tasks may operate in one of two ways.

Some calls issue all tasks to the ThreadPool immediately. This means that the provided iterable is traversed and all calls to the target function and yielded arguments are transformed into tasks and held in memory.

We might refer to these functions as non-lazy. They include map(), map_async(), starmap() and starmap_async().

Other functions issue tasks to the ThreadPool one-at-a-time, only as space becomes available in the ThreadPool to execute new tasks. This means that the provided iterable is traversed one item at a time in order to create and issue tasks on demand.

We might refer to these functions as lazy, and they may be more memory efficient. They include imap() and imap_unordered().

Lazy Calls (one-by-one): imap(), imap_unordered()

Non-Lazy Calls (all at once): map(), map_async(), starmap(), starmap_async()

Single Argument vs Multiple Arguments

The target task function may or may not take arguments.

If it takes arguments, it may take a single argument or more than one argument.

The number of arguments to the target function will limit the functions that may be used to issue tasks.

For example, the apply() and apply_async() functions support a target function that takes no arguments.

All of the functions support a target function that takes a single argument.

Only the apply(), apply_async(), starmap(), and starmap_async() functions support target functions with more than one argument.
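The three cases above can be sketched side by side. The three target functions and their inputs are assumptions for illustration. The example uses the built-in zip() to build the tuples of arguments for starmap().

```python
from multiprocessing.pool import ThreadPool

# hypothetical target functions with zero, one and two arguments
def no_args():
    return "done"

def one_arg(x):
    return x * 2

def two_args(x, y):
    return x + y

with ThreadPool(2) as pool:
    # no arguments: apply() or apply_async()
    r0 = pool.apply(no_args)
    # one argument per call: any of the map-style functions
    r1 = pool.map(one_arg, [1, 2, 3])
    # multiple arguments per call: starmap() with tuples of arguments
    r2 = pool.starmap(two_args, zip([1, 2, 3], [10, 20, 30]))

print(r0, r1, r2)  # done [2, 4, 6] [11, 22, 33]
```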

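Target Function With No Arguments: apply(), apply_async()

Target Function With One Argument: apply(), apply_async(), map(), map_async(), imap(), imap_unordered(), starmap(), starmap_async()

Target Function with Multiple Arguments: apply(), apply_async(), starmap(), starmap_async()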

Ordered Results vs Unordered Results

When multiple tasks are issued to the ThreadPool, and the target function returns a value, we may traverse the iterable of return values.

The iterable may yield return values in the order that tasks were issued, e.g. in an order that matches the provided iterable of items to the call, or return values may be returned out of order.

Out of order means that we may not be able to easily relate the return value to the input to the target function.

Most of the calls will return an iterable that yields return values in order, specifically: map(), map_async(), imap(), starmap(), and starmap_async().

Only the imap_unordered() function will return an iterable that yields return values out of order. Specifically, they are yielded in the order that the issued tasks are completed.
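One common way to relate out-of-order results back to their inputs is to have the target function return its input alongside the result. The sketch below shows this approach; the task() function, its sleep times, and the dictionary used to collect results are assumptions for illustration.

```python
import time
from multiprocessing.pool import ThreadPool

# hypothetical task that returns its input together with the result
def task(x):
    time.sleep(0.05 * (4 - x))
    return x, x * x

with ThreadPool(4) as pool:
    squares = {}
    # results arrive in completion order, but each carries its input
    for item, value in pool.imap_unordered(task, range(4)):
        squares[item] = value

print(squares[3])  # 9
```

Regardless of the order in which the tasks finish, each result can be matched to the item that produced it.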

Ordered Results: map(), map_async(), imap(), starmap(), starmap_async()

Unordered Results: imap_unordered()

Result Callbacks vs No Result Callbacks

Some of the calls to issue tasks may support callbacks.

This includes callbacks to handle the return values from the target function or callbacks to handle errors raised while executing the target function.

Those calls that support callbacks include: apply_async(), map_async(), and starmap_async(). These are the calls that are explicitly asynchronous, i.e. non-blocking.

The rest of the calls do not support callbacks, including: apply(), map(), imap(), imap_unordered(), and starmap().
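A minimal sketch of both kinds of callback is shown below, using apply_async(). The task() function and the two lists used to collect outcomes are assumptions for illustration. The callback receives the return value, and the error callback receives the exception instance.

```python
from multiprocessing.pool import ThreadPool

results = []  # filled by the result callback
errors = []   # filled by the error callback

# hypothetical task that fails for negative input
def task(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2

with ThreadPool(2) as pool:
    ok = pool.apply_async(task, args=(5,),
                          callback=results.append,
                          error_callback=errors.append)
    bad = pool.apply_async(task, args=(-1,),
                           callback=results.append,
                           error_callback=errors.append)
    # wait for both tasks to finish before leaving the pool
    ok.wait()
    bad.wait()

print(results)                   # [10]
print(type(errors[0]).__name__)  # ValueError
```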

Supports Callbacks: apply_async(), map_async(), starmap_async()

Do Not Support Callbacks: apply(), map(), imap(), imap_unordered(), starmap()

Next, let's directly compare some of the methods used to issue tasks to the ThreadPool.

Compare Methods

Many of the methods used to issue tasks to the ThreadPool have a similar name.

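For example: apply() vs apply_async(), map() vs map_async(), imap() vs imap_unordered(), and starmap() vs starmap_async().

There are also some more general comparisons we might like to make, for example: ThreadPool map() vs built-in map(), ThreadPool starmap() vs itertools.starmap(), and imap() vs map().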

To better understand the capability of each method it can be helpful to directly compare and contrast among the methods.

In this section we will briefly look at the differences among methods with like names.

apply() vs apply_async()

Both the apply() and apply_async() may be used to issue one-off tasks to the ThreadPool.

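The main differences are that apply() blocks and returns the result directly, whereas apply_async() returns immediately with an AsyncResult and supports result and error callbacks.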

The apply() function should be used for issuing target task functions to the ThreadPool where the caller can or must block until the task is complete.

The apply_async() function should be used for issuing target task functions to the ThreadPool where the caller cannot or must not block while the task is executing.

map() vs map_async()

Both the map() and map_async() may be used to issue tasks that call a function to all items in an iterable via the ThreadPool.

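The main differences are that map() blocks and returns the results directly, whereas map_async() returns immediately with an AsyncResult and supports result and error callbacks.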

The map() function should be used for issuing target task functions to the ThreadPool where the caller can or must block until all function calls are complete.

The map_async() function should be used for issuing target task functions to the ThreadPool where the caller cannot or must not block while the task is executing.

imap() vs imap_unordered()

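The imap() and imap_unordered() functions have a lot in common: both issue multiple tasks for a target function with a single argument, both issue tasks lazily, both return immediately with an iterable of return values, and neither supports callbacks.

Nevertheless, there is one key difference between the two functions: imap() yields return values in the order that tasks were issued, whereas imap_unordered() yields them in the order that tasks are completed.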

The imap() function should be used when the caller needs to iterate return values, as tasks are completed, in the order that the tasks were submitted.

The imap_unordered() function should be used when the caller needs to iterate return values in any arbitrary order (not the order that they were submitted) from tasks as they are completed.

starmap() vs starmap_async()

Both the starmap() and starmap_async() may be used to issue tasks that call a function in the ThreadPool with more than one argument.

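The main differences are that starmap() blocks and returns the results directly, whereas starmap_async() returns immediately with an AsyncResult and supports result and error callbacks.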

The starmap() function should be used for issuing target task functions to the ThreadPool where the caller can or must block until all function calls are complete.

The starmap_async() function should be used for issuing target task functions to the ThreadPool where the caller cannot or must not block while the task is executing.

ThreadPool map() vs built-in map()

Both the ThreadPool map() function and the built-in map() function traverse a provided iterable and execute a target function passing the item from an iterable to the target function.

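The main differences are that the ThreadPool map() executes calls concurrently in worker threads, traverses the whole iterable up front, and returns a list, whereas the built-in map() executes calls sequentially in the calling thread and returns a lazy iterator.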

The ThreadPool map() function should be used to execute calls to a target function concurrently.

The built-in map() function should be used to execute calls to the target function sequentially and lazily.

ThreadPool starmap() vs itertools.starmap()

Both ThreadPool starmap() and itertools.starmap() functions execute a target function that may have multiple arguments with a provided iterable where each item is an iterable of arguments for each function call.

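The main differences are that the ThreadPool starmap() executes calls concurrently in worker threads, traverses the whole iterable up front, and returns a list, whereas itertools.starmap() executes calls sequentially in the calling thread and returns a lazy iterator.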

The ThreadPool starmap() function should be used to execute calls to a target function with multiple arguments concurrently.

The itertools.starmap() function should be used to execute calls to the target function with multiple arguments sequentially and lazily.

imap() vs map()

Both the imap() and map() may be used to issue tasks that call a function to all items in an iterable via the ThreadPool.

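The main differences are that imap() issues tasks lazily and returns immediately with an iterable that yields results as they become available, whereas map() issues all tasks at once and blocks until all results are ready.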

The imap() function should be used for issuing tasks one-by-one and handling the results for tasks in order as they are available.

The map() function should be used for issuing all tasks at once and handling results in order only once all issued tasks have completed.

Next, let's consider some common questions related to issuing tasks to the ThreadPool.

Common Questions

This section lists common questions about issuing tasks to the ThreadPool and their answers.


How to Issue a Single Task to the ThreadPool?

You can issue a single task to the ThreadPool using the apply() or apply_async() functions.

How to Issue Tasks When the Target Function Has No Arguments?

You can issue tasks to call a target function that has no arguments using the apply() or apply_async() functions.

How to Call map() For a Function With Multiple Arguments?

You can use the starmap() or starmap_async() function to issue tasks to the ThreadPool for a target function that takes multiple arguments.

How to Issue Tasks Asynchronously?

You can issue tasks to the ThreadPool asynchronously using the apply_async(), map_async(), and starmap_async() functions.

Additionally, the imap() and imap_unordered() functions do not block.

Is the imap_unordered() Asynchronous?

No, but kind-of.

The imap_unordered() is not asynchronous in the same way as the apply_async(), map_async(), and starmap_async() functions. Specifically, it does not return an AsyncResult object.

Nevertheless, the imap_unordered() function, like the imap() function, does not block. Instead it returns immediately and only blocks when attempting to retrieve a result from an issued task by traversing the returned iterable.

In this way, the imap_unordered() can be used asynchronously.

Is There an imap_async() Function?

No.

This may be because the imap() and imap_unordered() functions provided by the API are already asynchronous in that they do not block. However, they do not return an AsyncResult object.

Why Ever Use apply()?

If a call to apply() blocks, why even bother to use it? Why not call the function directly?

The main reason is so that the function call is executed in a separate worker thread.
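A quick way to confirm this is to have the task report the name of the thread that runs it. The sketch below does so; the task() function and the pool size of 1 are assumptions for illustration.

```python
import threading
from multiprocessing.pool import ThreadPool

# hypothetical task that reports which thread executed it
def task():
    return threading.current_thread().name

with ThreadPool(1) as pool:
    # blocks, but the function runs in a worker thread
    worker_name = pool.apply(task)

caller_name = threading.current_thread().name
# the two names differ because the task did not run in the caller's thread
print(caller_name, worker_name)
```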

How Do You Call Many Different Target Functions?

We can call multiple different target functions by using apply() or apply_async(), perhaps in a loop.

We might also call multiple different functions by using multiple separate calls to the map(), map_async(), imap(), imap_unordered(), starmap() and starmap_async() functions.
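The loop over apply_async() might be sketched as follows. The three target functions and the list pairing each function with its arguments are assumptions for illustration.

```python
from multiprocessing.pool import ThreadPool

# hypothetical target functions, used only for illustration
def add(a, b):
    return a + b

def shout(text):
    return text.upper()

def answer():
    return 42

# each entry pairs a target function with its tuple of arguments
calls = [(add, (1, 2)), (shout, ("hi",)), (answer, ())]

with ThreadPool(3) as pool:
    # issue one task per function without blocking
    pending = [pool.apply_async(func, args=args) for func, args in calls]
    # collect the results in the order the tasks were issued
    values = [result.get() for result in pending]

print(values)  # [3, 'HI', 42]
```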

Why Bother Using imap() Instead of map()?

You would use imap() instead of map() so that you can start working with results as they become available instead of blocking and waiting until all results are available.

Also, imap() uses less memory because it lazily traverses the provided input iterable in order to issue tasks as needed.

How Do You Best Set chunksize?

The map(), map_async(), imap(), imap_unordered(), starmap(), and starmap_async() functions take a "chunksize" argument.

The chunksize specifies how many calls to the target function to group and send to a worker thread in batch.

This can speed-up the overall task by reducing the computational overhead of sending data arguments to worker threads and receiving results from worker threads.

The best value for chunksize depends on your application, specifically on the tasks being executed, how long they take, on the data sent to each task and on the data returned from each task.

You can find an optimal chunksize value with some trial and error and careful benchmarking.
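A simple benchmark might look like the sketch below. The task() function, the number of items, the pool size, and the candidate chunksize values are all arbitrary assumptions for illustration, not recommendations. Timings will vary between machines and runs, so repeat the measurement several times with your real tasks before drawing conclusions.

```python
import time
from multiprocessing.pool import ThreadPool

# hypothetical, very short task, used only for illustration
def task(x):
    return x * x

items = list(range(5000))
timings = {}

with ThreadPool(4) as pool:
    # arbitrary candidate values, chosen only for this sketch
    for chunksize in (1, 10, 100, 1000):
        start = time.perf_counter()
        results = pool.map(task, items, chunksize=chunksize)
        timings[chunksize] = time.perf_counter() - start

# report the time taken for each candidate value
for chunksize, seconds in timings.items():
    print(f"chunksize={chunksize}: {seconds:.4f} seconds")
```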

Some values to try:

Takeaways

You now know how to issue tasks to the ThreadPool and choose among the various methods to find the best approach for your specific use case.


