Last Updated on October 29, 2022
You can issue tasks to the ThreadPool one-by-one, execute them with threads, and get results in the order that tasks are completed via the imap_unordered() method.
In this tutorial you will discover how to use the imap_unordered() method to issue tasks to the ThreadPool in Python.
Let’s get started.
Problem with ThreadPool imap()
The multiprocessing.pool.ThreadPool in Python provides a pool of reusable threads for executing ad hoc tasks.
A thread pool object which controls a pool of worker threads to which jobs can be submitted.
— multiprocessing — Process-based parallelism
The ThreadPool class extends the Pool class. The Pool class provides a pool of worker processes for process-based concurrency.
Although the ThreadPool class is in the multiprocessing module it offers thread-based concurrency and is best suited to IO-bound tasks, such as reading or writing from sockets or files.
A ThreadPool can be configured when it is created, which will prepare the new threads.
We can issue one-off tasks to the ThreadPool using methods such as apply() or we can apply the same function to an iterable of items using methods such as map().
Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the methods such as apply_async() and map_async().
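As a rough illustration of these four methods, the sketch below issues the same work each way. The square() function, the pool size of 4 and the issue_four_ways() helper are placeholders chosen for this example and are not part of the tutorial's own code.

```python
# sketch: synchronous vs asynchronous task issue on the ThreadPool
# (square(), issue_four_ways() and the pool size are illustrative assumptions)
from multiprocessing.pool import ThreadPool

# hypothetical task used only for this illustration
def square(x):
    return x * x

def issue_four_ways():
    with ThreadPool(4) as pool:
        # apply(): issue one task and block until its result is ready
        single = pool.apply(square, (3,))
        # map(): issue one task per item and block until all results are ready
        many = pool.map(square, range(5))
        # apply_async(): returns an AsyncResult immediately
        single_async = pool.apply_async(square, (3,))
        # map_async(): returns an AsyncResult immediately
        many_async = pool.map_async(square, range(5))
        # get() blocks until the asynchronous result is available
        return single, many, single_async.get(), many_async.get()

if __name__ == '__main__':
    print(issue_four_ways())
```

The synchronous and asynchronous versions produce the same values; the difference is only in when the caller blocks.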
The built-in map() function allows you to apply a function to each item in an iterable.
It yields one result at a time, each being the return value of the target function called with one item from the iterable. It is common to call map() and iterate the results in a for-loop.
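For reference, a minimal sketch of the built-in map() used in a for-loop is listed below. The double() and collect_doubled() functions are hypothetical names used only for this illustration.

```python
# sketch: the built-in map() applied to a small iterable
# (double() and collect_doubled() are hypothetical, for illustration only)
def double(x):
    return x * 2

def collect_doubled(items):
    results = []
    # map() yields one result per item as the loop requests it
    for result in map(double, items):
        results.append(result)
    return results

if __name__ == '__main__':
    print(collect_doubled([1, 2, 3]))
```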
The ThreadPool provides a version of the map() function where the target function is called for each item in the provided iterable in parallel. The problem with the ThreadPool map() method is that it converts the provided iterable into a list and issues a task for each item all at once. This can be a problem if the iterable contains many hundreds or thousands of items as it may use a lot of main memory.
As an alternative, the ThreadPool provides the imap() method, which is a lazy version of map(). Items are taken from the provided iterable one at a time instead of all at once, and results are yielded in order as they become available rather than after all tasks are completed.
A problem with imap() is that results are returned in the order that tasks were issued, not the order that tasks are completed.
If some tasks are slower than others to complete, they will hold up the stream of results.
A version of the imap() method is needed that will allow return values to be iterated as fast as tasks are completed. That is, to iterate results in the order that tasks are completed, not the order that tasks were issued.
The imap_unordered() method provides this capability.
How to Use ThreadPool imap_unordered()
The ThreadPool provides an unordered version of the imap() method via the imap_unordered() method.
The imap_unordered() method takes the name of a function to apply and an iterable.
It will then take items from the iterable one at a time and, for each item, issue a task to the ThreadPool that calls the specified function on that item.
The same as imap() except that the ordering of the results from the returned iterator should be considered arbitrary. (Only when there is only one worker process is the order guaranteed to be “correct”.)
— multiprocessing — Process-based parallelism
The imap_unordered() method then returns an iterable of return values. The return values are yielded in the order that tasks are completed, not the order that the tasks were issued to the ThreadPool.
For example:
```python
...
# apply function to each item in the iterable in parallel
for result in pool.imap_unordered(task, items):
    # ...
```
Each item in the iterable is taken as a separate task in the ThreadPool.
Unlike the built-in map() function, the ThreadPool imap_unordered() method only takes one iterable as an argument. This means that the target function executed in the ThreadPool can only take a single argument.
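One possible workaround, sketched below, is to pack the arguments into a tuple and unpack them in a small adapter function. The add(), add_unpacked() and run_pairs() names are hypothetical and used only for this illustration.

```python
# sketch: calling a two-argument function via imap_unordered()
# (add(), add_unpacked() and run_pairs() are hypothetical names)
from multiprocessing.pool import ThreadPool

# hypothetical task that takes two arguments
def add(a, b):
    return a + b

# adapter that unpacks one tuple into the task's arguments
def add_unpacked(args):
    return add(*args)

def run_pairs(pairs):
    with ThreadPool(4) as pool:
        # consume the iterator inside the with-block, while the pool is running
        # results arrive in arbitrary order, so sort them for a stable report
        return sorted(pool.imap_unordered(add_unpacked, pairs))

if __name__ == '__main__':
    print(run_pairs([(1, 2), (3, 4), (5, 6)]))
```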
Unlike the ThreadPool map() method, the ThreadPool imap_unordered() method will iterate the provided iterable one item at a time and issue tasks to the ThreadPool. It will also yield return values as tasks are completed rather than all at once after all tasks are completed.
Unlike the imap() method, the imap_unordered() method will yield return values in the order that tasks are completed, not the order that tasks were issued to the ThreadPool.
Although the imap_unordered() method is lazy, we can issue tasks in chunks to the ThreadPool. That is, we can yield a fixed number of items from the input iterable and issue them as one task to be executed by a worker thread.
This can make completing a large number of tasks in a very long iterable more efficient as arguments and return values from the target task function can be transmitted in batches with less computational overhead.
This can be achieved via the “chunksize” argument to imap_unordered().
The chunksize argument is the same as the one used by the map() method. For very long iterables using a large value for chunksize can make the job complete much faster than using the default value of 1.
— multiprocessing — Process-based parallelism
For example:
```python
...
# iterate results from imap_unordered() with chunksize
for result in pool.imap_unordered(task, items, chunksize=10):
    # ...
```
Next, let’s take a closer look at how the imap_unordered() method compares to other methods on the ThreadPool.
Difference Between imap_unordered() and imap()
How does the imap_unordered() method compare to the imap() for issuing tasks to the ThreadPool?
The imap_unordered() and imap() methods have a lot in common, such as:
- Both imap_unordered() and imap() may be used to issue tasks that apply a function to all items in an iterable via the ThreadPool.
- Both imap_unordered() and imap() are lazy versions of the map() method.
- Both imap_unordered() and imap() return an iterable over the return values immediately.
Nevertheless, there is one key difference between the two methods:
- The iterable returned from imap_unordered() yields results in the arbitrary order that tasks are completed, whereas the imap() method yields return values in the order that tasks were submitted.
The imap_unordered() method should be used when the caller needs to iterate return values as tasks are completed, i.e. in arbitrary order rather than the order that tasks were submitted.
The imap() method should be used when the caller needs to iterate return values in the order that tasks were submitted, as results become available.
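The difference can be seen in a small sketch. It assumes three hypothetical tasks with fixed sleep times, where the first task issued is the slowest, and a pool with three workers so that all tasks run at the same time. The DELAYS table and compare_orderings() helper are made up for this illustration.

```python
# sketch: result ordering of imap() vs imap_unordered()
# (DELAYS and compare_orderings() are illustrative assumptions)
from time import sleep
from multiprocessing.pool import ThreadPool

# hypothetical sleep times in seconds: task 0 is the slowest
DELAYS = {0: 0.6, 1: 0.1, 2: 0.3}

# task that blocks for its assigned time, then returns its identifier
def task(identifier):
    sleep(DELAYS[identifier])
    return identifier

def compare_orderings():
    with ThreadPool(3) as pool:
        # results come back in the order that tasks were issued
        ordered = list(pool.imap(task, range(3)))
        # results come back in the order that tasks finished
        unordered = list(pool.imap_unordered(task, range(3)))
    return ordered, unordered

if __name__ == '__main__':
    ordered, unordered = compare_orderings()
    print(f'imap(): {ordered}')
    print(f'imap_unordered(): {unordered}')
```

With imap() the slow first task holds back the two faster results, whereas with imap_unordered() the faster results should be reported first.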
Now that we know how to use the imap_unordered() method to execute tasks in the ThreadPool, let’s look at some worked examples.
Example of ThreadPool imap_unordered()
We can explore how to use the imap_unordered() method on the ThreadPool.
In this example, we can define a target task function that takes an integer as an argument, generates a random number, reports the value then returns the value that was generated. We can then call this function for each integer between 0 and 49 using the ThreadPool imap_unordered() method.
This will apply the function to each integer concurrently using the worker threads in the pool, which by default is one per logical CPU in the system.
Firstly, we can define the target task function.
The function takes an argument, generates a random number between 0 and 1, and reports the integer and the generated number. It then blocks for a fraction of a second to simulate computational effort, then returns the number that was generated.
The task() function below implements this.
```python
# task executed in a worker thread
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}')
    # block for a moment
    sleep(value)
    # return the generated value
    return value
```
We can then create and configure a ThreadPool.
We will use the context manager interface to ensure the pool is shut down automatically once we are finished with it.
```python
...
# create and configure the thread pool
with ThreadPool() as pool:
    # ...
```
We can then call the imap_unordered() method on the ThreadPool to apply our task() function to each value in a range between 0 and 49.
This returns an iterator over return values that will yield results in the order that the tasks are completed.
We can then report the return values directly.
This can all be achieved in a for-loop.
```python
...
# issue tasks in order, handle results out of order
for result in pool.imap_unordered(task, range(50)):
    print(f'Got result: {result}')
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of parallel imap_unordered() with the thread pool
from random import random
from time import sleep
from multiprocessing.pool import ThreadPool

# task executed in a worker thread
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}')
    # block for a moment
    sleep(value)
    # return the generated value
    return value

# protect the entry point
if __name__ == '__main__':
    # create and configure the thread pool
    with ThreadPool() as pool:
        # issue tasks in order, handle results out of order
        for result in pool.imap_unordered(task, range(50)):
            print(f'Got result: {result}')
    # thread pool is closed automatically
```
Running the example first creates the ThreadPool with a default configuration.
It will have one worker thread for each logical CPU in your system.
The imap_unordered() method is then called for the range.
This issues 50 calls to the task() function, one for each integer between 0 and 49. An iterator is returned with the result for each function call in an arbitrary order.
Each call to the task function generates a random number between 0 and 1, reports a message, blocks, then returns a value.
The main thread iterates over the values returned from the calls to the task() function as tasks are completed and reports the generated values. The reported values match those in the worker threads.
Importantly, tasks are issued to the ThreadPool one-by-one as items are taken from the iterable, rather than the iterable first being converted into a list.
As importantly, results in the main thread are reported as tasks are completed, although not in the order that tasks were issued.
A truncated listing of results is provided below. We can see that tasks are running and reporting their generated results while the main thread is receiving and reporting return values.
This is unlike the map() method that must wait for all tasks to complete before reporting return values.
Note that results will differ each time the program is run given the use of random numbers.
```
...
Task 43 executing with 0.5012562634965123
Got result: 0.5326011817227561
Task 44 executing with 0.9448743784955711
Got result: 0.9830500005815256
Task 45 executing with 0.9462546988157843
Got result: 0.09788807798147703
Task 46 executing with 0.6247042186473099
Got result: 0.774776695048818
Task 47 executing with 0.17334457055690566
Got result: 0.41368323933017437
Task 48 executing with 0.25273902626013867
Got result: 0.17334457055690566
Task 49 executing with 0.2232284829823411
Got result: 0.5012562634965123
Got result: 0.682612034527399
Got result: 0.25273902626013867
Got result: 0.9401905089779249
Got result: 0.7802980706779677
Got result: 0.2232284829823411
Got result: 0.6247042186473099
Got result: 0.9448743784955711
Got result: 0.9462546988157843
```
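Because results arrive in arbitrary order, it is not obvious from the output which result belongs to which task. One option, sketched below, is to return the task's identifier alongside the generated value. This is a variation on the example and not part of the original listing. The collect_by_identifier() helper is hypothetical, and the sleep is shortened to keep the run brief.

```python
# sketch: matching unordered results back to their inputs
# (collect_by_identifier() is a hypothetical helper)
from random import random
from time import sleep
from multiprocessing.pool import ThreadPool

# task executed in a worker thread
def task(identifier):
    # generate a value
    value = random()
    # block for a moment (shortened for this sketch)
    sleep(value * 0.1)
    # return the input alongside the result so the caller can match them
    return identifier, value

def collect_by_identifier(n_tasks):
    results = {}
    with ThreadPool() as pool:
        # each result is an (identifier, value) tuple, in arbitrary order
        for identifier, value in pool.imap_unordered(task, range(n_tasks)):
            results[identifier] = value
    return results

if __name__ == '__main__':
    results = collect_by_identifier(10)
    for identifier in sorted(results):
        print(f'Task {identifier} returned {results[identifier]}')
```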
Next, let’s look at an example of issuing tasks that do not have a return value.
Example of ThreadPool imap_unordered() with No Return Value
We can explore using the imap_unordered() method to call a function for each item in an iterable that does not have a return value.
This means that we are not interested in the iterable of results returned by the call to imap_unordered() and instead are only interested that all issued tasks get executed.
This can be achieved by updating the previous example so that the task() function does not return a value.
The updated task() function with this change is listed below.
```python
# task executed in a worker thread
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}')
    # block for a moment
    sleep(value)
```
Then, in the main thread, we can call imap_unordered() with our task() function and the range, and not iterate the results.
```python
...
# issue tasks to the thread pool
pool.imap_unordered(task, range(50))
```
The call to imap_unordered() will return immediately.
Therefore, we must explicitly wait for all tasks in the ThreadPool to complete. Otherwise, the context manager for the ThreadPool would exit and forcefully terminate the ThreadPool and all running tasks in the pool.
This can be achieved by first closing the ThreadPool so no further tasks can be submitted to the pool.
```python
...
# shutdown the thread pool
pool.close()
```
We can then join the pool to wait for all tasks to complete and all worker threads to close.
```python
...
# wait for all issued tasks to complete
pool.join()
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of parallel imap_unordered() with the thread pool and a task that does not return a value
from random import random
from time import sleep
from multiprocessing.pool import ThreadPool

# task executed in a worker thread
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}')
    # block for a moment
    sleep(value)

# protect the entry point
if __name__ == '__main__':
    # create and configure the thread pool
    with ThreadPool() as pool:
        # issue tasks to the thread pool
        pool.imap_unordered(task, range(50))
        # shutdown the thread pool
        pool.close()
        # wait for all issued tasks to complete
        pool.join()
```
Running the example first creates the ThreadPool with a default configuration.
The imap_unordered() method is then called for the range. This issues 50 calls to the task() function, one for each integer between 0 and 49. An iterator is returned immediately with the result for each function call, but is ignored in this case.
The main thread carries on, first closing the ThreadPool then joining it to wait for all tasks to complete.
Each call to the task function generates a random number between 0 and 1, reports a message, then blocks.
The tasks in the pool finish then the ThreadPool is closed.
This example again highlights that the call to imap_unordered() does not block. The caller only blocks when looping over the returned iterator of return values.
Below is a truncated listing of results.
Note that results will differ each time the program is run given the use of random numbers.
```
...
Task 37 executing with 0.3154780793757277
Task 38 executing with 0.19454440954859287
Task 39 executing with 0.48694686204000004
Task 40 executing with 0.3041450575195668
Task 41 executing with 0.5785036957109727
Task 42 executing with 0.703658768199778
Task 43 executing with 0.03145758860303549
Task 44 executing with 0.7320478300445873
Task 45 executing with 0.8274506405813656
Task 46 executing with 0.2806750751235615
Task 47 executing with 0.019725939989997054
Task 48 executing with 0.9636321908528617
Task 49 executing with 0.43886293893314765
```
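An alternative to closing and joining the pool, sketched below, is to loop over the returned iterator and discard the results, since iterating blocks until each task has finished. The run_all() helper, the fixed 0.01 second sleep and the lock-protected list are assumptions added for this illustration so that the completed work can be checked.

```python
# sketch: waiting for tasks with no return value by exhausting the iterator
# (run_all() and its bookkeeping are illustrative assumptions)
from time import sleep
from threading import Lock
from multiprocessing.pool import ThreadPool

def run_all(n_tasks):
    completed = []
    lock = Lock()

    # task with no return value that records its identifier when done
    def task(identifier):
        sleep(0.01)
        with lock:
            completed.append(identifier)

    with ThreadPool() as pool:
        # each iteration blocks until another task has finished
        for _ in pool.imap_unordered(task, range(n_tasks)):
            pass
    # all tasks have finished by the time the loop ends
    return sorted(completed)

if __name__ == '__main__':
    print(run_all(20))
```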
Next, let’s explore the chunksize argument to the imap_unordered() method.
Example of ThreadPool imap_unordered() with chunksize
The imap_unordered() method will apply a function to each item in an iterable one-by-one.
If the iterable has a large number of items, this may be inefficient, as each item must be retrieved from the provided iterable and issued as a separate task to be picked up and executed by a worker thread.
A more efficient approach for very large iterables might be to divide the items in the iterable into chunks and issue chunks of items to each worker thread to which the target function can be applied.
This can be achieved with the “chunksize” argument to the imap_unordered() method.
The example below updates the previous example to first configure the ThreadPool with 4 worker threads, then to issue 40 tasks in chunks of 2. That is, there will be 20 chunks sent into the ThreadPool, where each chunk involves two function calls to the target task function with two return values sent back.
Firstly, we can configure the ThreadPool.
```python
...
# create and configure the thread pool
with ThreadPool(4) as pool:
    # ...
```
Next, we can issue 40 tasks, with 2 tasks assigned to each chunk via the “chunksize” argument.
Return values in the returned iterator are still yielded one at a time, although behind the scenes two results are available for each chunk that returns.
```python
...
# issue tasks to the thread pool
for result in pool.imap_unordered(task, range(40), chunksize=2):
    print(f'Got result: {result}')
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of parallel imap_unordered() with the thread pool with a larger iterable and chunksize
from random import random
from time import sleep
from multiprocessing.pool import ThreadPool

# task executed in a worker thread
def task(identifier):
    # generate a value
    value = random()
    # report a message
    print(f'Task {identifier} executing with {value}')
    # block for a moment
    sleep(1)
    # return the generated value
    return value

# protect the entry point
if __name__ == '__main__':
    # create and configure the thread pool
    with ThreadPool(4) as pool:
        # issue tasks to the thread pool
        for result in pool.imap_unordered(task, range(40), chunksize=2):
            print(f'Got result: {result}')
    # thread pool is closed automatically
```
Running the example first creates the ThreadPool with 4 worker threads.
The imap_unordered() method is then called for the range, issuing 20 tasks to the ThreadPool, each with two function calls to the target task function.
Each call to the task function generates a random number between 0 and 1, reports a message, blocks, then returns a value.
The main thread iterates over the values returned from the calls to the task() function as tasks are completed and reports the generated values. The reported values match those in the worker threads.
Importantly, tasks are issued to the ThreadPool two at a time, with each chunk of two function calls handled by one worker thread.
As importantly, results in the main thread are reported as tasks are completed, although not in the order that tasks were issued.
A truncated listing of results is provided below.
Note that results will differ each time the program is run given the use of random numbers.
```
...
Got result: 0.8986775438813457
Task 39 executing with 0.19409479942731223
Task 35 executing with 0.2794139007702946
Task 33 executing with 0.003959960799745921
Task 37 executing with 0.18119580525089307
Got result: 0.3744438977483323
Got result: 0.2794139007702946
Got result: 0.4958387634036808
Got result: 0.19409479942731223
Got result: 0.8601011837333837
Got result: 0.18119580525089307
Got result: 0.6441743370368178
Got result: 0.003959960799745921
```
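To check how items are grouped into chunks, the sketch below has each task report the identifier of the thread that ran it. It assumes, as described above, that each chunk is executed as one task by a single worker thread, which should hold for CPython's ThreadPool. The threads_by_item() helper is hypothetical.

```python
# sketch: checking that items in the same chunk share a worker thread
# (threads_by_item() is a hypothetical helper)
from threading import get_ident
from multiprocessing.pool import ThreadPool

# task that reports which worker thread handled the item
def task(identifier):
    return identifier, get_ident()

def threads_by_item(n_items, chunksize):
    with ThreadPool(4) as pool:
        # build a mapping of item -> thread id from the unordered results
        return dict(pool.imap_unordered(task, range(n_items), chunksize=chunksize))

if __name__ == '__main__':
    owners = threads_by_item(8, 2)
    for first in range(0, 8, 2):
        same = owners[first] == owners[first + 1]
        print(f'Items {first} and {first + 1} ran on the same thread: {same}')
```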
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python ThreadPool Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- ThreadPool PDF Cheat Sheet
I also recommend specific chapters from the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python ThreadPool: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python Threading: The Complete Guide
Takeaways
You now know how to use the imap_unordered() method to issue tasks to the ThreadPool in Python.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.