Last Updated on September 12, 2022
You can convert a for-loop to be parallel using the multiprocessing.Pool class.
In this tutorial you will discover how to convert a for-loop to be parallel using the multiprocessing pool.
Let’s get started.
Need to Make For-Loop Parallel
You have a for-loop and you want to execute each iteration in parallel using a separate CPU core.
This is a common situation.
The loop involves performing the same operation multiple times with different data.
For example:
...
# perform the same operation on each item
for item in items:
    # perform operation on item...
It most commonly involves calling the same function each iteration with different arguments.
For example:
...
# call the same function each iteration with different data
for item in items:
    # call function with one data item
    task(item)
Alternately, we might want the results from calling the same function multiple times with different data.
This is often achieved using a list comprehension.
For example:
...
# get results from applying the same function to different data
results = [task(item) for item in items]
Each iteration of the for-loop is performed sequentially, one after the other, on a single CPU core.
This is a problem because we may have many CPU cores, such as 2, 4, 8 or even 32.
How can we convert a for-loop to be parallel?
Prerequisite (prepare your code before we make it parallel)
Before we dive into discovering how to make for-loops parallel, there is a prerequisite.
You will need to refactor your code so that each iteration of your loop calls the same target function with different arguments.
This means you will need to create a new target function that takes one (or more) arguments containing the specific data to operate on in that iteration.
For example, your for-loop may look as follows:
...
# slow for loop executed sequentially
for item in items:
    # do one thing using item
    # do another thing using item
    ...
You can convert this into a loop that calls a target function each iteration, where the target function takes that iteration's unique data as an argument.
For example:
# task that operates on an item
def task(item):
    # do one thing using item
    # do another thing using item
    ...

# slow for loop executed sequentially
for item in items:
    task(item)
Generally, the function should be self-contained.
It should not use any shared resources to avoid possible concurrency failure modes like race conditions.
It should be a function with no side effects, taking data as arguments and returning any results via return values.
If this is not the case, you must refactor your code so that it is; otherwise, you may have to contend with mutex locks and similar synchronization primitives, which will make everything much harder.
Next, let’s look at how we can execute for loops in parallel using all CPU cores.
How To Make For-Loop Parallel With Pool
We can make a for-loop parallel using the multiprocessing pool.
A process pool is a programming pattern for automatically managing a pool of worker processes.
The multiprocessing.Pool class provides a process pool with helpful functions for executing for loops in parallel.
An instance of the Pool class can be created and by default it will use all available CPU cores. Once we are finished with it, we can close it to release the worker processes.
For example:
...
# create a process pool that uses all cpus
pool = multiprocessing.Pool()
# ...
# close the process pool
pool.close()
We can use the context manager interface if we only need to use the pool for one for-loop. This will ensure the pool is closed for us automatically once we are finished using it, even if a task raises an exception.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # ...
This is the recommended usage of the process pool.
You can learn more about how to use the multiprocessing pool context manager in the tutorial:
The multiprocessing pool provides a parallel version of the built-in map() function that will call the same function for each item in a provided iterable, e.g. same function with different data.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # call the function for each item in parallel
    pool.map(task, items)
This will traverse the “items” iterable and issue one call to the task() function for each item and return once all tasks have completed.
If our target function has a return value, the map() function returns an iterable of those return values, which can be stored or iterated directly.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # call the function for each item in parallel
    for result in pool.map(task, items):
        print(result)
You can learn more about how to use the map() function in the tutorial:
The map() function on the multiprocessing pool only supports target functions that take a single argument.
If our target function takes more than one argument, we can use the starmap() function instead.
It takes an iterable of iterables, where each nested iterable provides the arguments for one call to the target task function.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # prepare arguments for each call to the target function
    items = [(1,2), (3,4), (5,6)]
    # call the function for each item in parallel with multiple arguments
    for result in pool.starmap(task, items):
        print(result)
You can learn more about how to use the starmap() function in the tutorial:
A problem with both map() and starmap() is they do not return their iterable of return values until all tasks have completed.
Instead we can use the imap() function. This will only issue tasks to the multiprocessing pool once a worker becomes available, and will yield a return value as tasks are completed.
This is helpful for saving memory if we have thousands or millions of tasks to issue. It is also more responsive, as we can process or show results as tasks finish rather than after all tasks have completed.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # call the function for each item in parallel, get results as tasks complete
    for result in pool.imap(task, items):
        print(result)
You can learn more about how to use the imap() function in the tutorial:
This was a crash course in using the multiprocessing pool.
Now that we know how to convert our for-loops to be parallel, let’s look at some worked examples that you can use as templates for your own project.
Example of Parallel For-Loop with map()
We can explore how to use map() to execute for-loops in parallel.
In this example we will use a simple task function as a placeholder that you can replace with your own task function to be called each loop iteration.
In this case, our task function will take one integer argument. It will then generate a random number between 0 and 1 and sleep for a fraction of a second to simulate work. Finally, it will return the generated value as the result of the task.
The task() function below implements this.
# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value
Let’s dive-in.
Serial For-Loop with map()
Before we dive into parallel for-loops with map(), let’s start with a sequential for-loop.
We can use the built-in map() function to call our task function each iteration.
Each function call will be executed sequentially, one after the other on one CPU core.
The complete example is listed below.
# SuperFastPython.com
# example of a sequential for loop
from time import sleep
from random import random

# a task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # call the same function with different data sequentially
    for result in map(task, range(10)):
        # report the value to show progress
        print(result)
Running the example calls the same task() function for each value in the iterable from 0 to 9.
Different random numbers are generated each time the code is run.
Each call to task() happens sequentially, and the entire example takes about 4-5 seconds, depending on the specific numbers generated.
0.12199618835333303
0.5212357710994685
0.7417626978125494
0.0450874955439563
0.4708361362483452
0.46524023376998846
0.32858199011122213
0.468263433349433
0.3856435694592504
0.995716230169814
Next, let’s look at converting the example to execute in parallel using the multiprocessing pool.
Parallel For-Loop with map()
We can update the previous example so that each call to the task() function is executed in parallel using all available CPU cores.
For example, if we have 4 physical CPU cores, these might appear as 8 logical CPU cores with hyperthreading. This means our system could execute 8 calls to task() in parallel, and as those calls complete, the remaining calls to task() would be executed.
First, we can create the multiprocessing pool configured by default to use all available CPU cores.
...
# create the process pool
with Pool() as pool:
    # ...
Next, we can call the map() function as before and iterate the result values, although this time the map() function is a method on the multiprocessing pool.
...
# call the same function with different data in parallel
for result in pool.map(task, range(10)):
    # report the value to show progress
    print(result)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of a parallel for loop
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # call the same function with different data in parallel
        for result in pool.map(task, range(10)):
            # report the value to show progress
            print(result)
Running the example first creates the multiprocessing pool with one worker per logical CPU in the system.
All ten calls to the task() function are issued to the process pool and it completes them as fast as it is able.
Once all tasks are completed, an iterable of return values is returned from map() and traversed, reporting the values that were generated.
The loop completes in about one second, depending on the specific random numbers generated.
0.5884491047270914
0.6946893602558559
0.7169680024270103
0.9433207432796479
0.5374063478473085
0.49870576951245515
0.5229616564751852
0.250982204375086
0.670282828432953
0.44766416300786493
Next, let’s look at the same example if we did not have return values.
Parallel For-Loop with map() and No Return Values
We can update the task() function to no longer return a result.
In that case, we do not need to traverse the iterable of return values from the map() function in the caller.
First, we can update the task function to not return the value and instead to print the generated value directly.
# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # report the value to show progress
    print(f'{arg} got {value}', flush=True)
Note, print messages from child worker processes do not appear immediately by default, but are instead buffered until the child process is terminated.
We can force messages to show immediately by setting the “flush” argument to True.
You can learn more about printing messages from child processes in the tutorial:
We can then call map() with our task function and arguments and not iterate the results.
...
# create the process pool
with Pool() as pool:
    # call the same function with different data in parallel
    pool.map(task, range(10))
Tying this together, a complete example with this change is listed below.
# SuperFastPython.com
# example of a parallel for loop with no return values
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # report the value to show progress
    print(f'{arg} got {value}', flush=True)

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # call the same function with different data in parallel
        pool.map(task, range(10))
Running the example creates the multiprocessing pool and issues all ten tasks.
The tasks execute and report their messages.
Once all tasks have completed, the map() function returns and the caller continues on, with the process pool being closed automatically.
3 got 0.07495290707222657
1 got 0.3551532315681727
5 got 0.5775819151509843
0 got 0.6205673200232464
7 got 0.6050404433157823
8 got 0.5876738799350407
2 got 0.6776107991833116
4 got 0.7459149152853299
6 got 0.8483482745906772
9 got 0.7278120326654482
Next, let’s explore how to make a for-loop parallel where the target function has multiple arguments.
Example of Parallel For-Loop with starmap()
A limitation of the map() function is that it only supports target functions that take one argument.
If our target task function requires multiple arguments, we can use the starmap() function instead.
The iterable passed to the starmap() function must contain one nested iterable for each call to the task function, holding the arguments for that call.
That is, an iterable of iterables, such as a list of tuples, where each tuple is a set of arguments for one function call.
Let’s demonstrate this with a worked example.
We can update the task() function to take three arguments.
# task to execute in another process
def task(arg1, arg2, arg3):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value
We can then prepare an iterable of iterables as arguments for each function call.
In this case, we will create a list of tuples, where each tuple has 3 arguments for a single call to task().
This can be achieved using a list comprehension.
...
# prepare arguments
items = [(i, i*2, i*3) for i in range(10)]
We can then call the starmap() function to execute the for-loop in parallel and report results.
...
# call the same function with different data in parallel
for result in pool.starmap(task, items):
    # report the value to show progress
    print(result)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of a parallel for loop with multiple arguments
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg1, arg2, arg3):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # prepare arguments
        items = [(i, i*2, i*3) for i in range(10)]
        # call the same function with different data in parallel
        for result in pool.starmap(task, items):
            # report the value to show progress
            print(result)
Running the example first creates the multiprocessing pool with one worker per logical CPU in the system.
All ten calls to the task() function are issued to the process pool, each with three arguments.
Once all tasks are completed, an iterable of return values is returned from starmap() and traversed, reporting the values that were generated.
The loop completes in about one second, depending on the specific random numbers generated.
0.3649786159984052
0.48469386498153644
0.3880300260728501
0.11865254960654414
0.42234122971676635
0.8348296520018135
0.1768696329327919
0.043473834132026656
0.4244687958601574
0.2971373822352351
Next, let’s look at how we can use imap() to make our programs more responsive.
Example of Parallel For-Loop with imap()
A limitation of map() and starmap() is that they traverse the provided iterable immediately and issue all function calls to the multiprocessing pool.
This may use a lot of memory if there are thousands or millions of tasks.
Additionally, the iterable of return values is not returned until all tasks have completed. This can be problematic if we would prefer to show progress as tasks are completed.
The imap() function overcomes these limitations by only issuing tasks one at a time as workers become available and yielding return values as tasks complete.
The example below demonstrates a parallel for-loop using the more responsive imap() function.
# SuperFastPython.com
# example of a parallel for loop that is more responsive
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # call the same function with different data in parallel
        for result in pool.imap(task, range(10)):
            # report the value to show progress
            print(result)
Running the example first creates the multiprocessing pool with one worker per logical CPU in the system.
Tasks are issued to the process pool until all workers are occupied. Remaining tasks are only issued once workers become available.
The imap() function returns an iterable immediately and begins yielding results as soon as tasks are completed. Return values are returned in the order that tasks were issued to the process pool.
0.835257932028165
0.23239313640799752
0.9100656574752977
0.541963403413606
0.771756987041196
0.26518666284928805
0.7566356594123003
0.5955443214316828
0.8208877725107132
0.018073556089019838
Note, if the order of return values does not matter, the imap_unordered() function can be used instead, which may be even more responsive.
You can learn more about how to use the imap_unordered() function in the tutorial:
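As a rough sketch, the imap() example above could be adapted to use imap_unordered() by changing only the method call, with the same placeholder task() function; results are then reported in the order tasks finish rather than the order they were issued:

# sketch of a parallel for loop that reports results in completion order
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # results are yielded in the order tasks finish, not the order they were issued
        for result in pool.imap_unordered(task, range(10)):
            # report the value to show progress
            print(result)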
Common Questions
This section lists common questions about making for-loops parallel using the multiprocessing pool.
Do you have any questions?
Share your question in the comments and I may add it to this section.
How To Set The Number of Workers
You can set the number of workers via the “processes” argument to the multiprocessing.Pool class.
For example:
...
# create a process pool with 4 workers
pool = multiprocessing.Pool(processes=4)
Because it is the first argument, you can omit the argument name.
For example:
...
# create a process pool with 4 workers
pool = multiprocessing.Pool(4)
You can learn more about how to set the number of workers in the tutorial:
How Many Workers Should I Use?
Use the default, which is one worker per logical CPU.
For example:
...
# create a process pool with one worker per logical CPU
pool = multiprocessing.Pool()
Shouldn’t I Set Number of Workers to Be 1 Less Than The Number of CPUs?
You can if you want, but you probably should not.
It only makes sense to do this if you issue tasks to the multiprocessing pool asynchronously (e.g. using map_async()) and then intend to perform computationally intensive work in the calling process.
This topic is covered in more detail in the tutorial:
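For example, here is a minimal sketch of that situation, assuming a task() function like the one used throughout this tutorial; the main process issues the tasks asynchronously, performs its own heavy work, then collects the results:

# sketch of issuing tasks asynchronously with one worker fewer than the CPU count
from time import sleep
from random import random
from multiprocessing import Pool, cpu_count

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create a process pool with one worker fewer than the number of logical CPUs
    with Pool(cpu_count() - 1) as pool:
        # issue all tasks asynchronously, this call does not block
        async_result = pool.map_async(task, range(10))
        # perform computationally intensive work in the calling process here
        # ...
        # block until the issued tasks are done and retrieve their return values
        results = async_result.get()
        print(results)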
What is a Logical vs Physical CPU?
Modern CPUs typically make use of a technology called hyperthreading.
Hyperthreading does not refer to a program using threads. Instead, it refers to a technology within the CPU cores themselves that allows each physical core or CPU to act as if it were two logical cores or two CPUs.
- Physical Cores: The number of CPU cores provided in the hardware, e.g. the chips.
- Logical Cores: The number of CPU cores after hyperthreading is taken into account.
It provides automatic in-core parallelism that can offer up to a 30% speed-up over CPU cores that do not offer the technology.
As such, when we count CPU cores in a system, we typically count the number of logical CPU cores, not the number of physical CPU cores.
If you know your system uses hyperthreading (it probably does), then you can get the number of physical CPUs in your system by dividing the number of logical CPUs by two.
- Count Physical Cores = Count Logical Cores / 2
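For example, a small sketch that estimates the physical core count from the logical core count, assuming hyperthreading doubles the number of logical cores:

# sketch of estimating physical cores from logical cores
import multiprocessing

# number of logical CPU cores reported by the system
n_logical = multiprocessing.cpu_count()
# estimated number of physical cores, assuming hyperthreading doubles the count
n_physical = n_logical // 2
print(f'logical cores: {n_logical}, estimated physical cores: {n_physical}')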
How Many CPUs Do I Have?
There are a number of ways to get the number of CPUs in Python.
The two pure Python approaches provided by the standard library are:
- multiprocessing.cpu_count() function
- os.cpu_count() function.
For example:
...
# get the number of logical cpu cores
n_cores = multiprocessing.cpu_count()
This will report the number of logical CPUs in your system.
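The os.cpu_count() function reports the same value. For example:

# report the number of logical cpu cores using the os module
import os
print(os.cpu_count())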
You can learn more about getting the number of CPUs in your system in the tutorial:
What If I Need to Access a Shared Resource?
You may need to access a shared resource.
If your task functions are just reading data, then it should be fine.
If your task functions are writing data, then you may need a mutex lock to ensure that only one process can change the shared resource at a time.
This is to avoid a race condition.
You can learn more about how to use a mutex lock with the multiprocessing pool in the tutorial:
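As a rough sketch, one way to share a lock with pool workers is to host the lock on a multiprocessing.Manager, since a manager's lock proxy can be pickled and passed safely as a task argument; the file name 'results.txt' here is purely illustrative:

# sketch of protecting a shared resource with a manager-hosted lock
from multiprocessing import Pool, Manager

# task that appends to a shared file, protected by a mutex lock
def task(item, lock):
    # acquire the lock so only one process writes at a time
    with lock:
        # 'results.txt' is an illustrative shared resource
        with open('results.txt', 'a') as handle:
            handle.write(f'{item}\n')

# entry point for the program
if __name__ == '__main__':
    # create a manager to host the shared lock
    with Manager() as manager:
        # a manager lock proxy can be pickled and passed to pool workers
        lock = manager.Lock()
        # create the process pool
        with Pool() as pool:
            # pair each item with the shared lock for starmap()
            pool.starmap(task, [(i, lock) for i in range(10)])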
What Types of Tasks Can Be Made Parallel?
All types.
The multiprocessing pool is best suited to computationally intensive tasks that receive or return only small amounts of data.
For IO-bound tasks, use thread-based concurrency instead, such as the ThreadPoolExecutor or the ThreadPool.
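For example, a minimal sketch of an IO-bound loop using the multiprocessing.pool.ThreadPool class, which offers the same map() API backed by threads; the URL list here is purely illustrative:

# sketch of an io-bound parallel for loop using a pool of threads
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen

# io-bound task: download the contents of a url
def download(url):
    # open the connection and read all data
    with urlopen(url) as connection:
        return connection.read()

# entry point for the program
if __name__ == '__main__':
    # illustrative list of urls to download
    urls = ['https://example.com/'] * 10
    # the same map() pattern works with a pool of threads
    with ThreadPool() as pool:
        # download all urls concurrently and report the size of each
        for data in pool.map(download, urls):
            print(len(data))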
For tasks that receive or return a lot of data, consider having the processes store the data somewhere for the caller to retrieve later, such as in a file, database, or similar. This is to avoid the computational cost of serializing (pickling) the data for transmission between Python processes.
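For example, a rough sketch of this approach, where each task writes its (illustrative) large result to a temporary file and returns only the small file path to the caller:

# sketch of returning file paths instead of large results
import tempfile
from multiprocessing import Pool

# task that produces a large result and saves it to a file instead of returning it
def task(arg):
    # illustrative large result
    lines = [f'{arg},{i}\n' for i in range(100000)]
    # write the result to a temporary file that outlives the worker
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as handle:
        handle.writelines(lines)
        path = handle.name
    # return only the small file path rather than the data itself
    return path

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # the caller receives file paths and can load the data later as needed
        for path in pool.map(task, range(4)):
            print(path)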
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know how to convert a for-loop to be parallel using the multiprocessing pool.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
lukas kotatko says
Hi there, you have a nice website and seem to be pretty knowledgeable in the field of Python concurrency. Currently, I am trying to solve an issue with speeding up a consumer that consumes data from RabbitMQ.
It is written in FastAPI with asyncio, so with concurrency in mind. The consumer can currently handle about 500-1000 messages per second (the message is a simple JSON object). The goal is to improve the performance to be able to deal with 10 times more.
I don't really know how to do that, probably with some parallelisation (processes), but I don't know how to glue it all together. Could you give a little hint?
thank you
Jason Brownlee says
Great question.
Perhaps you can run multiple threads, with a separate asyncio event loop in each?
chatbot says
Hi, I gained good knowledge from this blog.
My question is, I am currently trying to parallelize text extraction from images using my own model.
Say I have 14 images in a list, what is the way to process them in parallel? A Process or a Pool?
The time taken for each page varies depending on the page content.
Jason Brownlee says
Good question, perhaps you can issue each task to the pool using map() or apply_async().
Tugiyo says
Hi Jason, I have simplified my Python code as follows:
How can I parallelize the code so it can be run by, for example, 4 processors simultaneously while automatically giving the correct output?
Any suggestions?
Jason Brownlee says
NumPy is already parallel; I don't recommend mixing stdlib concurrency with NumPy concurrency.
See this:
https://scipy.github.io/old-wiki/pages/ParallelProgramming