Last Updated on September 12, 2022
You can convert a for-loop to be parallel using the multiprocessing.Pool class.
In this tutorial you will discover how to convert a for-loop to be parallel using the multiprocessing pool.
Let’s get started.
Need to Make For-Loop Parallel
You have a for-loop and you want to execute each iteration in parallel using a separate CPU core.
This is a common situation.
The loop involves performing the same operation multiple times with different data.
For example:
...
# perform the same operation on each item
for item in items:
    # perform operation on item...
It most commonly involves calling the same function each iteration with different arguments.
For example:
...
# call the same function each iteration with different data
for item in items:
    # call function with one data item
    task(item)
Alternately, we might want the results from calling the same function multiple times with different data.
This is often achieved using a list comprehension.
For example:
...
# get results from applying the same function to different data
results = [task(item) for item in items]
Each iteration of the for-loop is performed sequentially, one after the other, on a single CPU core.
This is a problem because we may have many CPU cores, such as 2, 4, 8 or even 32.
How can we convert a for-loop to be parallel?
Prerequisite (prepare your code before we make it parallel)
Before we dive into discovering how to make for-loops parallel, there is a prerequisite.
You will need to refactor your code so that each iteration of your loop calls the same target function with different arguments.
This means you will need to create a new target function that takes one (or more) arguments containing the specific data to operate on in that iteration.
For example, your for-loop may look as follows:
...
# slow for loop executed sequentially
for item in items:
    # do one thing using item
    # do another thing using item
    ...
You can convert this into a loop that calls a target function each iteration, where the target function takes that iteration's unique data as an argument.
For example:
# task that operates on an item
def task(item):
    # do one thing using item
    # do another thing using item
    ...

# slow for loop executed sequentially
for item in items:
    task(item)
Generally, the function should be self-contained.
It should not use any shared resources to avoid possible concurrency failure modes like race conditions.
It should be a function with no side effects, taking data as arguments and returning any results via return values.
If this is not the case, you must refactor your code so that it is; otherwise, you may have to contend with mutex locks and similar synchronization primitives, which will make everything much harder.
Next, let’s look at how we can execute for loops in parallel using all CPU cores.
How To Make For-Loop Parallel With Pool
We can make a for-loop parallel using the multiprocessing pool.
A process pool is a programming pattern for automatically managing a pool of worker processes.
The multiprocessing.Pool class provides a process pool with helpful functions for executing for loops in parallel.
An instance of the Pool class can be created and by default it will use all available CPU cores. Once we are finished with it, we can close it to release the worker processes.
For example:
...
# create a process pool that uses all cpus
pool = multiprocessing.Pool()
# ...
# close the process pool
pool.close()
We can use the context manager interface if we only need to use the pool for one for-loop. This will ensure the pool is closed for us automatically once we are finished using it, even if a task raises an exception.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # ...
This is the recommended usage of the process pool.
You can learn more about how to use the multiprocessing pool context manager in the tutorial:
The multiprocessing pool provides a parallel version of the built-in map() function that will call the same function for each item in a provided iterable, e.g. same function with different data.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # call the function for each item in parallel
    pool.map(task, items)
This will traverse the “items” iterable and issue one call to the task() function for each item and return once all tasks have completed.
If our target function has a return value, the map() function returns an iterable of those return values, which can be stored or iterated directly.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # call the function for each item in parallel
    for result in pool.map(task, items):
        print(result)
You can learn more about how to use the map() function in the tutorial:
The map() function on the multiprocessing pool only supports target functions that take a single argument.
If our target function takes more than one argument, we can use the starmap() function instead.
It takes an iterable of iterables, where each nested iterable provides the arguments for one call to the target task function.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # prepare arguments for each call to the target function
    items = [(1,2), (3,4), (5,6)]
    # call the function for each item in parallel with multiple arguments
    for result in pool.starmap(task, items):
        print(result)
You can learn more about how to use the starmap() function in the tutorial:
A problem with both map() and starmap() is they do not return their iterable of return values until all tasks have completed.
Instead we can use the imap() function. This will only issue tasks to the multiprocessing pool once a worker becomes available, and will yield a return value as tasks are completed.
This is helpful for saving memory if we have thousands or millions of tasks to issue. It is also more responsive, as we can process or show results as tasks finish rather than after all tasks have completed.
For example:
...
# create a process pool that uses all cpus
with multiprocessing.Pool() as pool:
    # call the function for each item in parallel, get results as tasks complete
    for result in pool.imap(task, items):
        print(result)
You can learn more about how to use the imap() function in the tutorial:
This was a crash course in using the multiprocessing pool.
Now that we know how to convert our for-loops to be parallel, let’s look at some worked examples that you can use as templates for your own project.
Example of Parallel For-Loop with map()
We can explore how to use map() to execute for-loops in parallel.
In this example we will use a simple task function as a placeholder that you can replace with your own task function to be called each loop iteration.
In this case, our task function will take one integer argument. It will then generate a random number between 0 and 1 and sleep for a fraction of a second to simulate work. Finally, it will return the generated value as the result of the task.
The task() function below implements this.
# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value
Let’s dive-in.
Serial For-Loop with map()
Before we dive into parallel for-loops with map(), let’s start with a sequential for-loop.
We can use the built-in map() function to call our task function each iteration.
Each function call will be executed sequentially, one after the other on one CPU core.
The complete example is listed below.
# SuperFastPython.com
# example of a sequential for loop
from time import sleep
from random import random

# a task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # call the same function with different data sequentially
    for result in map(task, range(10)):
        # report the value to show progress
        print(result)
Running the example calls the same task() function for each value in the iterable from 0 to 9.
Different random numbers are generated each time the code is run.
Each call to task() happens sequentially, and the entire example takes about 4-5 seconds, depending on the specific numbers generated.
0.12199618835333303
0.5212357710994685
0.7417626978125494
0.0450874955439563
0.4708361362483452
0.46524023376998846
0.32858199011122213
0.468263433349433
0.3856435694592504
0.995716230169814
Next, let’s look at converting the example to execute in parallel using the multiprocessing pool.
Parallel For-Loop with map()
We can update the previous example so that each call to the task() function is executed in parallel using all available CPU cores.
For example, if we have 4 physical CPU cores, these might appear as 8 logical CPU cores with hyperthreading. This means our system could execute 8 calls to task() in parallel, and as those calls complete, the remaining calls to task() would be executed.
First, we can create the multiprocessing pool configured by default to use all available CPU cores.
...
# create the process pool
with Pool() as pool:
    # ...
Next, we can call the map() function as before and iterate the result values, although this time the map() function is a method on the multiprocessing pool.
...
# call the same function with different data in parallel
for result in pool.map(task, range(10)):
    # report the value to show progress
    print(result)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of a parallel for loop
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # call the same function with different data in parallel
        for result in pool.map(task, range(10)):
            # report the value to show progress
            print(result)
Running the example first creates the multiprocessing pool with one worker per logical CPU in the system.
All ten calls to the task() function are issued to the process pool and it completes them as fast as it is able.
Once all tasks are completed, an iterable of return values is returned from map() and traversed, reporting the values that were generated.
The loop completes in about one second, depending on the specific random numbers generated.
0.5884491047270914
0.6946893602558559
0.7169680024270103
0.9433207432796479
0.5374063478473085
0.49870576951245515
0.5229616564751852
0.250982204375086
0.670282828432953
0.44766416300786493
Next, let’s look at the same example if we did not have return values.
Parallel For-Loop with map() and No Return Values
We can update the task() function to no longer return a result.
In that case, we do not need to traverse the iterable of return values from the map() function in the caller.
First, we can update the task function to not return the value and instead to print the generated value directly.
# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # report the value to show progress
    print(f'{arg} got {value}', flush=True)
Note, print messages from child worker processes do not appear immediately by default, but are instead buffered until the child process is terminated.
We can force messages to show immediately by setting the “flush” argument to True.
You can learn more about printing messages from child processes in the tutorial:
We can then call map() with our task function and arguments and not iterate the results.
...
# create the process pool
with Pool() as pool:
    # call the same function with different data in parallel
    pool.map(task, range(10))
Tying this together, a complete example with this change is listed below.
# SuperFastPython.com
# example of a parallel for loop with no return values
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # report the value to show progress
    print(f'{arg} got {value}', flush=True)

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # call the same function with different data in parallel
        pool.map(task, range(10))
Running the example creates the multiprocessing pool and issues all ten tasks.
The tasks execute and report their messages.
Once all tasks have completed, the map() function returns and the caller continues on, with the process pool being closed automatically.
3 got 0.07495290707222657
1 got 0.3551532315681727
5 got 0.5775819151509843
0 got 0.6205673200232464
7 got 0.6050404433157823
8 got 0.5876738799350407
2 got 0.6776107991833116
4 got 0.7459149152853299
6 got 0.8483482745906772
9 got 0.7278120326654482
Next, let’s explore how to make a for-loop parallel where the target function has multiple arguments.
Example of Parallel For-Loop with starmap()
A limitation of the map() function is that it only supports target functions that take one argument.
If our target task function requires multiple arguments, we can use the starmap() function instead.
The iterable passed to the starmap() function must contain one nested iterable for each call to the task function, holding the arguments for that call.
That is, an iterable of iterables, such as a list of tuples, where each tuple is a set of arguments for one function call.
Let’s demonstrate this with a worked example.
We can update the task() function to take three arguments.
# task to execute in another process
def task(arg1, arg2, arg3):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value
We can then prepare an iterable of iterables as arguments for each function call.
In this case, we will create a list of tuples, where each tuple has 3 arguments for a single call to task().
This can be achieved using a list comprehension.
...
# prepare arguments
items = [(i, i*2, i*3) for i in range(10)]
We can then call the starmap() function to execute the for-loop in parallel and report results.
...
# call the same function with different data in parallel
for result in pool.starmap(task, items):
    # report the value to show progress
    print(result)
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of a parallel for loop with multiple arguments
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg1, arg2, arg3):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # prepare arguments
        items = [(i, i*2, i*3) for i in range(10)]
        # call the same function with different data in parallel
        for result in pool.starmap(task, items):
            # report the value to show progress
            print(result)
Running the example first creates the multiprocessing pool with one worker per logical CPU in the system.
All ten calls to the task() function are issued to the process pool, each with three arguments.
Once all tasks are completed, an iterable of return values is returned from starmap() and traversed, reporting the values that were generated.
The loop completes in about one second, depending on the specific random numbers generated.
0.3649786159984052
0.48469386498153644
0.3880300260728501
0.11865254960654414
0.42234122971676635
0.8348296520018135
0.1768696329327919
0.043473834132026656
0.4244687958601574
0.2971373822352351
Next, let’s look at how we can use imap() to make our programs more responsive.
Example of Parallel For-Loop with imap()
A limitation of map() and starmap() is that they traverse the provided iterable immediately and issue all function calls to the multiprocessing pool.
This may use a lot of memory if there are thousands or millions of tasks.
Additionally, the iterable of return values is not returned until all tasks have completed. This can be problematic if we would prefer to show progress as tasks are completed.
The imap() function overcomes these limitations by only issuing tasks one at a time as workers become available and yielding return values as tasks complete.
The example below demonstrates a parallel for-loop using the more responsive imap() function.
# SuperFastPython.com
# example of a parallel for loop that is more responsive
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # call the same function with different data in parallel
        for result in pool.imap(task, range(10)):
            # report the value to show progress
            print(result)
Running the example first creates the multiprocessing pool with one worker per logical CPU in the system.
Tasks are issued to the process pool until all workers are occupied. Remaining tasks are only issued once workers become available.
The imap() function returns an iterable immediately and begins yielding results as soon as tasks are completed. Return values are returned in the order that tasks were issued to the process pool.
0.835257932028165
0.23239313640799752
0.9100656574752977
0.541963403413606
0.771756987041196
0.26518666284928805
0.7566356594123003
0.5955443214316828
0.8208877725107132
0.018073556089019838
Note, if the order of return values does not matter, the imap_unordered() function can be used instead, which may be even more responsive.
You can learn more about how to use the imap_unordered() function in the tutorial:
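As a rough sketch, the imap() example above could be adapted to use imap_unordered() by changing only the method call, with the same placeholder task() function; results are then reported in the order tasks finish rather than the order they were issued:

# sketch of a parallel for loop that reports results in completion order
from time import sleep
from random import random
from multiprocessing import Pool

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # results are yielded in the order tasks finish, not the order they were issued
        for result in pool.imap_unordered(task, range(10)):
            # report the value to show progress
            print(result)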
Common Questions
This section lists common questions about making for-loops parallel using the multiprocessing pool.
Do you have any questions?
Share your question in the comments and I may add it to this section.
How To Set The Number of Workers
You can set the number of workers via the “processes” argument to the multiprocessing.Pool class.
For example:
...
# create a process pool with 4 workers
pool = multiprocessing.Pool(processes=4)
Because it is the first argument, you can omit the argument name.
For example:
...
# create a process pool with 4 workers
pool = multiprocessing.Pool(4)
You can learn more about how to set the number of workers in the tutorial:
How Many Workers Should I Use?
Use the default, which is one worker per logical CPU.
For example:
...
# create a process pool with one worker per logical CPU
pool = multiprocessing.Pool()
Shouldn’t I Set Number of Workers to Be 1 Less Than The Number of CPUs?
You can if you want, but you probably should not.
It only makes sense to do this if you issue tasks to the multiprocessing pool asynchronously (e.g. using map_async()) and then intend to perform computationally intensive work in the calling process.
This topic is covered in more detail in the tutorial:
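For example, here is a minimal sketch of that situation, assuming a task() function like the one used throughout this tutorial; the main process issues the tasks asynchronously, performs its own heavy work, then collects the results:

# sketch of issuing tasks asynchronously with one worker fewer than the CPU count
from time import sleep
from random import random
from multiprocessing import Pool, cpu_count

# task to execute in another process
def task(arg):
    # generate a value between 0 and 1
    value = random()
    # block for a fraction of a second to simulate work
    sleep(value)
    # return the generated value
    return value

# entry point for the program
if __name__ == '__main__':
    # create a process pool with one worker fewer than the number of logical CPUs
    with Pool(cpu_count() - 1) as pool:
        # issue all tasks asynchronously, this call does not block
        async_result = pool.map_async(task, range(10))
        # perform computationally intensive work in the calling process here
        # ...
        # block until the issued tasks are done and retrieve their return values
        results = async_result.get()
        print(results)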
What is a Logical vs Physical CPU?
Modern CPUs typically make use of a technology called hyperthreading.
Hyperthreading does not refer to a program using threads. Instead, it refers to a technology within the CPU cores themselves that allows each physical core or CPU to act as if it were two logical cores or two CPUs.
- Physical Cores: The number of CPU cores provided in the hardware, e.g. the chips.
- Logical Cores: The number of CPU cores after hyperthreading is taken into account.
It provides automatic in-core parallelism that can offer up to a 30% speed-up over CPU cores that do not offer the technology.
As such, when we count CPU cores in a system, we typically count the number of logical CPU cores, not the number of physical CPU cores.
If you know your system uses hyperthreading (it probably does), then you can get the number of physical CPUs in your system by dividing the number of logical CPUs by two.
- Count Physical Cores = Count Logical Cores / 2
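For example, a small sketch that estimates the physical core count from the logical core count, assuming hyperthreading doubles the number of logical cores:

# sketch of estimating physical cores from logical cores
import multiprocessing

# number of logical CPU cores reported by the system
n_logical = multiprocessing.cpu_count()
# estimated number of physical cores, assuming hyperthreading doubles the count
n_physical = n_logical // 2
print(f'logical cores: {n_logical}, estimated physical cores: {n_physical}')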
How Many CPUs Do I Have?
There are a number of ways to get the number of CPUs in Python.
The two pure Python approaches provided by the standard library are:
- multiprocessing.cpu_count() function
- os.cpu_count() function.
For example:
...
# get the number of logical cpu cores
n_cores = multiprocessing.cpu_count()
This will report the number of logical CPUs in your system.
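The os.cpu_count() function reports the same value. For example:

# report the number of logical cpu cores using the os module
import os
print(os.cpu_count())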
You can learn more about getting the number of CPUs in your system in the tutorial:
What If I Need to Access a Shared Resource?
You may need to access a shared resource.
If your task functions are just reading data, then it should be fine.
If your task functions are writing data, then you may need a mutex lock to ensure that only one process can change the shared resource at a time.
This is to avoid a race condition.
You can learn more about how to use a mutex lock with the multiprocessing pool in the tutorial:
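As a rough sketch, one way to share a lock with pool workers is to host the lock on a multiprocessing.Manager, since a manager's lock proxy can be pickled and passed safely as a task argument; the file name 'results.txt' here is purely illustrative:

# sketch of protecting a shared resource with a manager-hosted lock
from multiprocessing import Pool, Manager

# task that appends to a shared file, protected by a mutex lock
def task(item, lock):
    # acquire the lock so only one process writes at a time
    with lock:
        # 'results.txt' is an illustrative shared resource
        with open('results.txt', 'a') as handle:
            handle.write(f'{item}\n')

# entry point for the program
if __name__ == '__main__':
    # create a manager to host the shared lock
    with Manager() as manager:
        # a manager lock proxy can be pickled and passed to pool workers
        lock = manager.Lock()
        # create the process pool
        with Pool() as pool:
            # pair each item with the shared lock for starmap()
            pool.starmap(task, [(i, lock) for i in range(10)])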
What Types of Tasks Can Be Made Parallel?
All types.
The multiprocessing pool is best suited to computationally intensive tasks that receive or return only small amounts of data.
For IO-bound tasks, use thread-based concurrency instead, such as the ThreadPoolExecutor or the ThreadPool.
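For example, a minimal sketch of an IO-bound loop using the multiprocessing.pool.ThreadPool class, which offers the same map() API backed by threads; the URL list here is purely illustrative:

# sketch of an io-bound parallel for loop using a pool of threads
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen

# io-bound task: download the contents of a url
def download(url):
    # open the connection and read all data
    with urlopen(url) as connection:
        return connection.read()

# entry point for the program
if __name__ == '__main__':
    # illustrative list of urls to download
    urls = ['https://example.com/'] * 10
    # the same map() pattern works with a pool of threads
    with ThreadPool() as pool:
        # download all urls concurrently and report the size of each
        for data in pool.map(download, urls):
            print(len(data))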
For tasks that receive or return a lot of data, consider having the processes store the data somewhere for the caller to retrieve later, such as in a file, database, or similar. This is to avoid the computational cost of serializing (pickling) the data for transmission between Python processes.
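For example, a rough sketch of this approach, where each task writes its (illustrative) large result to a temporary file and returns only the small file path to the caller:

# sketch of returning file paths instead of large results
import tempfile
from multiprocessing import Pool

# task that produces a large result and saves it to a file instead of returning it
def task(arg):
    # illustrative large result
    lines = [f'{arg},{i}\n' for i in range(100000)]
    # write the result to a temporary file that outlives the worker
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as handle:
        handle.writelines(lines)
        path = handle.name
    # return only the small file path rather than the data itself
    return path

# entry point for the program
if __name__ == '__main__':
    # create the process pool
    with Pool() as pool:
        # the caller receives file paths and can load the data later as needed
        for path in pool.map(task, range(4)):
            print(path)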
Further Reading
This section provides additional resources that you may find helpful.
Books
- Multiprocessing Pool Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Pool Class API Cheat Sheet
I would also recommend specific chapters from these books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPool: The Complete Guide
- Python Multiprocessing: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know how to convert a for-loop to be parallel using the multiprocessing pool.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
lukas kotatko says
Hi there, you have a nice website and seem to be pretty knowledgeable in the field of Python concurrency. Currently, I am trying to solve an issue with speeding up a consumer that consumes data from RabbitMQ.
It is written in FastAPI with asyncio, so with concurrency in mind. The consumer can currently handle about 500-1000 messages per second (the message is a simple JSON object). The goal is to improve the performance to be able to deal with 10 times more.
I don't really know how to do that, probably with some parallelisation (processes), but I don't know how to glue it all together. Could you give a little hint?
thank you
Jason Brownlee says
Great question.
Perhaps you can run multiple threads, with a separate asyncio event loop in each?
chatbot says
Hi, I gained good knowledge from this blog.
My question is, I am currently trying to parallelize text extraction from images using my own model.
Say I have 14 images in a list, what is the way to process them in parallel? A Process or a Pool?
The time taken for each page varies depending on the page content.
Jason Brownlee says
Good question, perhaps you can issue each task to the pool using map() or apply_async().
Tugiyo says
Hi Jason, I have simplified my Python code as follows:
How can I parallelize the code so it can be run by, for example, 4 processors simultaneously while automatically giving the correct output?
Any suggestions?
Jason Brownlee says
NumPy is already parallel; I don't recommend mixing stdlib concurrency with NumPy concurrency.
See this:
https://scipy.github.io/old-wiki/pages/ParallelProgramming