Forking Processes is 20x Faster Than Spawning in Python

Forking a process is faster than spawning a process in Python.

This is generally known, but how much faster is forking and when should we consider adopting the fork start method for child processes over spawning?

In this tutorial, you will discover the speed differences between fork and spawn start methods.

Let’s get started.

Fork vs Spawn Start Methods

Python offers two main methods for starting a new child process.

They are:

Fork start method
Spawn start method.

The fork start method uses a system function call to copy an existing process to create a new process.

This means that the child process has a copy of all memory used by the original process, including all global variables.

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. Available on Unix only. The default on Unix.
— multiprocessing — Process-based parallelism

The spawn start method starts an entirely new instance of the Python interpreter from scratch. The child is not a copy of another process.

The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver. Available on Unix and Windows. The default on Windows and macOS.
— multiprocessing — Process-based parallelism
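The difference in what the child inherits can be sketched with a small experiment. The names SETTING, check_setting() and child_sees_runtime_change() are hypothetical and used only for illustration. The parent changes a global variable at runtime, then the child reports through its exit code whether it sees the change. This is a minimal sketch, assuming the script is run as a file.

```python
# sketch: does a child process see a global changed by the parent at runtime?
import sys
import multiprocessing as mp

# value assigned when the module is first imported
SETTING = 'set at import'

# runs in the child: exit code 0 if the runtime change is visible, 1 otherwise
def check_setting():
    sys.exit(0 if SETTING == 'changed at runtime' else 1)

# change the global in the parent, then start a child with the given start method
def child_sees_runtime_change(method):
    global SETTING
    SETTING = 'changed at runtime'
    process = mp.get_context(method).Process(target=check_setting)
    process.start()
    process.join()
    return process.exitcode == 0

# entry point
if __name__ == '__main__':
    for method in ('fork', 'spawn'):
        # skip start methods that this platform does not support
        if method in mp.get_all_start_methods():
            print(method, child_sees_runtime_change(method))
```

With fork, the child is a copy of the parent's memory, so it sees the changed value. With spawn, the child imports the module afresh, so it should see only the value assigned at import time.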

A Python process can configure the start method used to create new child processes via the multiprocessing.set_start_method() function.
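As a minimal sketch, the start method can be set and then read back with multiprocessing.get_start_method(). The helper name use_start_method() is hypothetical. The force=True argument is used here only so the helper can be called more than once; without it, a second call to set_start_method() raises a RuntimeError.

```python
# sketch: set the default start method and confirm which one is in effect
import multiprocessing as mp

# make the given method the default and return the method now in effect
def use_start_method(method):
    # force=True replaces a start method that was already set
    mp.set_start_method(method, force=True)
    return mp.get_start_method()

# entry point
if __name__ == '__main__':
    print(use_start_method('spawn'))
```

In most programs the start method is set once, as the first step inside the main guard.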

For more on setting the start method, see the tutorial:

Multiprocessing Start Methods

The fork method is only supported on POSIX-based systems like Linux and macOS (not Windows), whereas the spawn start method is supported on all platforms.
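Platform support can be checked at runtime with multiprocessing.get_all_start_methods(). The sketch below, with a hypothetical helper named choose_start_method(), prefers fork and falls back to spawn when fork is not available. The preference order is an assumption made for illustration.

```python
# sketch: pick the first preferred start method that this platform supports
import multiprocessing as mp

# return the first supported method from a tuple of preferences
def choose_start_method(preferred=('fork', 'spawn')):
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError(f'None of {preferred} are supported, have {available}')

# entry point
if __name__ == '__main__':
    print('Supported:', mp.get_all_start_methods())
    print('Chosen:', choose_start_method())
```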

A major difference between the two start methods is speed.

It is generally considered that forking a process is faster than spawning a process.

But how much faster is it?

We can explore the speed differences between the two start methods with controlled experiments.

Benchmark Spawn Start Method

We can explore the benchmark speed of starting processes using the spawn start method.

In this example, we will create a new process many times and time how long it takes.

Each process started will execute a target function that does nothing other than return immediately.

For example:

# task to run in a new process
def task():
    # do nothing interesting
    pass

We can define a function that creates a process to execute our target function, starts the process, then waits for it to complete, in a loop that repeats a specified number of times.

# run a test and time how long it takes
def test(n_repeats):
    # repeat many times
    for i in range(n_repeats):
        # create the process
        process = Process(target=task)
        # start the process
        process.start()
        # wait for the process to complete
        process.join()

We can then call our test() function with a given number of loop iterations, in this case 1,000, and time how long it takes to complete in seconds.

# entry point
if __name__ == '__main__':
    # set the start method
    set_start_method('spawn')
    # record the start time
    time_start = time()
    # perform the test
    n_repeats = 1000
    test(n_repeats)
    # record the end time
    time_end = time()
    # report the total time
    duration = time_end - time_start
    print(f'Total Time {duration:.3} seconds')
    # report estimated time per process
    per_process = duration / n_repeats
    print(f'About {per_process:.3} seconds per process')

Tying this together, the complete example is listed below.

# SuperFastPython.com
# create many processes using the spawn start method
from time import time
from multiprocessing import Process
from multiprocessing import set_start_method

# task to run in a new process
def task():
    # do nothing interesting
    pass

# run a test and time how long it takes
def test(n_repeats):
    # repeat many times
    for i in range(n_repeats):
        # create the process
        process = Process(target=task)
        # start the process
        process.start()
        # wait for the process to complete
        process.join()

# entry point
if __name__ == '__main__':
    # set the start method
    set_start_method('spawn')
    # record the start time
    time_start = time()
    # perform the test
    n_repeats = 1000
    test(n_repeats)
    # record the end time
    time_end = time()
    # report the total time
    duration = time_end - time_start
    print(f'Total Time {duration:.3} seconds')
    # report estimated time per process
    per_process = duration / n_repeats
    print(f'About {per_process:.3} seconds per process')

Running the example first sets the start method to ‘spawn’.

It then records the start time and executes the test() function that spawns a new process 1,000 times.

The end time is recorded and the total time in seconds is reported. Because we know the number of processes created, we can also estimate how long each process takes to create.

In this case, the experiment took about 42.3 seconds to spawn 1,000 processes, which was about 42.3 milliseconds per process.

Total Time 42.3 seconds
About 0.0423 seconds per process

Next, let’s perform the same experiment by forking child processes from the main process.

Benchmark Fork Start Method

We can explore the benchmark speed of starting processes using the fork start method.

In this example, we will update the previous example in one small way. We will change the start method from “spawn” to “fork”.

...
# set the start method
set_start_method('fork')

Note that the fork start method is not supported on Windows.

The complete example is listed below.

# SuperFastPython.com
# create many processes using the fork start method
from time import time
from multiprocessing import Process
from multiprocessing import set_start_method

# task to run in a new process
def task():
    # do nothing interesting
    pass

# run a test and time how long it takes
def test(n_repeats):
    # repeat many times
    for i in range(n_repeats):
        # create the process
        process = Process(target=task)
        # start the process
        process.start()
        # wait for the process to complete
        process.join()

# entry point
if __name__ == '__main__':
    # set the start method
    set_start_method('fork')
    # record the start time
    time_start = time()
    # perform the test
    n_repeats = 1000
    test(n_repeats)
    # record the end time
    time_end = time()
    # report the total time
    duration = time_end - time_start
    print(f'Total Time {duration:.3} seconds')
    # report estimated time per process
    per_process = duration / n_repeats
    print(f'About {per_process:.3} seconds per process')

Running the example first sets the start method to ‘fork’.

It then records the start time and executes the test() function that forks a new process 1,000 times.

The end time is recorded and the total time in seconds is reported. Because we know the number of processes created, we can also estimate how long each process takes to create.

In this case, the experiment took about 2.07 seconds to fork 1,000 processes from the main process, which was about 2.07 milliseconds per process.

Total Time 2.07 seconds
About 0.00207 seconds per process
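The two benchmarks can also be run from a single script. The sketch below is one possible way to do this, not the listing used to produce the results above. It swaps set_start_method() for multiprocessing.get_context(), so each run gets its own start method, and it swaps time() for time.perf_counter(). The benchmark() helper and the reduced repeat count of 100 are assumptions made to keep the sketch quick.

```python
# sketch: benchmark each supported start method from one script
from time import perf_counter
import multiprocessing as mp

# task to run in a new process
def task():
    # do nothing interesting
    pass

# start and join n_repeats processes with the given start method, return seconds taken
def benchmark(method, n_repeats):
    # a context exposes the same API as the module, bound to one start method
    ctx = mp.get_context(method)
    time_start = perf_counter()
    for _ in range(n_repeats):
        process = ctx.Process(target=task)
        process.start()
        process.join()
    return perf_counter() - time_start

# entry point
if __name__ == '__main__':
    # the experiments above use 1,000 repeats, lowered here as an assumption
    n_repeats = 100
    for method in ('spawn', 'fork'):
        # skip start methods that this platform does not support
        if method not in mp.get_all_start_methods():
            continue
        duration = benchmark(method, n_repeats)
        per_process_ms = duration / n_repeats * 1000
        print(f'{method}: {duration:.3f} seconds total, {per_process_ms:.2f} ms per process')
```

Using a context rather than set_start_method() avoids changing the global default, which matters when both methods are tested in the same program.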

Next, let’s compare the speed of the two start methods.

Comparison of Fork vs Spawn Start Methods

Forking a process is faster than spawning a process.

It is not just a little bit faster, but about an order of magnitude faster.

Specifically, spawning 1,000 processes took about 42.3 seconds, whereas forking the same number of processes took about 2.07 seconds.

That is a difference of about 40.23 seconds, which makes forking about 20.43x faster.

On the individual process level (on my system), spawning took about 0.0423 seconds per process whereas forking took about 0.00207 seconds per process.

That is a difference of about 0.040 seconds or 40 milliseconds per process started, again a difference of about 20.43x.
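The arithmetic behind these figures can be checked with a few lines of Python. The timings are the ones reported above. The compare() helper is a hypothetical name used only for this check.

```python
# sketch: reproduce the comparison figures from the two reported timings

# return the difference in seconds, the speedup factor and the saving per process in ms
def compare(spawn_total, fork_total, n_processes):
    difference = spawn_total - fork_total
    speedup = spawn_total / fork_total
    saving_ms = difference / n_processes * 1000
    return difference, speedup, saving_ms

# entry point
if __name__ == '__main__':
    # total seconds reported for 1,000 processes with each start method
    difference, speedup, saving_ms = compare(42.3, 2.07, 1000)
    print(f'Difference: {difference:.2f} seconds')
    print(f'Speedup: {speedup:.2f}x')
    print(f'Saving: {saving_ms:.1f} ms per process')
```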

This highlights that if you are starting many processes in your application and you are on a POSIX-based operating system, you should strongly consider configuring your program to use the ‘fork’ start method.

Fork is no longer the default start method on macOS because it can crash child processes in some edge cases. If you do not expect those problems in your application, which is the case for most applications, consider using the fork start method, or at least run your unit tests with it.

Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess.
— multiprocessing — Process-based parallelism

For more on these edge cases, see:

Python crashes on macOS after fork with no exec

In practice, if many processes are required in your application, you should probably consider re-using processes, such as via a process pool.

Two process pools built-in to the Python standard library include:

multiprocessing.Pool.
concurrent.futures.ProcessPoolExecutor.
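As a minimal sketch of process re-use, a ProcessPoolExecutor can be given a start method through its mp_context argument. The helper name map_in_pool() and the choice of two workers are assumptions for illustration. The worker processes are started once and then re-used for every task.

```python
# sketch: re-use worker processes via a pool created with a chosen start method
import math
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# apply func to each item using a pool whose workers use the given start method
def map_in_pool(func, items, method, workers=2):
    ctx = mp.get_context(method)
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as executor:
        # workers are started as needed and shared by all tasks
        return list(executor.map(func, items))

# entry point
if __name__ == '__main__':
    # prefer fork where the platform supports it, otherwise fall back to spawn
    method = 'fork' if 'fork' in mp.get_all_start_methods() else 'spawn'
    print(map_in_pool(math.sqrt, [1, 4, 9, 16], method))
```

With a pool, the cost of starting a process is paid once per worker rather than once per task, so the choice of start method matters less for long-lived pools.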

To learn more about the differences between these two process pools, see the tutorial:

Multiprocessing Pool vs ProcessPoolExecutor in Python

Takeaways

You now know the speed differences between fork and spawn start methods.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Photo by Fabio Henning on Unsplash

Comments

Dhananjay Joshi says

June 2, 2023 at 9:09 am

Hi,
As usual great article. However biggest issue in long running multiprocessing calls is fork method normally hangs randomly and indefinitely due to shared variables and lock waits whereas spawn appears clean, compact method, albeit bit slower as compared to fork.

- Jason Brownlee says
  
  June 3, 2023 at 5:17 am
  
  Thanks for sharing.
