Forking a process is faster than spawning a process in Python.
This is generally known, but how much faster is forking and when should we consider adopting the fork start method for child processes over spawning?
In this tutorial, you will discover the speed differences between fork and spawn start methods.
Let’s get started.
Fork vs Spawn Start Methods
Python offers two main methods for starting a new child process.
They are:
- Fork start method
- Spawn start method
The fork start method uses a system function call to copy an existing process to create a new process.
This means that the child process has a copy of all memory used by the original process, including all global variables.
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. Available on Unix only. The default on Unix.
— multiprocessing — Process-based parallelism
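To make the warning about multithreaded processes concrete, here is a minimal sketch (not taken from the documentation) of one way forking can go wrong. It assumes a POSIX system where the fork start method is available, and the names hold_lock, try_lock and child_can_acquire_lock are made up for illustration. A background thread holds a threading.Lock at the moment of the fork, so the child inherits a locked copy of the lock, but not the thread that would release it.

```python
# sketch: a lock held by another thread at fork time stays locked in the child
import multiprocessing as mp
import threading
import time

# an ordinary thread lock created in the parent
lock = threading.Lock()
# signals that the background thread has taken the lock
lock_taken = threading.Event()

# background thread: take the lock and hold it for a while
def hold_lock(seconds):
    with lock:
        lock_taken.set()
        time.sleep(seconds)

# runs in the child: try to take the inherited copy of the lock
def try_lock(conn):
    acquired = lock.acquire(timeout=0.5)
    conn.send(acquired)
    conn.close()

# fork a child while another thread holds the lock
def child_can_acquire_lock():
    ctx = mp.get_context('fork')
    lock_taken.clear()
    holder = threading.Thread(target=hold_lock, args=(1.5,))
    holder.start()
    # wait until the background thread really holds the lock
    lock_taken.wait()
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=try_lock, args=(child_conn,))
    child.start()
    # wait up to 10 seconds for the child's answer
    result = parent_conn.recv() if parent_conn.poll(10) else None
    child.join()
    holder.join()
    return result

# entry point
if __name__ == '__main__':
    if 'fork' in mp.get_all_start_methods():
        print(f'Child acquired the inherited lock: {child_can_acquire_lock()}')
    else:
        print('The fork start method is not available on this platform')
```

On a POSIX system I would expect the child to report False: its copy of the lock is never released because the thread holding it exists only in the parent. Without the timeout, the child would simply hang. Recent Python versions may also print a DeprecationWarning when a multithreaded process is forked.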
The spawn start method is an entirely new instance of the Python interpreter started from scratch. It is not a copy of another process.
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver. Available on Unix and Windows. The default on Windows and macOS.
— multiprocessing — Process-based parallelism
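The difference in what the child inherits can be seen with a small experiment. The following is a minimal sketch, assuming it is saved and run as a script; the names STATE, report and state_seen_by_child are made up for illustration. The parent changes a global variable at runtime and then asks a child, started with each method, which value it sees.

```python
# sketch: what a child sees of a global that the parent changed at runtime
import multiprocessing as mp

# module-level global, set when the module is first loaded
STATE = 'initial'

# runs in the child: send back the value of the global as the child sees it
def report(conn):
    conn.send(STATE)
    conn.close()

# change the global in the parent, then start a child with the given method
def state_seen_by_child(method):
    global STATE
    STATE = 'changed at runtime'
    ctx = mp.get_context(method)
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=report, args=(child_conn,))
    child.start()
    # wait up to 10 seconds for the child's answer
    seen = parent_conn.recv() if parent_conn.poll(10) else None
    child.join()
    return seen

# entry point
if __name__ == '__main__':
    for method in mp.get_all_start_methods():
        if method in ('fork', 'spawn'):
            seen = state_seen_by_child(method)
            print(f'{method}: child sees STATE = {seen!r}')
```

With fork, the child is a copy of the parent's memory, so I would expect it to report the changed value. With spawn, the child loads the script afresh and skips the entry-point block, so it should report the value assigned at module level.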
A Python process can configure the start method used to create new child processes via the multiprocessing.set_start_method() function.
The fork method is only supported on POSIX-based systems like Linux and macOS (not Windows), whereas the spawn start method is supported on all platforms.
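Because support differs by platform, a program can check which start methods are available before choosing one. This is a minimal sketch, not the only way to do it: pick_start_method is a hypothetical helper, and multiprocessing.get_context() is used instead of set_start_method() so that the choice applies only to the processes created from that context.

```python
# sketch: choose a start method that the current platform supports
import multiprocessing as mp

# return the preferred start method if available, otherwise fall back to spawn
def pick_start_method(preferred='fork'):
    available = mp.get_all_start_methods()
    return preferred if preferred in available else 'spawn'

# task to run in a new process
def task():
    # do nothing interesting
    pass

# entry point
if __name__ == '__main__':
    print(f'Supported start methods: {mp.get_all_start_methods()}')
    method = pick_start_method('fork')
    # a context does not change the global default start method
    ctx = mp.get_context(method)
    process = ctx.Process(target=task)
    process.start()
    process.join()
    print(f'Started a child process using {method}')
```

Falling back to spawn is a reasonable default here because it is the one start method available on every platform.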
A major difference between the two start methods is speed.
It is generally considered that forking a process is faster than spawning a process.
But how much faster is it?
We can explore the speed differences between the two start methods with controlled experiments.
Benchmark Spawn Start Method
We can explore the benchmark speed of starting processes using the spawn start method.
In this example, we will create a new process many times and time how long it takes.
Each process started will execute a target function that does nothing other than return immediately.
For example:
    # task to run in a new process
    def task():
        # do nothing interesting
        pass
We can define a function that creates a process to execute our target function, starts the process, then waits for it to complete, all in a loop that repeats a specified number of times.
    # run a test and time how long it takes
    def test(n_repeats):
        # repeat many times
        for i in range(n_repeats):
            # create the process
            process = Process(target=task)
            # start the process
            process.start()
            # wait for the process to complete
            process.join()
We can then call our test() function with a given number of loop iterations, in this case 1,000, and time how long it takes to complete in seconds.
    # entry point
    if __name__ == '__main__':
        # set the start method
        set_start_method('spawn')
        # record the start time
        time_start = time()
        # perform the test
        n_repeats = 1000
        test(n_repeats)
        # record the end time
        time_end = time()
        # report the total time
        duration = time_end - time_start
        print(f'Total Time {duration:.3} seconds')
        # report estimated time per process
        per_process = duration / n_repeats
        print(f'About {per_process:.3} seconds per process')
Tying this together, the complete example is listed below.
    # SuperFastPython.com
    # create many processes using the spawn start method
    from time import time
    from multiprocessing import Process
    from multiprocessing import set_start_method

    # task to run in a new process
    def task():
        # do nothing interesting
        pass

    # run a test and time how long it takes
    def test(n_repeats):
        # repeat many times
        for i in range(n_repeats):
            # create the process
            process = Process(target=task)
            # start the process
            process.start()
            # wait for the process to complete
            process.join()

    # entry point
    if __name__ == '__main__':
        # set the start method
        set_start_method('spawn')
        # record the start time
        time_start = time()
        # perform the test
        n_repeats = 1000
        test(n_repeats)
        # record the end time
        time_end = time()
        # report the total time
        duration = time_end - time_start
        print(f'Total Time {duration:.3} seconds')
        # report estimated time per process
        per_process = duration / n_repeats
        print(f'About {per_process:.3} seconds per process')
Running the example first sets the start method to ‘spawn’.
It then records the start time and executes the test() function that spawns a new process 1,000 times.
The end time is recorded and the total time in seconds is reported. Because we know the number of processes created, we can also estimate how long each process takes to create.
In this case, the experiment took about 42.3 seconds to spawn 1,000 processes, which was about 42.3 milliseconds per process.
    Total Time 42.3 seconds
    About 0.0423 seconds per process
Next, let’s perform the same experiment by forking child processes from the main process.
Benchmark Fork Start Method
We can explore the benchmark speed of starting processes using the fork start method.
In this example, we will update the previous example in one small way. We will change the start method from “spawn” to “fork”.
    ...
    # set the start method
    set_start_method('fork')
Note, the fork start method is not supported on Windows.
The complete example is listed below.
    # SuperFastPython.com
    # create many processes using the fork start method
    from time import time
    from multiprocessing import Process
    from multiprocessing import set_start_method

    # task to run in a new process
    def task():
        # do nothing interesting
        pass

    # run a test and time how long it takes
    def test(n_repeats):
        # repeat many times
        for i in range(n_repeats):
            # create the process
            process = Process(target=task)
            # start the process
            process.start()
            # wait for the process to complete
            process.join()

    # entry point
    if __name__ == '__main__':
        # set the start method
        set_start_method('fork')
        # record the start time
        time_start = time()
        # perform the test
        n_repeats = 1000
        test(n_repeats)
        # record the end time
        time_end = time()
        # report the total time
        duration = time_end - time_start
        print(f'Total Time {duration:.3} seconds')
        # report estimated time per process
        per_process = duration / n_repeats
        print(f'About {per_process:.3} seconds per process')
Running the example first sets the start method to ‘fork’.
It then records the start time and executes the test() function that forks a new process 1,000 times.
The end time is recorded and the total time in seconds is reported. Because we know the number of processes created, we can also estimate how long each process takes to create.
In this case, the experiment took about 2.07 seconds to fork 1,000 processes from the main process, which was about 2.07 milliseconds per process.
    Total Time 2.07 seconds
    About 0.00207 seconds per process
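The two scripts differ only in the start method, so both measurements could also be taken in a single run. The following is a hedged sketch of that idea rather than the benchmark used for the results in this tutorial: time_process_starts is a hypothetical helper, multiprocessing.get_context() is used so each method gets its own context, and perf_counter() is swapped in for time() because it is intended for measuring short durations.

```python
# sketch: time process creation for each start method in a single script
from time import perf_counter
import multiprocessing as mp

# task to run in a new process
def task():
    # do nothing interesting
    pass

# start and join n_repeats processes, return the elapsed time in seconds
def time_process_starts(method, n_repeats):
    ctx = mp.get_context(method)
    time_start = perf_counter()
    for _ in range(n_repeats):
        process = ctx.Process(target=task)
        process.start()
        process.join()
    return perf_counter() - time_start

# entry point
if __name__ == '__main__':
    # kept small so the sketch runs quickly, increase for more stable numbers
    n_repeats = 20
    for method in ('spawn', 'fork'):
        if method not in mp.get_all_start_methods():
            print(f'{method}: not available on this platform')
            continue
        duration = time_process_starts(method, n_repeats)
        per_process = duration / n_repeats
        print(f'{method}: {duration:.3f} seconds total, '
              f'about {per_process:.5f} seconds per process')
```

The absolute numbers will vary with the machine, the operating system, and the Python version, so treat any single run as a rough indication only.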
Next, let’s compare the speed of the two start methods.
Comparison of Fork vs Spawn Start Methods
Forking a process is faster than spawning a process.
It is not just a little bit faster, but about 20 times faster in this benchmark, which is more than an order of magnitude.
Specifically, spawning 1,000 processes took about 42.3 seconds, whereas forking the same number of processes took about 2.07 seconds.
That is a difference of about 40.23 seconds, which makes forking about 20.43x faster.
On the individual process level (on my system), spawning took about 0.0423 seconds per process whereas forking took about 0.00207 seconds per process.
That is a difference of about 0.040 seconds or 40 milliseconds per process started, again a difference of about 20.43x.
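The arithmetic behind these figures can be checked in a few lines. This sketch simply plugs in the two totals reported above; the variable names are made up for illustration.

```python
# sketch: reproduce the comparison from the two reported timings
spawn_total = 42.3  # seconds to spawn 1,000 processes (reported above)
fork_total = 2.07   # seconds to fork 1,000 processes (reported above)
n_repeats = 1000

# absolute difference in total time
difference = spawn_total - fork_total
# how many times faster forking was
speedup = spawn_total / fork_total
# time saved for each process started
per_process_saving = difference / n_repeats

print(f'Difference: {difference:.2f} seconds')
print(f'Speedup: {speedup:.2f}x')
print(f'Saving per process: {per_process_saving * 1000:.1f} milliseconds')
```

This prints a difference of 40.23 seconds, a speedup of 20.43x, and a saving of 40.2 milliseconds per process.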
This highlights that if you are starting many processes in your application and you are on a POSIX-based operating system, you should strongly consider configuring your program to use the ‘fork’ start method.
Fork is no longer the default start method on macOS because of some edge-case problems. If you do not expect to hit those problems, which is the case for most applications, consider using the fork start method, or at least run your unit tests with it.
Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess.
— multiprocessing — Process-based parallelism
In practice, if many processes are required in your application, you should probably consider re-using processes, such as via a process pool.
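As a rough sketch of what re-using processes can look like, the example below runs several tasks on a small pool of workers, so the start-up cost is paid once per worker rather than once per task. It uses concurrent.futures.ProcessPoolExecutor with its mp_context argument to select the start method; square and squares_with_pool are hypothetical names chosen for illustration.

```python
# sketch: reuse a fixed set of worker processes instead of starting one per task
from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp

# task executed in a worker process
def square(value):
    return value * value

# run all tasks on a small pool of reusable workers
def squares_with_pool(values, method, n_workers=2):
    ctx = mp.get_context(method)
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as executor:
        return list(executor.map(square, values))

# entry point
if __name__ == '__main__':
    # prefer fork where it is available, otherwise use spawn
    method = 'fork' if 'fork' in mp.get_all_start_methods() else 'spawn'
    print(squares_with_pool(range(10), method))
```

With a pool, the choice of start method matters less for long-running programs, because each worker is started only once and then handles many tasks.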
Two process pools built into the Python standard library are:
- multiprocessing.Pool
- concurrent.futures.ProcessPoolExecutor
Further Reading
This section provides additional resources that you may find helpful.
Python Multiprocessing Books
- Python Multiprocessing Jump-Start, Jason Brownlee (my book!)
- Multiprocessing API Interview Questions
- Multiprocessing API Cheat Sheet
I would also recommend specific chapters in the books:
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- High Performance Python, Ian Ozsvald and Micha Gorelick, 2020.
- See: Chapter 9: The multiprocessing Module
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python Multiprocessing: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ProcessPoolExecutor: The Complete Guide
Takeaways
You now know the speed differences between fork and spawn start methods.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Fabio Henning on Unsplash
Dhananjay Joshi says
Hi,
As usual great article. However biggest issue in long running multiprocessing calls is fork method normally hangs randomly and indefinitely due to shared variables and lock waits whereas spawn appears clean, compact method, albeit bit slower as compared to fork.
Jason Brownlee says
Thanks for sharing.