Get Multiprocessing Pool Worker PID in Python
You can get the PID of a worker process by calling the os.getpid() function when initializing the worker process or from within the target task function executed by a worker process.
In this tutorial you will discover how to get the PID of worker processes in the Python process pool.
Let's get started.
Need PID of Worker Processes
The multiprocessing.pool.Pool in Python provides a pool of reusable processes for executing ad hoc tasks.
A process pool can be configured when it is created, at which point it prepares the child worker processes.
A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.
-- multiprocessing — Process-based parallelism
We can issue one-off tasks to the process pool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map().
Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the functions such as apply_async() and map_async().
When using the process pool, we may need the process identifiers (PIDs) of the worker processes.
This may be for many reasons, such as:
- To uniquely identify the worker in the application.
- To include the PID in logging.
- To debug which worker is completing which task.
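For the logging case, one option is to let the standard logging module add the PID. This is a suggestion and not part of this tutorial's approach. The %(process)d format attribute is filled in with the PID of the process that creates each log record. The sketch below uses a hypothetical helper, make_pid_logger(), and an arbitrary logger name. A logger configured this way inside a worker process, for example in a pool initializer, would report that worker's PID.

```python
import io
import logging
import os
import sys


def make_pid_logger(stream):
    # hypothetical helper; the logger name 'pid_demo' is arbitrary
    logger = logging.getLogger('pid_demo')
    logger.setLevel(logging.INFO)
    # drop handlers from earlier calls so each record is written once
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    # %(process)d is replaced with the PID of the process that created the record
    handler.setFormatter(logging.Formatter('[PID %(process)d] %(message)s'))
    logger.addHandler(handler)
    # avoid duplicate output through the root logger
    logger.propagate = False
    return logger


if __name__ == '__main__':
    logger = make_pid_logger(sys.stdout)
    logger.info('task started')
```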
How can we get the child worker process PIDs in Python?
What is a PID
PID is an acronym for Process ID or Process identifier.
A process identifier is a unique number assigned to a process by the underlying operating system.
Each time a process is started, it is assigned a unique positive integer identifier and the identifier may be different each time the process is started.
The pid uniquely identifies one process among all active processes running on the system, managed by the operating system.
As such, the pid can be used to interact with a process, e.g. to send a signal to the process to interrupt or kill a process.
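The sketch below illustrates using a PID to signal a process. It starts a plain multiprocessing.Process rather than a pool worker, then sends it SIGTERM with os.kill(). The helper names sleeper() and start_and_terminate() are hypothetical. On POSIX systems, a process ended by a signal reports a negative exit code (-15 for SIGTERM). On Windows, os.kill() instead terminates the process with an exit code equal to the signal value.

```python
import os
import signal
import time
from multiprocessing import Process


def sleeper():
    # child process that would run for a long time if left alone
    time.sleep(60)


def start_and_terminate():
    # start a child process; its PID is available once start() returns
    child = Process(target=sleeper)
    child.start()
    # use the PID to ask the operating system to deliver SIGTERM to the child
    os.kill(child.pid, signal.SIGTERM)
    # wait for the child to exit and return its exit code
    child.join(timeout=10)
    return child.exitcode


if __name__ == '__main__':
    print(start_and_terminate())
```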
How to Get Worker PID
When working with a process pool, there are two situations where we may want to get the PID:
- Get the PID for each worker process in the process pool.
- Get the PID for the worker completing a given task.
Let's take a closer look at each approach in turn.
Before we get the PID of worker processes, let's take a brief look at how to get a process PID.
How to Get Process PID
There are two general ways to get the PID for a process, they are:
- multiprocessing.Process.pid attribute.
- os.getpid() function.
We may also get the process instance for the current process via the multiprocessing.current_process() function.
For example:
...
# get the process instance
process = multiprocessing.current_process()
Once we have the process instance, we can get the pid via the multiprocessing.Process.pid attribute.
For example:
...
# get the pid
pid = process.pid
Alternatively, we can get the pid for the current process via the os.getpid() function.
For example:
...
# get the pid for the current process
pid = os.getpid()
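Both approaches return the same value for the process they are called from. The following small sketch, with a hypothetical helper report_pids(), checks this in the main process and in a single pool worker.

```python
import multiprocessing
import os
from multiprocessing.pool import Pool


def report_pids():
    # PID via the process instance for the current process
    pid_from_process = multiprocessing.current_process().pid
    # PID via the os module
    pid_from_os = os.getpid()
    return pid_from_process, pid_from_os


if __name__ == '__main__':
    # in the main process both approaches give the main process PID
    print('main:', report_pids())
    # in a pool worker both approaches give the worker PID
    with Pool(1) as pool:
        print('worker:', pool.apply(report_pids))
```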
How to Get All Worker PIDs
We can get the PID for all child workers in the process pool.
This can be achieved from the parent process.
First, we can get a list of all active child processes via the multiprocessing.active_children() function. This will return a list of multiprocessing.Process instances.
...
# get all active child processes
children = multiprocessing.active_children()
We can then access the "pid" attribute of each.
...
# get the pid of each child process
for child in children:
    print(child.pid)
The downside of this approach is that the list of active children may include child processes that are not worker processes.
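Here is a minimal sketch of the active_children() approach. It assumes a small pool of two workers and that the program starts no other child processes. The hypothetical helper worker_pids_from_parent() collects the child PIDs from the parent while the pool is open. It then compares them with the PIDs that the workers report themselves via os.getpid().

```python
import multiprocessing
import os
from multiprocessing.pool import Pool


def task(identifier):
    # report the PID of the worker that ran this task
    return os.getpid()


def worker_pids_from_parent(n_workers=2):
    with Pool(n_workers) as pool:
        # the pool starts its workers when it is created, so they are already active
        child_pids = [child.pid for child in multiprocessing.active_children()]
        # PIDs reported by the workers themselves, for comparison
        task_pids = set(pool.map(task, range(10)))
    return child_pids, task_pids


if __name__ == '__main__':
    print(worker_pids_from_parent())
```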
Another approach to getting the PID for all worker processes is to configure the process pool to use an initializer function that gets the worker process PID. This function is called once by each child worker process when it is started in the process pool.
In this case, the initialization function does not take any arguments, although arguments can be supplied via the initargs argument of the Pool constructor if needed. Within the function we can get and use the worker PID.
For example:
from os import getpid

# initialize the worker process
def init_worker():
    # get the pid for the current worker process
    pid = getpid()
We can then configure the process pool to call the initialization function as each worker is created.
This can be achieved by setting the "initializer" argument in the multiprocessing.pool.Pool constructor to the initialization function itself.
For example:
...
# create and configure a process pool
pool = multiprocessing.pool.Pool(initializer=init_worker)
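Sometimes the parent process needs to collect the worker PIDs rather than have each worker print its own. One option, which goes beyond this tutorial, is to pass a multiprocessing.Queue to the initializer through the Pool constructor's initargs argument. Each worker then puts its PID on the queue. The helper collect_worker_pids() below is hypothetical. It relies on two things: the pool starts all of its workers when it is created, and each worker runs the initializer exactly once. The second holds here because maxtasksperchild is not set, so workers are not replaced.

```python
import os
from multiprocessing import Queue
from multiprocessing.pool import Pool


def init_worker(pid_queue):
    # called once in each worker process when it starts
    pid_queue.put(os.getpid())


def collect_worker_pids(n_workers=4):
    # queue handed to the workers at creation time via initargs
    pid_queue = Queue()
    with Pool(n_workers, initializer=init_worker, initargs=(pid_queue,)):
        # each worker puts exactly one PID, so read one value per worker
        pids = [pid_queue.get(timeout=10) for _ in range(n_workers)]
    return pids


if __name__ == '__main__':
    print(collect_worker_pids())
```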
How to Get Task PID
We can get the worker PID within a given task executed by the process pool.
This can be achieved by calling the os.getpid() function within the target task function executed in the process pool.
For example:
from os import getpid

# task executed in a worker process
def task(identifier):
    # get the pid for the current worker process
    pid = getpid()
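The task runs in the worker, so it can also return its PID to the caller. This is one way to see which worker completed which task. The following is a minimal sketch, assuming a pool of two workers and a hypothetical helper pids_by_task().

```python
import os
from multiprocessing.pool import Pool


def task(identifier):
    # return the task identifier together with the PID of the worker running it
    return identifier, os.getpid()


def pids_by_task(n_tasks=8, n_workers=2):
    with Pool(n_workers) as pool:
        # map() preserves task order; dict() maps each identifier to its worker PID
        return dict(pool.map(task, range(n_tasks)))


if __name__ == '__main__':
    for identifier, pid in pids_by_task().items():
        print(f'task {identifier} ran in worker {pid}')
```

With only two workers, at most two distinct PIDs appear. The pool decides which worker runs which task, so the mapping can change between runs.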
Now that we know how to get worker process PIDs, let's look at some worked examples.
Example of Getting All Worker PIDs
We can explore how to get all worker PIDs using a worker initialization function.
In this example, we will define a worker initialization function that gets and reports the PID of each worker process. We will then create and configure a process pool to use the initialization function and issue several tasks to it. The pool starts all of its worker processes when it is created, so each worker reports its PID as it starts up, before it executes any task.
First, we must define a worker process initialization function. The function will not take any arguments and will call the os.getpid() function to get the PID of a worker, then report its value.
The init_worker() function below implements this.
# initialize the worker process
def init_worker():
    # get the pid for the current worker process
    pid = getpid()
    print(f'Worker PID: {pid}', flush=True)
Next, we can define a task that we will execute in the process pool.
The task will take an integer argument, then block for a fraction of a second.
The task() function below implements this.
# task executed in a worker process
def task(identifier):
    # block for a moment
    sleep(0.5)
Next, in the main process, we can create and configure a new process pool.
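We will use the context manager interface to ensure the process pool is shut down automatically once we are finished with it. Exiting the with block calls terminate() on the pool, so we must wait for our tasks to complete inside the block.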
...
# create and configure the process pool
with Pool(initializer=init_worker) as pool:
    # ...
We will then issue 10 tasks to the process pool, each calling our task() function with an integer between 0 and 9. We will issue the tasks asynchronously using the map_async() function.
...
# issue tasks to the process pool
result = pool.map_async(task, range(10))
Once issued, the main process will block on the AsyncResult returned from the map_async() function until the issued tasks are complete.
...
# wait for tasks to complete
result.wait()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of getting child worker process pids
from os import getpid
from time import sleep
from multiprocessing.pool import Pool

# initialize the worker process
def init_worker():
    # get the pid for the current worker process
    pid = getpid()
    print(f'Worker PID: {pid}', flush=True)

# task executed in a worker process
def task(identifier):
    # block for a moment
    sleep(0.5)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool(initializer=init_worker) as pool:
        # issue tasks to the process pool
        result = pool.map_async(task, range(10))
        # wait for tasks to complete
        result.wait()
    # the process pool is shut down automatically on exiting the with block
Running the example first configures and creates the process pool.
The ten tasks are then issued to the process pool and the main process blocks.
Each worker process is initialized with a call to the init_worker() function. This gets the worker PID and reports the value.
Each task is then executed, blocking for a fraction of a second and then returning.
All tasks complete, and the main process exits the with block, which shuts down the process pool automatically. The application then terminates.
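Note, the specific PIDs reported will be different each time the program is run. The number of PIDs reported matches the number of worker processes in the pool, which defaults to the number of CPUs available (eight in this run), not the number of tasks issued.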
Worker PID: 94212
Worker PID: 94213
Worker PID: 94214
Worker PID: 94216
Worker PID: 94215
Worker PID: 94217
Worker PID: 94218
Worker PID: 94219
Next, let's explore getting the worker PID from within a task.
Example of Getting Task Worker PID
We can get the worker PID within a task executed in the process pool.
In this example we will get the PID of the current process in the custom task function executed by the process pool. We will then issue a single task to the process pool that reports the PID.
Firstly, we can define the custom task function that gets the PID for the current worker process then reports the value.
The task() function below implements this.
# task executed in a worker process
def task(identifier):
    # get the pid for the current worker process
    pid = getpid()
    print(f'Task PID: {pid}', flush=True)
Next, in the main process we can create the process pool with a default configuration using the context manager interface.
...
# create and configure the process pool
with Pool() as pool:
    # ...
Next, we can issue a single task asynchronously to the process pool using the apply_async() function on the process pool.
...
# issue a task to the process pool
result = pool.apply_async(task, (0,))
This will return a single AsyncResult object that we can wait on for the issued task to complete.
...
# wait for tasks to complete
result.wait()
Tying this together, the complete example is listed below.
# SuperFastPython.com
# example of getting child worker process pids
from os import getpid
from multiprocessing.pool import Pool

# task executed in a worker process
def task(identifier):
    # get the pid for the current worker process
    pid = getpid()
    print(f'Task PID: {pid}', flush=True)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool() as pool:
        # issue a task to the process pool
        result = pool.apply_async(task, (0,))
        # wait for the task to complete
        result.wait()
    # the process pool is shut down automatically on exiting the with block
Running the example first creates the process pool.
A single task is issued to the process pool and the main process blocks until the task is complete.
The task runs, first getting the PID of the current worker process that is running the task, then reporting the value.
The task completes, and the main process exits the with block, which shuts down the process pool automatically. The application then terminates.
Note, the reported PID will differ each time the program is run.
Task PID: 94241
Takeaways
You now know how to get the PID of worker processes in the process pool.