Watchdog Thread in Python

March 28, 2022 Python Threading

You can create a watchdog thread by creating a daemon thread that polls a target resource.

In this tutorial you will discover how to develop a watchdog thread in Python.

Let's get started.

Need for a Watchdog Thread

A thread is a thread of execution in a computer program.

Every Python program has at least one thread of execution called the main thread. Both processes and threads are created and managed by the underlying operating system.

Sometimes we may need to create additional threads in our program in order to execute code concurrently.

Python provides the ability to create and manage new threads via the threading module and the threading.Thread class.


In concurrent programming, we often need to monitor a resource and take action if a problem is detected.

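This may be for many reasons, such as detecting that a long-running task has stopped unexpectedly so that it can be restarted, or detecting that a server the application depends on has become unavailable.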

This can be achieved using a watchdog thread.

What is a watchdog thread and how can we develop one in Python?

What is a Watchdog Thread

A watchdog thread is a thread that monitors another resource and takes action if a change or failure is detected.

Generally, a watchdog may monitor any resource, although when we are describing watchdog threads, the resource is typically directly relevant to the parent application.

A watchdog may monitor a resource within the application, such as another thread that performs a long-running task.

A watchdog may also monitor an external resource on which the application depends, such as a remote server.
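For an external resource like a network service, the check performed on each poll could be a simple reachability test. The sketch below is one possible check. It assumes the resource is a TCP service identified by a host and port. The function name is_reachable() is hypothetical and not part of any library. It only uses socket.create_connection() from the standard library.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Hypothetical check: return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection() raises OSError (e.g. ConnectionRefusedError
        # or a timeout) if the server cannot be reached
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # open a local listening socket to stand in for the "server"
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))
    server.listen()
    host, port = server.getsockname()
    print(is_reachable(host, port))  # server is listening
    server.close()
    print(is_reachable(host, port))  # server has gone away
```

A watchdog would call a check like this on every iteration of its polling loop and take action when it returns False.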

A watchdog thread will typically monitor the resource using polling. This means checking the status of the resource repeatedly in a loop and waiting an interval of time between checks.

If a fault or problem with the monitored resource is detected, the watchdog thread takes action appropriate to the nature of the resource. Examples include restarting a failed thread, switching to a different server address, or simply logging the fault.

Next, let's look at how to develop a watchdog thread.

How to Create a Watchdog Thread

A watchdog thread can be created using a new daemon threading.Thread in Python.

This can be achieved by creating a new thread instance and using the "target" argument to specify the watchdog function.

The watchdog can be given an appropriate name, like "Watchdog", and configured as a daemon thread by setting the "daemon" argument to True.

For example:

...
# configure a watchdog thread
watchdog = threading.Thread(target=watchdog_task, name="Watchdog", daemon=True)
# start the watchdog thread
watchdog.start()

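A daemon thread is a background thread that does not prevent the program from exiting once the main thread and all other non-daemon threads have finished. This makes it appropriate for a watchdog task.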
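To see this configuration in isolation, the sketch below starts a named daemon thread and inspects its properties. The body of watchdog_task() is an assumption for illustration only. It just sleeps in a loop, whereas a real watchdog would check a resource on each iteration.

```python
import threading
from time import sleep


def watchdog_task():
    # placeholder loop; a real watchdog would check a resource here
    while True:
        sleep(0.5)


# configure and start the watchdog as a named daemon thread
watchdog = threading.Thread(target=watchdog_task, name="Watchdog", daemon=True)
watchdog.start()

print(watchdog.name, watchdog.daemon, watchdog.is_alive())  # prints: Watchdog True True
# when the main thread finishes here, the program exits even though the
# watchdog loop never ends, because it is a daemon thread
```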


The function for executing the watchdog task may have any name you like and may take any required data as arguments.

For example, we could define a function named watchdog_task() and have it take the resource to monitor and the function to call if a failure is detected.

The watchdog would then loop for the duration of the main application and check the status of the resource every interval. This can be achieved using a while loop that first checks the resource, takes action if needed, then blocks for an interval of time, such as a fraction of a second.

For example:

from time import sleep

# task for a watchdog thread to monitor a resource
def watchdog_task(resource, action):
    # poll the target resource
    while True:
        # check the resource; has_fault() is a placeholder for your own check
        if has_fault(resource):
            # take action, e.g. fix the resource
            action()
        # block for a moment
        sleep(0.5)

The checking of the resource may call a custom function or may call a function on the resource object itself.

The action taken may operate on the application, e.g. change a server address, or may operate on the resource object itself. It may even be as simple as logging the fault.
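To make this concrete, the sketch below assumes a hypothetical Resource class whose has_fault() method performs the check on the resource object itself. It uses Python's logging module to report the fault, which is the simplest possible action. Two details differ from watchdog_task() above and are assumptions for illustration. First, the action receives the resource as an argument. Second, the loop runs a fixed number of checks so that the sketch terminates.

```python
import logging
import threading
from time import sleep

logging.basicConfig(level=logging.INFO, format='%(threadName)s: %(message)s')


class Resource:
    """Hypothetical monitored resource that exposes its own health check."""

    def __init__(self):
        self.healthy = True

    def has_fault(self):
        # the check is a method on the resource object itself
        return not self.healthy


def log_fault(resource):
    # the simplest possible action: record that a fault was seen
    logging.warning('fault detected on %r', resource)


def watchdog_task(resource, action, interval=0.1, checks=10):
    # poll a fixed number of times so this sketch terminates;
    # a long-running watchdog would loop with `while True` instead
    for _ in range(checks):
        if resource.has_fault():
            action(resource)
        sleep(interval)


if __name__ == '__main__':
    resource = Resource()
    watchdog = threading.Thread(target=watchdog_task, args=(resource, log_fault),
                                name='Watchdog', daemon=True)
    watchdog.start()
    sleep(0.55)
    resource.healthy = False  # simulate a failure part-way through
    watchdog.join()
```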

Finally, the target resource should be polled at an interval that makes sense for the application. This might be every fraction of a second, or every minute, hour, or day. It may also be some combination of polling and responding to events within the application.
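One way to combine periodic polling with responding to an event is to block on threading.Event.wait(timeout) instead of sleep(). This assumes the application can signal the watchdog through an Event. The wait() method returns True as soon as the event is set and False when the timeout expires. That lets a single call act as both the polling interval and an immediate wake-up. The parameter names check, stop, and interval are illustrative assumptions, and check() here is a hypothetical stand-in that always reports a fault.

```python
import threading


def watchdog_task(check, action, stop, interval=0.5):
    """Call action() whenever check() reports a fault, until stop is set."""
    while True:
        if check():
            action()
        # wait up to `interval` seconds; returns True early if stop is set
        if stop.wait(timeout=interval):
            break


if __name__ == '__main__':
    stop = threading.Event()
    watchdog = threading.Thread(
        target=watchdog_task,
        args=(lambda: True, lambda: print('fault handled'), stop),
        kwargs={'interval': 0.2},
        name='Watchdog',
        daemon=True,
    )
    watchdog.start()
    # elsewhere in the application: request shutdown after one second
    threading.Timer(1.0, stop.set).start()
    watchdog.join()
    print('Watchdog stopped')
```

Here the event is used to request shutdown. The same pattern works for any event that should make the watchdog react before its next scheduled poll.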

Now that we know how to create and configure a watchdog thread, let's look at a worked example.

Example of a Watchdog Thread

We can develop a worked example of a watchdog thread.

In this example, we will have a worker thread that is designed to run for a long time, but is subject to intermittent failures that cause the thread to stop. We will then create a watchdog thread that will monitor the worker thread for a failure and will restart it as needed.

First, let's define the task for the worker thread.

We will name the function worker_task() and give it a counter variable that it increments in a loop that runs forever. On each iteration, the thread reports the counter value and fails with a 30% probability by breaking out of the loop. Otherwise, it blocks for one second.
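Each iteration fails independently with probability 0.3. The iteration on which a worker fails, which is the last counter value it prints, therefore follows a geometric distribution: P(k) = 0.7^(k-1) × 0.3, with a mean of 1/0.3, or about 3.33. The short simulation below is not part of the original example, and iterations_until_failure() is a hypothetical name. It checks this figure by replaying the worker's loop without the printing and sleeping.

```python
from random import Random


def iterations_until_failure(rng, p_fail=0.3):
    """Replay worker_task()'s loop (minus print/sleep); return the counter at failure."""
    counter = 0
    while True:
        counter += 1
        if rng.random() < p_fail:
            return counter


rng = Random(1)  # fixed seed so the run is repeatable
trials = 100_000
mean = sum(iterations_until_failure(rng) for _ in range(trials)) / trials
print(f'mean iterations until failure: {mean:.2f} (theory: {1 / 0.3:.2f})')
```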

The function below implements this.

# task for a worker thread
def worker_task():
    # work forever
    counter = 0
    while True:
        counter += 1
        print(f'.worker {counter}')
        # conditionally fail
        if random() < 0.3:
            print('.worker failed')
            break
        # block
        sleep(1)

Next, we can define a function that creates a new threading.Thread instance configured to run the worker task, starts the thread, and returns the thread instance.

This function can be used to initially start the worker thread, and to restart the worker thread if it fails.
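A function like boot_worker() is needed because a threading.Thread object cannot be restarted. Calling start() a second time on the same instance raises RuntimeError, even after the thread has finished. The snippet below demonstrates this, using short_task() as a placeholder task.

```python
from threading import Thread


def short_task():
    # placeholder task that finishes immediately
    pass


thread = Thread(target=short_task)
thread.start()
thread.join()
try:
    # a finished thread cannot be started again
    thread.start()
except RuntimeError as e:
    print(f'Cannot restart: {e}')

# the only way to "restart" the task is to create a new Thread instance
thread = Thread(target=short_task)
thread.start()
thread.join()
```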

The boot_worker() function below implements this.

# create and start a worker thread, returns instance of thread
def boot_worker():
    # create and configure the thread
    worker = Thread(target=worker_task, name="Worker")
    # start the thread
    worker.start()
    # return the instance of the thread
    return worker

Next, we will create a function for running the watchdog.

This function has a loop that runs for the duration of the program. On each iteration, it checks whether the worker thread is alive and, if not, restarts it by calling the boot_worker() function.
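This check relies on Thread.is_alive(), which returns True only from the moment run() starts until it returns. That covers both a target that returns normally, as worker_task() does when it breaks out of its loop, and a target that raises an exception. The demonstration below uses a hypothetical failing_task() that raises. The threading module prints the exception's traceback to stderr, but the thread still counts as no longer alive.

```python
from threading import Event, Thread


def failing_task(ready):
    # hypothetical task that waits for a signal, then crashes
    ready.wait()
    raise ValueError('simulated failure')


ready = Event()
thread = Thread(target=failing_task, args=(ready,))
print(thread.is_alive())  # False: not started yet
thread.start()
print(thread.is_alive())  # True: blocked in ready.wait()
ready.set()
thread.join()
print(thread.is_alive())  # False: run() ended with an exception
```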

The watchdog will poll the worker thread every half second.

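The watchdog() function below implements this. It takes as arguments the worker thread instance ("target") and the function for booting a new worker thread ("action"). It reassigns target to the new thread returned by action(), so that later checks monitor the replacement worker.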

# task for a watchdog thread
def watchdog(target, action):
    # run forever
    print('Watchdog running')
    while True:
        # check the status of the target thread
        if not target.is_alive():
            # report fault
            print('Watchdog: target thread is not running, restarting...')
            # restart the target thread
            target = action()
        # block for a moment
        sleep(0.5)

In the main thread, we first boot the worker thread.

...
# create the worker thread
worker = boot_worker()

The main thread then creates a new daemon thread, configures it to run the watchdog() function, and starts it.

...
# create the watchdog for the worker thread
watchdog = Thread(target=watchdog, args=(worker, boot_worker), daemon=True, name="Watchdog")
# start the watchdog
watchdog.start()

Finally, the main thread is free to go off and run the main application.

We will simulate this by having the main thread block on the watchdog thread by calling join(). Because the watchdog loops forever, this call never returns.

...
# do other things...
watchdog.join()

Tying this together, the complete example of a watchdog thread to monitor a worker thread is listed below.

# SuperFastPython.com
# example of a watchdog thread monitoring a worker thread
from time import sleep
from random import random
from threading import Thread

# task for a worker thread
def worker_task():
    # work forever
    counter = 0
    while True:
        counter += 1
        print(f'.worker {counter}')
        # conditionally fail
        if random() < 0.3:
            print('.worker failed')
            break
        # block
        sleep(1)

# create and start a worker thread, returns instance of thread
def boot_worker():
    # create and configure the thread
    worker = Thread(target=worker_task, name="Worker")
    # start the thread
    worker.start()
    # return the instance of the thread
    return worker

# task for a watchdog thread
def watchdog(target, action):
    # run forever
    print('Watchdog running')
    while True:
        # check the status of the target thread
        if not target.is_alive():
            # report fault
            print('Watchdog: target thread is not running, restarting...')
            # restart the target thread
            target = action()
        # block for a moment
        sleep(0.5)

# create the worker thread
worker = boot_worker()
# create the watchdog for the worker thread
watchdog = Thread(target=watchdog, args=(worker, boot_worker), daemon=True, name="Watchdog")
# start the watchdog
watchdog.start()
# do other things...
watchdog.join()

Running the example first starts the worker thread, then the watchdog thread.

The worker thread begins its work and may fail on any iteration with a probability of 30%.

When the worker does fail, the watchdog notices and reboots the worker thread, restarting the task.

In this case, we can see the worker thread repeatedly iterating through its task and failing, then being rebooted by the watchdog.

Your specific results will differ given the use of random numbers.

Note that the program runs forever, so you will need to stop it manually once you have seen enough, e.g. via Control-C.

Watchdog running
.worker 1
.worker 2
.worker 3
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker 4
.worker 5
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker 4
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker 4
.worker 5
.worker 6
...
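If you would rather have the program stop on its own instead of killing it manually, one option is to share a threading.Event between the worker and the watchdog and set it after a fixed run time. This is a variation, not the example above. The stop event, the run_time parameter, the restarts list, and main() are all illustrative assumptions. The worker also checks the event because it is a non-daemon thread and would otherwise keep the program alive. The watchdog thread object is named watchdog_thread so it does not shadow the watchdog() function.

```python
from random import random
from threading import Event, Thread
from time import sleep


def worker_task(stop):
    # same task as before, but it also exits when stop is set
    counter = 0
    while not stop.is_set():
        counter += 1
        print(f'.worker {counter}')
        if random() < 0.3:
            print('.worker failed')
            break
        stop.wait(1)  # like sleep(1), but wakes early on shutdown


def boot_worker(stop):
    worker = Thread(target=worker_task, args=(stop,), name='Worker')
    worker.start()
    return worker


def watchdog(target, action, stop, restarts):
    print('Watchdog running')
    while not stop.wait(0.5):  # poll every half second until stop is set
        if not target.is_alive():
            print('Watchdog: target thread is not running, restarting...')
            target = action(stop)
            restarts.append(target)
    # make sure the last worker has finished before the watchdog exits
    target.join()


def main(run_time=5.0):
    stop = Event()
    restarts = []
    worker = boot_worker(stop)
    watchdog_thread = Thread(target=watchdog, args=(worker, boot_worker, stop, restarts),
                             daemon=True, name='Watchdog')
    watchdog_thread.start()
    sleep(run_time)  # the main application runs for a while
    stop.set()       # then asks all threads to finish
    watchdog_thread.join()
    return len(restarts)


if __name__ == '__main__':
    print(f'Worker restarted {main()} times')
```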

Takeaways

You now know how to develop a watchdog thread in Python.