Last Updated on September 12, 2022
You can create a watchdog thread by creating a daemon thread that polls a target resource.
In this tutorial you will discover how to develop a watchdog thread in Python.
Let’s get started.
Need for a Watchdog Thread
A thread is a thread of execution in a computer program.
Every Python program has at least one thread of execution called the main thread. Both processes and threads are created and managed by the underlying operating system.
Sometimes we may need to create additional threads in our program in order to execute code concurrently.
Python provides the ability to create and manage new threads via the threading module and the threading.Thread class.
In concurrent programming, we often need to monitor a resource and take action if a problem is detected.
This may be for many reasons, such as:
- Access to the resource is essential to the application.
- The resource is known to have rare but recurring intermittent problems.
- The resource may be restarted or changed if there is a fault, allowing the application to continue working.
This can be achieved using a watchdog thread.
What is a watchdog thread and how can we develop one in Python?
What is a Watchdog Thread
A watchdog thread is a thread that monitors another resource and takes action if a change or failure is detected.
Generally, a watchdog may monitor any resource, although when we are describing watchdog threads, the resource is typically directly relevant to the parent application.
A watchdog may monitor a resource within the application, such as:
- Another thread, such as a worker thread.
- Data, such as data structures in memory.
- Program state, such as global or instance variables.
A watchdog may also monitor an external resource, perhaps on which the application is dependent, such as:
- A server, e.g. webserver, fileserver, etc.
- A file or directory, e.g. for change.
- A database, e.g. for accessibility.
A watchdog thread will typically monitor the resource using polling. This means checking the status of the resource repeatedly in a loop after an interval of time.
If a fault or problem with the monitored resource is detected, then the watchdog thread will take action, depending on the nature of the resource, such as:
- Reporting the fault, e.g. logging.
- Restarting or resuming a task or service.
- Changing the server address, e.g. fail-over.
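As a rough illustration of polling an external resource and reporting a change, the sketch below watches a file from a daemon thread and logs when its size or modification time changes. The names file_signature() and file_watchdog(), the temporary file, the short interval, and the fixed number of polls are all assumptions chosen so the example finishes quickly; they are not part of the worked example later in this tutorial.

```python
import logging
import os
import tempfile
from threading import Thread
from time import sleep

logging.basicConfig(level=logging.INFO)

# hypothetical helper: summarize the file state we care about
def file_signature(path):
    info = os.stat(path)
    return (info.st_mtime_ns, info.st_size)

# hypothetical watchdog task: poll the file and report each change
def file_watchdog(path, last_seen, changes, polls=20, interval=0.05):
    for _ in range(polls):
        current = file_signature(path)
        if current != last_seen:
            # action: report the change and remember the new state
            logging.info('Watchdog: %s changed', path)
            changes.append(current)
            last_seen = current
        # block for a moment before polling again
        sleep(interval)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'watched.txt')
    with open(path, 'w') as handle:
        handle.write('first')
    changes = []
    # record the starting state before the watchdog begins polling
    baseline = file_signature(path)
    watchdog = Thread(target=file_watchdog, args=(path, baseline, changes),
                      name='Watchdog', daemon=True)
    watchdog.start()
    # simulate something else modifying the file
    sleep(0.2)
    with open(path, 'a') as handle:
        handle.write(' second')
    # wait for the bounded watchdog to finish its polls
    watchdog.join()

print(f'changes detected: {len(changes)}')
```

A real watchdog would normally loop for the life of the program rather than for a fixed number of polls; the bound is only there to let the sketch terminate.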
Next, let’s look at how to develop a watchdog thread.
How to Create a Watchdog Thread
A watchdog thread can be created using a new daemon threading.Thread in Python.
This can be achieved by creating a new thread instance and using the “target” argument to specify the watchdog function.
The watchdog can be given an appropriate name, like “Watchdog”, and configured as a daemon thread by setting the “daemon” argument to True.
For example:
```python
...
# configure a watchdog thread
watchdog = threading.Thread(target=watchdog_task, name="Watchdog", daemon=True)
# start the watchdog thread
watchdog.start()
```
A daemon thread is a background thread and is appropriate for a watchdog task.
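To make the daemon behavior concrete, here is a minimal sketch, assuming a hypothetical background_task() that never returns. Because the thread is a daemon, it does not prevent the program from exiting once the main thread is done.

```python
import threading
from time import sleep

# hypothetical task that loops forever, like a watchdog would
def background_task():
    while True:
        sleep(0.1)

# configure and start the thread as a daemon
watchdog = threading.Thread(target=background_task, name='Watchdog', daemon=True)
watchdog.start()

# report the thread's name, daemon flag, and whether it is running
print(watchdog.name, watchdog.daemon, watchdog.is_alive())
# the main thread ends here and the program exits, even though
# background_task() is still looping in the daemon thread
```

If the same thread were created with daemon=False, the program would not exit on its own, because Python waits for all non-daemon threads to finish.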
The function for executing the watchdog task may have any name you like and may take any required data as arguments.
For example, we could define a function named watchdog_task() and have it take the resource to monitor and the function to call if a failure is detected.
The watchdog would then loop for the duration of the main application and check the status of the resource every interval. This can be achieved using a while loop that first checks the resource, takes action if needed, then blocks for an interval of time, such as a fraction of a second.
For example:
```python
from time import sleep

# task for a watchdog thread to monitor a resource
def watchdog_task(resource, action):
    # poll the target resource
    while True:
        # check the resource (has_fault() is a placeholder for an application-specific check)
        if has_fault(resource):
            # fix the resource
            action()
        # block for a moment
        sleep(0.5)
```
The checking of the resource may call a custom function or may call a function on the resource object itself.
The action taken may operate on the application, e.g. change a server address, or may operate on the resource object itself. It may even be as simple as logging the fault.
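As one possible way to call a function on the resource object itself, the sketch below uses a hypothetical Heartbeat class. The worker calls beat() while it is making progress, and the watchdog calls is_stalled() to detect a worker that has stopped reporting. The class, its method names, the timeout, and the bounded number of polls are all assumptions for illustration, and the action here is simply logging the fault.

```python
import logging
from threading import Lock, Thread
from time import monotonic, sleep

# hypothetical resource object: records when the worker last reported in
class Heartbeat:
    def __init__(self):
        self._lock = Lock()
        self._last_beat = monotonic()

    # called by the worker each time it makes progress
    def beat(self):
        with self._lock:
            self._last_beat = monotonic()

    # called by the watchdog to check for a stall
    def is_stalled(self, timeout):
        with self._lock:
            return monotonic() - self._last_beat > timeout

# worker that reports in a few times, then stops reporting
def worker_task(heartbeat, beats=3):
    for _ in range(beats):
        heartbeat.beat()
        sleep(0.05)
    # simulate a stall: the thread keeps running but no longer reports in
    sleep(0.6)

# watchdog that polls the heartbeat and logs the first stall it sees
def watchdog_task(heartbeat, faults, timeout=0.2, polls=20, interval=0.05):
    for _ in range(polls):
        if heartbeat.is_stalled(timeout):
            # action: report the fault and stop polling
            logging.warning('Watchdog: worker has stalled')
            faults.append(monotonic())
            break
        sleep(interval)

heartbeat = Heartbeat()
faults = []
worker = Thread(target=worker_task, args=(heartbeat,), name='Worker', daemon=True)
watchdog = Thread(target=watchdog_task, args=(heartbeat, faults), name='Watchdog', daemon=True)
worker.start()
watchdog.start()
watchdog.join()
print(f'faults reported: {len(faults)}')
```

A check like this complements a liveness check on the thread, since it can report a fault while the worker thread is still running.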
Finally, the target resource should be polled at an interval that makes sense for the application. Perhaps it is every fraction of a second, every minute, hour, day, and so on. It may also be some combination of polling and responding to events within the application.
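One way to combine polling with events, sketched below under assumptions, is to replace sleep() with a call to wait() on a threading.Event. The watchdog still polls every interval, but it wakes immediately when the event is set, which gives the application a way to stop it. The check and action callables, the “stop” event, and the short interval are hypothetical names and values chosen for the example.

```python
from threading import Event, Thread
from time import sleep

# watchdog task that polls until the stop event is set
def watchdog_task(check, action, stop, interval=0.5):
    while not stop.is_set():
        # check the resource and take action if needed
        if check():
            action()
        # block for the interval, or return early if stop is set
        stop.wait(interval)

stop = Event()
faults = []
# hypothetical check that always reports a fault, and an action that records it
watchdog = Thread(target=watchdog_task,
                  args=(lambda: True, lambda: faults.append('fault'), stop, 0.05),
                  name='Watchdog', daemon=True)
watchdog.start()
# let the watchdog poll a few times
sleep(0.2)
# signal the watchdog to stop and wait for it to finish
stop.set()
watchdog.join(timeout=1)
print(f'faults handled: {len(faults)}, watchdog alive: {watchdog.is_alive()}')
```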
Now that we know how to create and configure a watchdog thread, let’s look at a worked example.
Example of a Watchdog Thread
We can develop a worked example of a watchdog thread.
In this example, we will have a worker thread that is designed to run for a long time, but is subject to intermittent failures that cause the thread to stop. We will then create a watchdog thread that will monitor the worker thread for a failure and will restart it as needed.
First, let’s define the task for the worker thread.
We will name the function worker_task() and give it a counter variable that it must increment as high as possible in a loop forever. Each iteration of the loop, the thread will block for a second and may fail with a 30% chance.
The function below implements this.
```python
# task for a worker thread
def worker_task():
    # work forever
    counter = 0
    while True:
        counter += 1
        print(f'.worker {counter}')
        # conditionally fail
        if random() < 0.3:
            print('.worker failed')
            break
        # block
        sleep(1)
```
Next, we can define a function that is responsible for creating a new threading.Thread instance configured to run the worker thread, start the thread and return the thread instance.
This function can be used to initially start the worker thread, and to restart the worker thread if it fails.
The boot_worker() function below implements this.
```python
# create and start a worker thread, returns instance of thread
def boot_worker():
    # create and configure the thread
    worker = Thread(target=worker_task, name="Worker")
    # start the thread
    worker.start()
    # return the instance of the thread
    return worker
```
Next, we will create a function for running the watchdog.
This function will have a loop that runs for the duration of the program. On each iteration, it will check whether the worker thread is alive and, if not, restart it by calling the boot_worker() function.
The watchdog will poll the worker thread every half second.
The watchdog() function below implements this, taking the worker thread instance “target” and function for booting a new worker thread “action” as arguments.
```python
# task for a watchdog thread
def watchdog(target, action):
    # run forever
    print('Watchdog running')
    while True:
        # check the status of the target thread
        if not target.is_alive():
            # report fault
            print('Watchdog: target thread is not running, restarting...')
            # restart the target thread
            target = action()
        # block for a moment
        sleep(0.5)
```
Finally, the main thread will first boot the worker thread.
```python
...
# create the worker thread
worker = boot_worker()
```
It then creates a new daemon thread, configures it to run the watchdog function, and starts it.
```python
...
# create the watchdog for the worker thread
watchdog = Thread(target=watchdog, args=(worker, boot_worker), daemon=True, name="Watchdog")
# start the watchdog
watchdog.start()
```
Finally, the main thread is free to go off and run the main application.
We will simulate this by having the main thread block on the watchdog thread.
```python
...
# do other things...
watchdog.join()
```
Tying this together, the complete example of a watchdog thread to monitor a worker thread is listed below.
```python
# SuperFastPython.com
# example of a watchdog thread monitoring a worker thread
from time import sleep
from random import random
from threading import Thread

# task for a worker thread
def worker_task():
    # work forever
    counter = 0
    while True:
        counter += 1
        print(f'.worker {counter}')
        # conditionally fail
        if random() < 0.3:
            print('.worker failed')
            break
        # block
        sleep(1)

# create and start a worker thread, returns instance of thread
def boot_worker():
    # create and configure the thread
    worker = Thread(target=worker_task, name="Worker")
    # start the thread
    worker.start()
    # return the instance of the thread
    return worker

# task for a watchdog thread
def watchdog(target, action):
    # run forever
    print('Watchdog running')
    while True:
        # check the status of the target thread
        if not target.is_alive():
            # report fault
            print('Watchdog: target thread is not running, restarting...')
            # restart the target thread
            target = action()
        # block for a moment
        sleep(0.5)

# create the worker thread
worker = boot_worker()
# create the watchdog for the worker thread
watchdog = Thread(target=watchdog, args=(worker, boot_worker), daemon=True, name="Watchdog")
# start the watchdog
watchdog.start()
# do other things...
watchdog.join()
```
Running the example first starts the worker thread, then the watchdog thread.
The worker thread begins work and may fail on any iteration with a probability of 30%.
When the worker does fail, the watchdog notices and reboots the worker thread, restarting the task.
In this case, we can see the worker thread repeatedly iterating through the task and failing, then being rebooted by the watchdog.
Your specific results will differ given the use of random numbers.
Note, you will need to kill the program manually once you’ve had enough, e.g. via Control-C.
```
Watchdog running
.worker 1
.worker 2
.worker 3
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker 4
.worker 5
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker 4
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker failed
Watchdog: target thread is not running, restarting...
.worker 1
.worker 2
.worker 3
.worker 4
.worker 5
.worker 6
...
```
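The example above runs until it is killed manually. If you would rather have the program shut itself down, one option is a shared threading.Event that both the worker and the watchdog check. The sketch below is a variation under assumptions, not the example from this tutorial: the “stop” event, the one-second run time, and the shortened intervals (0.1 and 0.05 seconds) are choices made so the demonstration finishes quickly.

```python
from random import random
from threading import Event, Thread
from time import sleep

# shared signal used to ask both threads to finish (an assumption of this sketch)
stop = Event()

# task for a worker thread that stops on failure or when asked
def worker_task():
    counter = 0
    while not stop.is_set():
        counter += 1
        print(f'.worker {counter}')
        # conditionally fail
        if random() < 0.3:
            print('.worker failed')
            break
        # block, but wake early if stop is set
        stop.wait(0.1)

# create and start a worker thread, returns instance of thread
def boot_worker():
    worker = Thread(target=worker_task, name='Worker')
    worker.start()
    return worker

# task for a watchdog thread that restarts the worker until asked to stop
def watchdog_task(target, action):
    print('Watchdog running')
    while not stop.is_set():
        # check the status of the target thread
        if not target.is_alive():
            print('Watchdog: target thread is not running, restarting...')
            target = action()
        # block for a moment, but wake early if stop is set
        stop.wait(0.05)
    # wait for the current worker to notice the stop signal
    target.join()

# create the worker thread and its watchdog
worker = boot_worker()
watchdog = Thread(target=watchdog_task, args=(worker, boot_worker), name='Watchdog')
watchdog.start()
# do other things for a while...
sleep(1)
# ask both threads to finish, then wait for the watchdog
stop.set()
watchdog.join()
print('Done')
```

Here the watchdog is not a daemon thread, because the main thread explicitly waits for it to finish; a daemon thread with the same stop event would work just as well.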
Takeaways
You now know how to develop a watchdog thread in Python.