Last Updated on September 12, 2022
You can write to a file in a thread-safe manner using a mutex lock via the threading.Lock class.
In this tutorial you will discover how to write to a file from multiple threads in a thread-safe manner.
Let’s get started.
Need to Append File From Multiple Threads
A thread is a thread of execution in a computer program.
Every Python program has at least one thread of execution called the main thread. Both processes and threads are created and managed by the underlying operating system.
Sometimes we may need to create additional threads in our program in order to execute code concurrently.
Python provides the ability to create and manage new threads via the threading module and the threading.Thread class.
You can learn more about Python threads in the guide:
In concurrent programming, we may need to write to a file from multiple threads.
Is writing to a file thread-safe? If not, how can we make appending a file from multiple threads thread-safe?
Writing to File is Not Thread-Safe
Writing to a file means opening a file path and writing or appending data to the file.
A file can be opened with the built-in open() function, specifying the file path and an open mode such as 'w' for writing text or 'a' for appending text. Once open, data can be written via the write() or writelines() functions.
A best practice is to open a file using the context manager interface so that the file is closed automatically once any writing operations are finished.
For example:
```python
...
# open file for appending
with open('path/to/file.txt', 'a') as file:
    # write text to the file
    file.write('Test')
```
Writing to the same file from multiple threads concurrently is not thread safe and may result in a race condition.
Not being thread-safe means that writing or appending to the same file from more than one thread may result in a race condition.
The result of a race condition while writing or appending to a file may be that the data written to the file is overwritten or corrupted, resulting in data loss.
Now that we know that writing to a file is not thread-safe, let’s look at how we can make it thread-safe.
How to Thread-Safe Write to File
Writing to a file can be made thread-safe by using a mutual exclusion (mutex) lock.
Any code that opens and writes to the file, or appends to the file can be treated as a critical section of code subject to race conditions.
This code can be protected from race conditions by requiring that the accessing thread first acquire a mutex lock before executing the critical section.
A mutex lock can only be acquired by one thread at a time, and once acquired it prevents any other thread from acquiring it until the lock is released.
This means that only a single thread will be able to write to the file at a time, making writing to file thread safe.
This can be achieved using the threading.Lock class.
First, a lock can be created and shared among all code that needs to access the same file.
```python
...
# create a lock
lock = threading.Lock()
```
Once created, a thread can then acquire the lock before writing to file by calling the acquire() function. Once writing to the file is finished, the lock can be released by calling the release() function.
For example:
```python
...
# acquire the lock
lock.acquire()
# open file for appending
with open('path/to/file.txt', 'a') as file:
    # write text to the file
    file.write('Test')
# release the lock
lock.release()
```
Like opening the file, the context manager interface can be used to ensure that the lock is always released once the enclosed block of code is exited.
For example:
```python
...
# acquire the lock
with lock:
    # open file for appending
    with open('path/to/file.txt', 'a') as file:
        # write text to the file
        file.write('Test')
```
If a thread attempts to acquire the lock while it has already been acquired, then it must block or wait until the lock is released. This waiting is performed automatically by the thread when attempting to acquire the lock, no additional code is required.
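This blocking behavior can be observed with a small sketch. The sleep durations and the event labels below are illustrative, chosen only so the two threads interleave predictably: one thread holds the lock while the other blocks on it.

```python
import threading
import time

lock = threading.Lock()
events = []  # records the order of operations

def holder():
    # acquire the lock and hold it briefly
    with lock:
        events.append('holder acquired')
        time.sleep(0.3)
        events.append('holder releasing')

def waiter():
    # this acquire blocks until the holder releases the lock
    with lock:
        events.append('waiter acquired')

t1 = threading.Thread(target=holder)
t2 = threading.Thread(target=waiter)
t1.start()
time.sleep(0.1)  # give the holder time to acquire the lock first
t2.start()
t1.join()
t2.join()
print(events)  # the waiter acquires only after the holder releases
```

The waiter thread spends roughly 0.2 seconds blocked inside its acquire with no extra code needed to implement the waiting.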
You can learn more about protecting critical sections with mutex locks in this tutorial:
Next, let’s look at some worked examples.
Example of Thread-Unsafe Write to File
Before we look at how to make writing to a file thread-safe, let’s look at how it is not thread-safe.
We can develop an example that demonstrates that appending to a file from many threads is not thread safe and results in data loss.
In this example we will create 1,000 threads each of which will generate 1,000 random numbers between 0 and 1 and write them as a line of text to the same file.
There are a few ways to structure this program, for example:
- Each thread opens the file each time a line of text needs to be written.
- Each thread opens the file once and writes each line as generated.
- The application opens the file once and each thread writes using the same file handle.
The specific approach taken really depends on the details of the application.
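As a sketch of the first option above, each thread could open the file in append mode for every line it writes. The filename 'demo.txt' is hypothetical, and the lock is an assumption on my part: whether interleaved appends from separate file handles are safe without it is OS-dependent, so a shared lock is the conservative choice.

```python
import threading

lock = threading.Lock()  # conservative: serialize the open/write/close cycle

def write_line(filepath, text):
    # open, append one line, and close on every write
    with lock:
        with open(filepath, 'a') as file:
            file.write(text + '\n')

# start with an empty file for the demo
open('demo.txt', 'w').close()
# several threads each append one line
threads = [threading.Thread(target=write_line, args=('demo.txt', f'line {i}')) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Opening the file per write is simpler to reason about than a shared handle, at the cost of repeated open/close overhead.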
In this case, we will open the file once in the application and have all threads write to the file using the same file handle. This approach is aggressive and makes it easy to demonstrate the race condition when many threads write to the same file.
First, we can define a task function to be executed by each thread.
The function will take a unique integer to identify the thread and the file handle as arguments. The task will loop for 1,000 iterations and each iteration it will generate a random number using random.random() and then write this number as part of a line of text to the file.
The task() function below implements this.
```python
# task for worker threads
def task(number, file):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # write to the file
        file.write(f'Thread {number} got {value}.\n')
```
Next, in the main thread, we can open the file for appending.
Note, we will write to a file named “output.txt” in the current working directory. This is typically the directory from which you run your Python script.
```python
...
# define the shared file path
filepath = 'output.txt'
# open the file
file = open(filepath, 'a')
```
We can then create 1,000 new threads configured to call the task() function each with a unique integer identifier and the shared file handle.
This can be done in a list comprehension.
```python
...
# configure many threads
threads = [Thread(target=task, args=(i, file)) for i in range(1000)]
```
The main thread can then start all threads by calling the start() method and then wait until all threads have terminated by calling the join() method.
```python
...
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
```
Finally, the main thread will close the file.
```python
...
# close the file
file.close()
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example showing that writing to a file is not thread-safe
from random import random
from threading import Thread

# task for worker threads
def task(number, file):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # write to the file
        file.write(f'Thread {number} got {value}.\n')

# define the shared file path
filepath = 'output.txt'
# open the file
file = open(filepath, 'a')
# configure many threads
threads = [Thread(target=task, args=(i, file)) for i in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
# close the file
file.close()
```
Running the example first opens the file for appending.
Next, 1,000 threads are created and started, then the main thread blocks until all tasks are complete.
Each thread loops and writes 1,000 lines to the file.
All tasks complete and the file is closed.
The expectation is that the file contains 1,000,000 lines, with 1,000 lines written by each thread.
We can check this by opening the file in a text editor and inspecting the contents.
```text
Thread 0 got 0.11577738406435278.
Thread 0 got 0.09757771957904438.
Thread 0 got 0.5200579840159526.
Thread 0 got 0.5369696807463148.
Thread 0 got 0.5260154027798022.
Thread 0 got 0.4863295793428698.
Thread 0 got 0.36290927042061494.
Thread 0 got 0.8170862775710401.
Thread 0 got 0.5448456180398039.
Thread 0 got 0.48267153945021646.
...
```
Type the following command on a POSIX operating system (e.g. Linux or macOS) to report the total number of lines in the file.
```shell
cat output.txt | wc -l
```
In this case, we can see that the file contains fewer than the expected number of lines.
Data was lost. This highlights that having multiple threads appending to the same file (using the same file handle) is not thread safe.
```text
993495
```
Delete the file and run the program again.
You will likely see a different number of lines each time the code is run, due to the unpredictable nature of the race condition.
```text
998346
```
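If you prefer a cross-platform alternative to wc, a short Python function can count the lines instead. The sketch below demonstrates it on a hypothetical file named 'sample.txt'; in practice you would point it at 'output.txt'.

```python
def count_lines(filepath):
    # count lines without loading the whole file into memory
    with open(filepath) as file:
        return sum(1 for _ in file)

# demo on a small three-line file
with open('sample.txt', 'w') as file:
    file.write('a\nb\nc\n')
print(count_lines('sample.txt'))  # prints 3
```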
Next, let’s look at how we can make writing to the file thread-safe.
Example of Thread-Safe Write to File
We can write to or append to a file in a thread-safe manner using a mutex lock.
The example in the previous section can be updated to be thread-safe by requiring that each thread first acquires a lock prior to writing to the file.
This can be achieved by passing the lock to the task function as an argument and then acquiring the lock prior to each call to write() on the shared file handle.
The updated version of the task() function with these changes is listed below.
```python
# task for worker threads
def task(number, file, lock):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # write to the file
        with lock:
            file.write(f'Thread {number} got {value}.\n')
```
In the main thread, we can then create the shared lock instance.
```python
...
# create the shared lock
lock = Lock()
```
The lock can then be passed to each thread as an argument to the task() function.
```python
...
# configure many threads
threads = [Thread(target=task, args=(i, file, lock)) for i in range(1000)]
```
Tying this together, the complete example of thread-safe writing to a file is listed below.
```python
# SuperFastPython.com
# example of thread-safe writing to a file
from random import random
from threading import Thread
from threading import Lock

# task for worker threads
def task(number, file, lock):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # write to the file
        with lock:
            file.write(f'Thread {number} got {value}.\n')

# create the shared lock
lock = Lock()
# define the shared file path
filepath = 'output.txt'
# open the file
file = open(filepath, 'a')
# configure many threads
threads = [Thread(target=task, args=(i, file, lock)) for i in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
# close the file
file.close()
```
Running the example first creates the shared lock and the shared file handle.
The new threads are created and configured to execute the task() function, receiving the shared file handle and lock as arguments.
The main thread starts the threads and then waits for the tasks to complete.
Each thread loops 1,000 times, each iteration generating a random number, acquiring the lock, and writing a line to the file. If the lock is already acquired, then the thread will wait until the lock is available.
This adds some delay to the program, increasing the run time.
Open the output.txt file in a text editor and inspect the contents.
```text
Thread 0 got 0.22420494281917025.
Thread 0 got 0.21289722108152498.
Thread 0 got 0.8053214755874063.
Thread 0 got 0.19071164153864895.
Thread 0 got 0.5812611973492516.
Thread 0 got 0.47701010872803906.
Thread 0 got 0.6587265088322402.
Thread 0 got 0.2080983771931354.
Thread 0 got 0.8302360563792428.
Thread 0 got 0.678505225691399.
...
```
Review the total number of entries in the file.
There will be 1,000,000 lines in the file. The race condition has been avoided as each thread now writes to the same file in a thread-safe manner.
Importantly, 1,000,000 lines are written to the file every time the program is run, making it consistent.
```text
1000000
```
Note, the random numbers written to file will differ each time the program is run.
Acquiring the lock is a bottleneck in the code, increasing the overall running time of the program.
An alternate approach would be for each thread to acquire the lock at the beginning of the task, then generate and write all random numbers.
This would mean the lock is acquired 1,000 times, once for each thread, rather than 1,000,000 times, once for each write to the file.
For example, an updated version of the task() function with this change is listed below.
```python
# task for worker threads
def task(number, file, lock):
    # acquire the lock once for all writes
    with lock:
        # task loop
        for i in range(1000):
            # generate a random number between 0 and 1
            value = random()
            # write to the file
            file.write(f'Thread {number} got {value}.\n')
```
Next, let’s look at how we might write safely to a file using a dedicated thread.
Example of Safe Dedicated File Writing Thread
Another approach to writing to a file in a thread-safe manner is to use a dedicated file writing thread.
If only a single thread is responsible for writing to the file, then all file writing will be thread-safe.
This can be achieved by creating a new thread that loops for the duration of the program reading lines from other threads and writing them to the file. Worker threads can communicate to the file writer thread using a thread-safe queue data structure. Workers will put messages on the queue and the file writer thread will get messages from the queue and write them to file.
First, we can write a function to be executed by the single file writer thread.
The function will take the path to the file and the queue instance as arguments.
First, the file will be opened using the context manager. Then the task will loop forever. Each iteration it will retrieve a line of text from the queue by calling get(), write it to file, flush the contents of the file buffer (writing text is buffered), then mark the task as done.
Flushing content after each write is computationally expensive, but it ensures that if the file is closed unexpectedly, such as when the Python process exits, no content will be left waiting in the buffer to be written.
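As an aside, file.flush() only pushes Python's internal buffer to the operating system; the OS may still hold the data in its own buffers. If you also want the OS to push the data to disk, os.fsync() can be called as well. The helper name write_durable and the filename 'durable.txt' below are hypothetical, for illustration only.

```python
import os

def write_durable(file, text):
    # write, then flush Python's buffer to the OS
    file.write(text)
    file.flush()
    # ask the OS to push its buffers to disk as well (even more expensive)
    os.fsync(file.fileno())

with open('durable.txt', 'w') as file:
    write_durable(file, 'hello\n')
```

For the examples in this tutorial, flush() alone is sufficient; fsync() matters only when you need durability against a machine crash or power loss.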
The file_writer() function that implements this is listed below.
```python
# dedicated file writing task
def file_writer(filepath, queue):
    # open the file
    with open(filepath, 'w') as file:
        # run forever
        while True:
            # get a line of text from the queue
            line = queue.get()
            # write it to the file
            file.write(line)
            # flush the buffer
            file.flush()
            # mark the unit of work complete
            queue.task_done()
```
Next, we can update the task() function executed by the worker threads.
The function must receive the queue as an argument, and then add lines to the queue each iteration by calling the put() function.
The updated task() function with these changes is listed below.
```python
# task for worker threads
def task(number, queue):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # put the result on the queue
        queue.put(f'Thread {number} got {value}.\n')
```
Next, in the main thread, we can create the shared queue.
This will be an instance of the queue.Queue class. This is a thread-safe implementation of the queue data structure that allows multiple threads to call get() and put() without race conditions.
Importantly, the get() function will block until new work arrives on the queue. This is helpful, as it means that our file writer thread will wait patiently and not consume resources while waiting for new lines of text to arrive.
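This blocking behavior of get() can be demonstrated with a minimal sketch: a consumer thread blocks on an empty queue until a producer puts an item. The sleep duration is illustrative, just to ensure the consumer is waiting before the item is added.

```python
import threading
import time
from queue import Queue

queue = Queue()
received = []

def consumer():
    # get() blocks here until an item is available
    item = queue.get()
    received.append(item)
    queue.task_done()

thread = threading.Thread(target=consumer)
thread.start()
time.sleep(0.2)     # the consumer is now blocked on the empty queue
queue.put('hello')  # this unblocks the consumer
thread.join()
print(received)
```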
```python
...
# create the shared queue
queue = Queue()
```
Next, we can create and start a new thread to execute our file_writer() function.
Importantly, this new thread will be marked as a daemon thread, which is a background task. This means that the Python process can exit while this thread is still running.
```python
...
# define the shared file path
filepath = 'output.txt'
# create and start the file writer thread
writer_thread = Thread(target=file_writer, args=(filepath, queue), daemon=True)
writer_thread.start()
```
We can then create and configure the worker threads, passing the instance of the shared queue.
```python
...
# configure worker threads
threads = [Thread(target=task, args=(i, queue)) for i in range(1000)]
```
Finally, after the worker threads have terminated, the main thread can wait on the queue for all submitted tasks to be processed.
This can be achieved by calling the join() function.
```python
...
# wait for all tasks in the queue to be processed
queue.join()
```
Each call to put() by worker threads will increment a counter within the queue and each subsequent call to task_done() by the file writer thread will decrement the counter, indicating that the unit of work was processed completely.
After the worker threads have terminated, there are no more lines to add to the queue. Therefore, once the queue’s internal counter reaches zero, we know that all work has been processed.
The join() function will block until all submitted work on the queue has been processed.
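The put()/task_done()/join() bookkeeping described above can be seen in isolation in a small sketch: three items are put on a queue, a worker calls task_done() for each, and join() returns only once the internal counter is back to zero.

```python
import threading
from queue import Queue

queue = Queue()
for i in range(3):
    queue.put(i)  # each put() increments the internal counter

done = []

def worker():
    # process all three items, marking each complete
    for _ in range(3):
        item = queue.get()
        done.append(item)
        queue.task_done()  # each call decrements the counter

thread = threading.Thread(target=worker)
thread.start()
queue.join()  # blocks until the counter reaches zero
thread.join()
print(done)  # prints [0, 1, 2]
```

If a consumer forgets to call task_done(), join() will block forever, which is a common bug with this pattern.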
Tying this together, the complete example of writing to a file in a thread-safe manner using a dedicated thread and a queue is listed below.
```python
# SuperFastPython.com
# example of thread-safe writing to a file with a dedicated writer thread
from random import random
from threading import Thread
from queue import Queue

# dedicated file writing task
def file_writer(filepath, queue):
    # open the file
    with open(filepath, 'w') as file:
        # run forever
        while True:
            # get a line of text from the queue
            line = queue.get()
            # write it to the file
            file.write(line)
            # flush the buffer
            file.flush()
            # mark the unit of work complete
            queue.task_done()

# task for worker threads
def task(number, queue):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # put the result on the queue
        queue.put(f'Thread {number} got {value}.\n')

# create the shared queue
queue = Queue()
# define the shared file path
filepath = 'output.txt'
# create and start the file writer thread
writer_thread = Thread(target=file_writer, args=(filepath, queue), daemon=True)
writer_thread.start()
# configure worker threads
threads = [Thread(target=task, args=(i, queue)) for i in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
# wait for all tasks in the queue to be processed
queue.join()
```
Running the example first creates the shared queue.
Next, the file writer thread is created and configured to execute our file_writer() function as a background daemon thread. It is started and runs forever, reading items from the shared queue and writing them to file.
The worker threads are started and passed an instance of the shared queue. Each thread iterates 1,000 times, generating numbers and writing lines to the shared queue.
The main thread waits for all worker threads to complete. It then blocks on the shared queue until all submitted lines have been written to file.
The main thread terminates, which then terminates the background daemon thread, forcefully closing the open file.
Open the output.txt file in a text editor and review the contents.
```text
Thread 0 got 0.14775478937834163.
Thread 0 got 0.09069478905986883.
Thread 0 got 0.9265023972899946.
Thread 0 got 0.8192999500888586.
Thread 0 got 0.042930675313449296.
Thread 0 got 0.9344168225488523.
Thread 0 got 0.6227562426387442.
Thread 0 got 0.1159773782005542.
Thread 0 got 0.8186760081390562.
Thread 0 got 0.2684847766800359.
...
```
Inspect the total number of lines in the file.
We can see that it has the expected 1,000,000 lines. Importantly, it will have the same number of lines each time the program is run as there is no race condition in writing to file.
```text
1000000
```
The way we handle closing the file is a little rough.
It is better to actually close the file as part of normal operation of the program.
This can be achieved by updating the file_writer() function. We can check for a special message, and if received, mark it as processed and return from the function, which will terminate the file writer thread. The value None is an object that we can use as the special message.
For example:
```python
...
# check if we are done
if line is None:
    # exit the loop
    break
```
After the loop is exited, the file is closed and we can mark the final message put on the queue as processed.
```python
...
# mark the exit signal as processed, after the file is closed
queue.task_done()
```
The updated file_writer() function with these changes is listed below.
```python
# dedicated file writing task
def file_writer(filepath, queue):
    # open the file
    with open(filepath, 'w') as file:
        # run until signaled to stop
        while True:
            # get a line of text from the queue
            line = queue.get()
            # check if we are done
            if line is None:
                # exit the loop
                break
            # write it to the file
            file.write(line)
            # flush the buffer
            file.flush()
            # mark the unit of work complete
            queue.task_done()
    # mark the exit signal as processed, after the file is closed
    queue.task_done()
```
In the main thread, once all worker threads are finished, we can then signal that no more messages are expected by calling put() on the queue and passing the value None.
```python
...
# signal the file writer thread that we are done
queue.put(None)
```
This will signal to the file writer thread that no further messages are expected and to terminate.
The main thread can then join the queue and wait for the file writer thread to process all messages, including the signal to terminate.
```python
...
# wait for all tasks in the queue to be processed
queue.join()
```
Tying this together, the complete example with these improvements is listed below.
```python
# SuperFastPython.com
# example of thread-safe writing to a file with a dedicated writer thread
from random import random
from threading import Thread
from queue import Queue

# dedicated file writing task
def file_writer(filepath, queue):
    # open the file
    with open(filepath, 'w') as file:
        # run until signaled to stop
        while True:
            # get a line of text from the queue
            line = queue.get()
            # check if we are done
            if line is None:
                # exit the loop
                break
            # write it to the file
            file.write(line)
            # flush the buffer
            file.flush()
            # mark the unit of work complete
            queue.task_done()
    # mark the exit signal as processed, after the file is closed
    queue.task_done()

# task for worker threads
def task(number, queue):
    # task loop
    for i in range(1000):
        # generate a random number between 0 and 1
        value = random()
        # put the result on the queue
        queue.put(f'Thread {number} got {value}.\n')

# create the shared queue
queue = Queue()
# define the shared file path
filepath = 'output.txt'
# create and start the file writer thread
writer_thread = Thread(target=file_writer, args=(filepath, queue), daemon=True)
writer_thread.start()
# configure worker threads
threads = [Thread(target=task, args=(i, queue)) for i in range(1000)]
# start threads
for thread in threads:
    thread.start()
# wait for threads to finish
for thread in threads:
    thread.join()
# signal the file writer thread that we are done
queue.put(None)
# wait for all tasks in the queue to be processed
queue.join()
```
Running the example produces the same thread-safe writing to the file.
Importantly, we are confident that the file is closed correctly as part of the normal execution of the program.
Further Reading
This section provides additional resources that you may find helpful.
Python Threading Books
- Python Threading Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- Threading Module API Cheat Sheet
I also recommend specific chapters in the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter: 14: Threads and Processes
Guides
- Python Threading: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python ThreadPool: The Complete Guide
Takeaways
You now know how to write to a file from multiple threads in a thread-safe manner.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Thanuj Mathew on Unsplash