Faster File I/O in Python
…once you add concurrency
No fancy third-party libraries.
Just faster file I/O on one machine. Get results like:
- Write files 3x faster (with processes)
- Read files 3x faster (with processes + threads)
- Unzip files 4x faster (with processes + threads)
Your file I/O tasks could be so much faster if you used modern concurrency.
Heard enough? Jump to the book.
Otherwise read on…
Don’t put up with slow File I/O!
File I/O stands for File Input/Output, referring to the process of reading data from and writing data to files on a storage device like a hard drive.
Sequential file I/O tasks are painfully slow!
Is this familiar:
- Loading files one by one
- Writing files one by one
- Unzipping files one by one
- and so on…
It does not have to be this way.
File I/O can be sped up dramatically using concurrency!
Convert slow file I/O to be…
Blazing Fast!
Studying how to bring concurrency to file I/O is critical for Python developers.
File I/O operations are inherently slower compared to working with data in RAM, often becoming a significant bottleneck in many programs.
By understanding concurrency and incorporating it into your file I/O tasks, you can unlock the full potential of your modern computer hardware, making your programs more efficient and capable of handling large workloads.
You have many options, such as threads, processes, and coroutines.
For example:
- Load files concurrently with threads
- Write files concurrently with processes
- Unzip files concurrently with coroutines
But which concurrency technique should you choose?
You MUST Benchmark Execution Time
The problem is, there are no silver bullets.
Each program and each task is different and unique.
We cannot know which approach to Python concurrency will give good or even the best file I/O performance for a given task.
Therefore, in addition to learning how to perform file I/O operations concurrently, you must learn how to test a suite of different approaches to implementing file I/O operations concurrently.
The process might look like this:
- Step 1: Learn File I/O (if needed)
- Step 2: Learn Concurrency
- Step 3: Learn Benchmarking
- Step 4: Use Benchmarking to figure out which Concurrency method to use.
This can take a lot of time, especially if:
- …you’re not 100% on top of Python file I/O APIs
- …you’re not 100% on top of Python concurrency APIs
- …you’re not familiar with best practices for benchmarking
You could waste a lot of time going in the wrong direction.
Or worse, assume “threads are best for file I/O” and be stuck with sub-optimal performance (yes, processes or processes + threads often give better results).
Skip the Pain and Frustration
Don’t waste weeks and months of your time:
“trying things out!”
Don’t worry, I figured it out for you.
Step 1: File I/O APIs
Firstly, you need to review the Python File I/O APIs.
- Get familiar with the open() built-in function
- Get familiar with the os module and its functions for making directories and renaming files.
- Get familiar with the os.path module for working with file paths.
- Get familiar with the shutil module and its efficient functions for copying and moving files.
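These APIs can be sketched in a few lines. The file and directory names here are made up for the example:

```python
import os
import shutil

# open(): write then read a small (hypothetical) text file
with open('data.txt', 'w') as handle:
    handle.write('hello world')
with open('data.txt') as handle:
    content = handle.read()

# os: make a directory and rename a file
os.makedirs('output', exist_ok=True)
os.rename('data.txt', 'renamed.txt')

# os.path: build and inspect file paths
destination = os.path.join('output', 'copy.txt')
print(os.path.basename(destination))  # → copy.txt

# shutil: copy a file, then move the copy
shutil.copy('renamed.txt', destination)
shutil.move(destination, 'moved.txt')
```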
Step 2: Concurrency APIs
Secondly, you need to review the Python Concurrency APIs.
- Get familiar with the threading module and how the GIL is released when doing blocking I/O, like working with files.
- Get familiar with the multiprocessing module and how it offers full parallelism at the cost of computational overhead when sharing data.
- Get familiar with the concurrent.futures module that provides thread pools and process pools.
- Get familiar with coroutines and the asyncio module for asynchronous programming.
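As a minimal sketch of the thread-based approach above (file names are made up for the example), a ThreadPoolExecutor can overlap many blocking file reads, because each read releases the GIL:

```python
from concurrent.futures import ThreadPoolExecutor

def load_file(filepath):
    # blocking read: the GIL is released, so worker threads overlap I/O waits
    with open(filepath) as handle:
        return handle.read()

# create some example files to load (hypothetical names)
filepaths = [f'file_{i}.txt' for i in range(5)]
for filepath in filepaths:
    with open(filepath, 'w') as handle:
        handle.write(f'contents of {filepath}')

# read all files concurrently with a pool of worker threads
with ThreadPoolExecutor(max_workers=4) as executor:
    contents = list(executor.map(load_file, filepaths))

print(len(contents))  # → 5
```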
Also, an important step:
Did you know that asyncio does not offer non-blocking file I/O?
Therefore:
- Get familiar with the aiofiles library that offers simulated async file I/O in asyncio programs using threads under the covers.
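aiofiles is a third-party library (installed via pip); under the covers it runs blocking file operations in threads. The same idea can be sketched with the standard library alone using asyncio.to_thread, with hypothetical file names:

```python
import asyncio

def load_file(filepath):
    # blocking read, pushed onto a worker thread
    with open(filepath) as handle:
        return handle.read()

async def main():
    # create some example files (hypothetical names)
    filepaths = [f'doc_{i}.txt' for i in range(3)]
    for filepath in filepaths:
        with open(filepath, 'w') as handle:
            handle.write(filepath)
    # await all blocking reads concurrently, each in its own thread
    return await asyncio.gather(
        *[asyncio.to_thread(load_file, p) for p in filepaths])

results = asyncio.run(main())
print(results)  # → ['doc_0.txt', 'doc_1.txt', 'doc_2.txt']
```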
Step 3: Benchmarking Python
Next, focus on basic Python benchmarking practices.
- Get familiar with how to estimate the execution time of a task or program, such as via the time.time() or time.perf_counter() functions.
- Get familiar with the need to repeat benchmarks and record a minimum or average estimated run time.
- Get familiar with how to report performance for choosing between methods, such as difference and speedup factors.
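These practices can be sketched as follows; the task being timed is a hypothetical small file write-and-read:

```python
import time

def task():
    # hypothetical file I/O task: write then re-read a small file
    with open('bench.txt', 'w') as handle:
        handle.write('x' * 1024)
    with open('bench.txt') as handle:
        handle.read()

# repeat the benchmark and keep the minimum as the best estimate
durations = []
for _ in range(5):
    start = time.perf_counter()
    task()
    durations.append(time.perf_counter() - start)

best = min(durations)
print(f'best of 5 runs: {best:.6f} seconds')

# when comparing two methods, report a speedup factor, e.g.
# speedup = sequential_duration / concurrent_duration
```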
Step 4: Benchmark Concurrency for Your File I/O Task
Finally, you are ready to start testing different concurrency methods for your File I/O task.
Explore different ways to structure your task using concurrency patterns:
- Consider using a single thread, process, or coroutine for one-off background tasks
- Consider comparing the ThreadPoolExecutor and ProcessPoolExecutor for many similar File I/O tasks (e.g. loading many files)
- Consider executing multiple File I/O operations in batches with process pools and thread pools
- Consider nesting a thread pool within each process pool worker when working with thousands of files
- Explore similar patterns with coroutines.
You now know the systematic approach to getting blazingly fast concurrent File I/O in your programs.
Step 5: Repeat
Next, explore this approach with all the file I/O tasks you might need to speed up:
- file writing
- file loading
- file deleting
- file renaming
- file moving
- file copying
- file appending
- file zipping
- file unzipping
- …
Don’t worry, I’ve done all the heavy lifting for you!
Work alongside me with my new book.
Introducing:
“Concurrent File I/O in Python”
Faster File I/O With Threads, Processes, and AsyncIO
“Concurrent File I/O in Python” is my new book that will teach you how to make your file I/O tasks faster using concurrency, from scratch.
This book distills only what you need to know to get started and be effective with concurrent file I/O, super fast.
It’s exactly how I would teach you concurrent file I/O if we were sitting together, pair programming.
Technical Details:
- 15 tutorials taught by example with full working code listings.
- 113 .py code files included that you can use as templates and extend.
- 307 pages designed for on-screen reading, open next to your IDE or editor.
- 2 formats (PDF and EPUB) for screen, tablet, and Kindle reading.
- 1.3 megabyte .zip download that contains the ebook and code.
Everything you need to get started, then get really good at concurrent file I/O, all in one book.
“Concurrent File I/O in Python” will lead you on the path from a Python developer interested in faster file I/O to a developer who can confidently develop concurrent file I/O tasks and programs.
- No fluff.
- Just concepts and code.
- With full working examples.
The book is divided into 15 tutorials.
The idea is you can read and work through one or two tutorials per day, and become capable in less than two weeks.
It is the shortest and most effective path that I know of for transforming you into a concurrent file I/O Python developer.
Choose Your Package:
BOOK
You get the book:
- Concurrent File I/O
PDF and EPUB formats
Includes all source code files
BOOKSHELF
BEST VALUE
You get everything (15+ books):
- Threading Jump-Start
- Multiprocessing Jump-Start
- Asyncio Jump-Start
- ThreadPool Jump-Start
- Pool Jump-Start
- ThreadPoolExecutor Jump-Start
- ProcessPoolExecutor Jump-Start
- Threading Interview Questions
- Multiprocessing Interview Questions
- Asyncio Interview Questions
- Executors Interview Questions
- Concurrent File I/O
- Concurrent NumPy
- Python Benchmarking
- Python Asyncio Mastery
Bonus, you also get:
- Concurrent For Loops Guide
- API Mind Maps (4)
- API Cheat Sheets (7)
That is $210 of Value!
(you get a 10.95% discount)
All prices are in USD.
(also get the book from the Amazon, Gumroad, and Google Play stores)
See What Customers Are Saying:
Ashish Kumar
Loved the book.
Gustavo Zanette Martins
I found it to be highly educational and easy to comprehend.
I made extensive use of the content, and it greatly benefited my understanding.
Thank you for creating such a valuable resource.
You Get 15 Laser-Focused Tutorials
This book is designed to bring you up-to-speed with how to use concurrent file I/O as fast as possible.
As such, it is not exhaustive. There are many topics that are interesting or helpful but are not on the critical path to getting you productive fast.
This book is divided into a course of 15 tutorials in two parts:
Background:
- Tutorial 01: Importance of Concurrency for File I/O.
- Tutorial 02: Tour of Python File I/O.
- Tutorial 03: Tour of Python Concurrency.
- Tutorial 04: Tour of AIOFiles for AsyncIO.
- Tutorial 05: File I/O Concurrency Patterns.
Case Studies:
- Tutorial 06: How to Run File I/O in the Background.
- Tutorial 07: How to Write Files Concurrently.
- Tutorial 08: How to Read Files Concurrently.
- Tutorial 09: How to Delete Files Concurrently.
- Tutorial 10: How to Copy Files Concurrently.
- Tutorial 11: How to Move Files Concurrently.
- Tutorial 12: How to Rename Files Concurrently.
- Tutorial 13: How to Append Files Concurrently.
- Tutorial 14: How to Zip Files Concurrently.
- Tutorial 15: How to Unzip Files Concurrently.
Next, let's look at the structure of each lesson.
Highly structured lessons on how you can get results
The body of this book consists of 10 case study tutorials.
These case study tutorials all follow the same structure:
- Focused: Each tutorial is focused on one file I/O operation (e.g. read, write, etc.)
- Task: Each tutorial defines a canonical example of the operation to explore with concurrency.
- Baseline: Each tutorial provides a sequential version of the task that is used as a performance baseline that all concurrent examples must outperform.
- Threading: Each tutorial explores how to extend the baseline to be concurrent using threads.
- Multiprocessing: Each tutorial explores how to extend the baseline to be concurrent using processes.
- Asyncio: Each tutorial explores how to extend the baseline to be concurrent using coroutines.
- Results: Each tutorial provides a summary of the results and suggestions of the concurrency methods that may offer the most promise on the task investigated and similar tasks.
This tutorial structure teaches you not only how to systematically explore the best way to bring concurrency to a file I/O task or program, but also the specific concurrency techniques that are likely to offer the best performance on similar tasks in the future.
Each tutorial has a specific learning outcome and is designed to be completed in about one hour.
Each tutorial is also designed to be self-contained so that you can read the tutorials out of order if you choose, such as dipping into topics in the future to solve specific programming problems.
The tutorials were written with some intentional repetition of key concepts and code. This is to ensure tutorials remain self-contained. It also provides gentle reminders to help embed the common usage patterns in your mind so that they become second nature.
We Python developers learn best from real and working code examples.
Next, let's look at what you will know after finishing the book.
Your Learning Outcomes
Transform from "Python developer" into "Python developer that can confidently bring concurrent file I/O to your projects with coroutines, threads, and processes"
After working through all of the tutorials in this book, you will know background topics, such as:
- The importance of concurrency for high-performance file I/O.
- How to use common file I/O APIs in Python.
- How to use Python concurrency APIs including threading, multiprocessing, and asyncio.
- How to perform file I/O with coroutines in asyncio using aiofiles.
- How to use programming patterns for concurrent file I/O.
You will also know how to investigate concurrent file I/O case studies, such as:
- How to perform file I/O tasks in the background.
- How to concurrently write files to disk.
- How to concurrently read files from disk.
- How to concurrently delete files from disk.
- How to concurrently copy files on disk.
- How to concurrently move files on disk.
- How to concurrently rename files on disk.
- How to concurrently append files on disk.
- How to concurrently zip files on disk.
- How to concurrently unzip files on disk.
You will learn from code examples, not pages and pages of fluff.
Get your copy now:
100% Money-Back Guarantee
(no questions asked)
I want you to actually learn concurrent file I/O so well that you can confidently use it on current and future projects.
I designed my book to read just like I'm sitting beside you, showing you how.
I want you to be happy. I want you to win!
I stand behind all of my materials. I know they get results and I'm proud of them.
Nevertheless, if you decide that my books are not a good fit for you, I'll understand.
I offer a 100% money-back guarantee, no questions asked.
To get a refund, contact me with your purchase name and email address.
Frequently Asked Questions
This section covers some frequently asked questions.
If you have any questions, contact me directly, any time, about anything. I will do my best to help.
What are your prerequisites?
This book is designed for Python developers who want to discover how to develop programs with faster file I/O using coroutine-, thread-, and process-based concurrency.
Specifically, this book is for:
- Developers who can write simple Python programs.
- Developers who need better performance from current or future Python programs.
- Developers who are working with file I/O-based tasks.
This book does not require that you are an expert in the Python programming language or concurrency.
Specifically:
- You do not need to be an expert Python developer.
- You do not need to be an expert in concurrency.
What version of Python do you need?
All code examples use Python 3.
Python 3.10+ to be exact.
Python 2.7 is not supported because it reached its "end of life" in 2020.
Are there code examples?
Yes.
There are 113 .py code files.
Most lessons have many complete, standalone, and fully-working code examples.
The book is provided in a .zip file that includes a src/ directory containing all source code files used in the book.
How long will the book take you to finish?
Work at your own pace.
I recommend about one tutorial per day, over 14 days (2 weeks).
There's no rush and I recommend that you take your time.
The book is designed to be read linearly from start to finish, guiding you from being a Python developer at the start of the book to being a Python developer that can confidently use concurrent file I/O in your projects by the end of the book.
In order to avoid overload, I recommend completing one lesson per day, such as in the evening or during your lunch break. This will allow you to complete the transformation in about two weeks.
I recommend you maintain a directory with all of the code you type from the lessons in the book. This will allow you to use the directory as your own private code library, allowing you to copy-paste code into your projects in the future.
I recommend trying to adapt and extend the examples in the lessons. Play with them. Break them. This will help you learn more about how the API works and why we follow specific usage patterns.
What format is the book?
You can read the book on your screen, next to your editor.
You can also read the book on your tablet, away from your workstation.
The ebook is provided in 2 formats:
- PDF (.pdf): perfect for reading on the screen or tablet.
- EPUB (.epub): perfect for reading on a tablet with a Kindle or iBooks app.
Many developers like to read the ebook on a tablet or iPad.
How can you get more help?
The lessons in this book were designed to be easy to read and follow.
Nevertheless, sometimes we need a little extra help.
A list of further reading resources is provided at the end of each lesson. These can be helpful if you are interested in learning more about the topic covered, such as fine-grained details of the standard library and API functions used.
The conclusions at the end of the book provide a complete list of websites and books that can help if you want to learn more about Python concurrency and the relevant parts of the Python standard library. It also lists places where you can go online and ask questions about Python concurrency.
Finally, if you ever have questions about the lessons or code in this book, you can contact me any time and I will do my best to help. My contact details are provided at the end of the book.
How many pages is the book?
The PDF is 307 pages (US letter-sized pages).
Can you print the book?
Yes.
Although, I think it's better to work through it on the screen.
- You can search, skip, and jump around really fast.
- You can copy and paste code examples.
- You can compare code output directly.
Is there digital rights management?
No.
The ebooks have no DRM.
Do you get FREE updates?
Yes.
I update all of my books often.
You can email me any time and I will send you the latest version for free.
Can you buy the book elsewhere?
Yes!
You can get a Kindle or paperback version from Amazon.
Many developers prefer to buy from the Kindle store on Amazon directly.
Can you get a paperback version?
Yes!
You can get a paperback version from Amazon.
Can you read a sample?
Yes.
You can read a book sample via the Google Books "preview" feature or via the Amazon "look inside" feature:
Generally, if you like my writing style on SuperFastPython, then you will like the books.
Can you download the source code now?
The source code (.py) files are included in the .zip with the book.
Nevertheless, you can also download all of the code from the dedicated GitHub Project:
Does concurrent file I/O work on your operating system?
Yes.
Python concurrency is built into the Python programming language and works equally well on:
- Windows
- macOS
- Linux
Does concurrent file I/O work on your hardware?
Yes.
Python concurrency is agnostic to the underlying CPU hardware.
If you are running Python on a modern computer, then you will have support for concurrency, e.g. Intel, AMD, ARM, and Apple Silicon CPUs are supported.
About the Author
Hi, I'm Jason Brownlee, Ph.D.
I'm a Python developer, husband, and father to two boys.
I want to share something with you.
I am obsessed with Python concurrency, but I wasn't always this way.
My background is in Artificial Intelligence and I have a few fancy degrees and past job titles to prove it.
You can see my LinkedIn profile here:
- Jason Brownlee LinkedIn Profile
(follow me if you like)
So what?
Well, AI and machine learning have been hot for the last decade. I have spent that time as a Python machine learning developer:
- Working on a range of predictive modeling projects.
- Writing more than 1,000+ tutorials.
- Authoring over 20+ books.
There's one thing about machine learning in Python, your code must be fast.
Really fast.
Modeling code is already generally fast, built on top of C and Fortran code libraries.
But you know how it is on real projects…
You always have to glue bits together, wrap the fast code and run it many times, and so on.
Making code run fast requires Python concurrency and I have spent most of the last decade using all the different types of Python concurrency available.
Including threading, multiprocessing, asyncio, and the suite of popular libraries.
I know my way around Python concurrency and I am deeply frustrated at the bad rap it has.
This is why I started SuperFastPython.com where you can find hundreds of free tutorials on Python concurrency.
And this is why I wrote this book.
Praise for Super Fast Python
Python developers write to me all the time and let me know how helpful my tutorials and books have been.
Below are some select examples posted to LinkedIn.
What Are You Waiting For?
Stop reading outdated StackOverflow answers.
Learn Python concurrency correctly, step-by-step.
Start today.
Buy now and get your copy in seconds!