You can profile slow Python programs to discover which functions occupy most of the runtime and which functions are called disproportionately more often than any others.
The profilers built into the Python standard library are helpful in quickly discovering why some Python programs are slow and where to focus program optimization efforts.
In this tutorial, you will discover how to profile Python programs in order to find out why they are slow.
Let’s get started.
What is Code Profiling?
Code profiling is a systematic and analytical technique used in software development to evaluate the runtime behavior of a program.
Its primary objective is to identify performance bottlenecks, memory inefficiencies, and other areas for improvement within the code.
Profiling provides valuable insights into how a program executes, allowing developers to optimize it for better speed, efficiency, and resource utilization.
Profiling tools gather data on various aspects of program execution, including:
- CPU Usage: Profilers monitor how much time the CPU spends on each function or method within the code. This information helps pinpoint functions that consume excessive CPU resources.
- Memory Usage: They track memory allocations, deallocations, and usage patterns to identify memory leaks and inefficient memory management.
- Function Call Times: Profilers record the time taken by each function or method, enabling developers to focus on optimizing the most time-consuming parts of the code.
- I/O Operations: Profilers can track input and output operations, helping developers optimize file, network, or database interactions.
- Call Graphs: Profiling tools generate call graphs that illustrate the flow of function calls, making it clear which functions are called by others and their associated execution times.
By analyzing the data provided by profilers, developers can make informed decisions about where to focus optimization efforts.
This might involve rewriting inefficient code, eliminating unnecessary function calls, reducing memory usage, or improving algorithmic efficiency.
Ultimately, profiling is an essential step in the software development process, ensuring that applications perform efficiently, use resources judiciously, and deliver a better user experience.
Need to Profile Slow Python Programs
Sometimes our Python programs are slow.
We can use profiling to help identify which parts of a Python program are slow so that we can change or replace them with something faster.
In software engineering, profiling (“program profiling”, “software profiling”) is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
— Profiling (computer programming), Wikipedia
Profiling Python code is a valuable practice that offers several benefits for developers.
Below are 5 of the best reasons we should profile our Python code:
- Performance Optimization: Profiling helps identify performance bottlenecks and resource-intensive parts of your code. By pinpointing these areas, you can focus your optimization efforts to make your code faster and more efficient.
- Resource Management: Profiling provides insights into how your code utilizes system resources such as CPU, memory, and I/O. This information is crucial for efficient resource management, preventing memory leaks, and ensuring optimal resource allocation.
- Prioritizing Improvements: Profiling data helps you prioritize which parts of your codebase to optimize. Instead of making blind optimizations, you can concentrate your efforts where they will have the most significant impact.
- Reducing Overhead: Profiling reveals unnecessary function calls, redundant operations, or inefficient algorithms. By addressing these issues, you can reduce computational overhead, leading to faster and more responsive applications.
- Enhancing User Experience: In web applications or other software with user interfaces, profiling can lead to a better user experience. By optimizing code, you can reduce loading times, prevent lag, and provide a smoother and more responsive interaction for users.
Profiling is an essential tool in the software development toolkit, helping developers build more efficient and high-performing applications while optimizing resource usage.
Next, let’s consider profiling as it relates to benchmarking.
Profiling is Not Benchmarking
It is common to confuse profiling with benchmarking.
In fact, profiling code is a very different activity from benchmarking Python code.
They are two distinct techniques used in software development to assess and optimize performance, each with its unique goals and methodologies.
Benchmarking is primarily concerned with measuring the time it takes for a specific piece of code or an entire application to complete a given task. Its primary objective is to quantify the speed and efficiency of code execution.
Benchmarking provides answers to questions like:
- How long does this function take to run?
- How many operations can this code perform in a second?
- What is the response time of this application under a specific workload?
Importantly, benchmarking is commonly used to compare different implementations, versions, or configurations of code, assisting in the selection of the most efficient solution for a given task.
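For example, a quick benchmark using the standard timeit module might look like the following sketch, where task() is a hypothetical function standing in for the code under test:

```python
import timeit

# hypothetical function standing in for the code under test
def task():
    return sum(i * i for i in range(1000))

# time 1,000 executions of the function and report the total duration
duration = timeit.timeit(task, number=1000)
print(f'1,000 calls took {duration:.4f} seconds')
```

This answers the benchmarking question of how long the code takes, but says nothing about where inside task() the time is being spent.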
Profiling, in contrast, is focused on understanding how code behaves internally. It seeks to identify which parts of the code consume the most resources, such as CPU time, memory, or I/O operations.
Profiling dives into the nitty-gritty details of code execution, answering questions like:
- Which part of this program is the slowest?
- Which functions or methods are consuming the most CPU time?
- Where is memory being allocated and deallocated?
Profiling tools provide data at a fine-grained level, revealing specifics like function call counts, time spent in specific functions, and memory usage.
Profiling is an invaluable diagnostic tool used to uncover bottlenecks and pinpoint areas in the code that require optimization.
We can use profiling after benchmarking to identify the areas of code that, if changed, are expected to reduce the overall execution time of the program.
Now that we know about profiling, let’s look at how we can use the built-in Python profiler to do so.
How to Use Python Profiler
Python provides an efficient C-based profiler in the cProfile module, built into the Python standard library.
A pure-Python profiler is provided in the profile module that offers the same API as the cProfile module. This can be used on those platforms that do not support the cProfile module.
cProfile and profile provide deterministic profiling of Python programs. A profile is a set of statistics that describes how often and for how long various parts of the program executed.
— The Python Profilers
We can use the Python profiler via the command line interface.
You may recall that we can run a module as a program via the python -m flag.
Therefore, we can apply the Python profiler to a program file named program.py as follows:
```shell
python -m cProfile program.py
```
This will run the program program.py normally. Then once the program is done, profile information will be reported on standard output.
The cProfile program takes a number of command line flags.
The -o flag allows us to specify the file in which to store the output of the profiler.
Storing profile results to file is a good practice.
For example:
```shell
python -m cProfile -o profile.out program.py
```
Note that the data is written in a binary format suitable for the pstats module to read and use.
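As a sketch of how this saved file can be consumed, the following generates profile data in-process (task() is a hypothetical stand-in for a real program), dumps it to profile.out, and loads it back with pstats:

```python
import cProfile
import pstats

# hypothetical function standing in for a slow program
def task():
    return sum(i * i for i in range(100000))

# profile a call to the function and save the statistics to file,
# equivalent to running: python -m cProfile -o profile.out program.py
profiler = cProfile.Profile()
profiler.runcall(task)
profiler.dump_stats('profile.out')

# load the binary statistics file and report the top entries,
# sorted by cumulative time
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(5)
```

Saving to file this way means the (possibly slow) program only has to be profiled once, while the statistics can be re-examined many times.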
We can provide the -s flag to sort the profile results by one of the table headings.
By default, the profiler results are sorted by cumulative time spent on the function and sub-functions, e.g. cumtime.
cumtime: is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
— The Python Profilers
A common and helpful way to sort the results is by the number of calls made to each function, e.g. ncalls.
ncalls: for the number of calls.
— The Python Profilers
For example:
```shell
python -m cProfile -s ncalls program.py
```
This can highlight functions that are called many times and in turn, take a long overall time.
We might also sort the results by the total time spent on each function call, e.g. tottime.
tottime: for the total time spent in the given function (and excluding time made in calls to sub-functions)
— The Python Profilers
For example:
```shell
python -m cProfile -s tottime program.py
```
This can highlight individual function calls that take a long time.
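The profiler can also be invoked from within a program, rather than from the command line. The following sketch uses cProfile.Profile as a context manager (supported in Python 3.8 and later) around a hypothetical task() function:

```python
import cProfile
import pstats

# hypothetical function standing in for a slow section of code
def task():
    return sum(i * i for i in range(100000))

# profile just this section of the program (Python 3.8+),
# rather than profiling the whole program from the command line
with cProfile.Profile() as profiler:
    task()

# report statistics sorted by total time, like the -s tottime flag
stats = pstats.Stats(profiler)
stats.sort_stats('tottime').print_stats(5)
```

This is useful when only one part of a larger program needs to be investigated, since it avoids profiling unrelated startup and teardown code.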
Now that we know how to use the Python profiler, let’s explore how we might use it to identify slow parts of Python programs.
Example of Profiling a Program With a Slow Function Call
We can explore using the Python profiler on a program that is slow because it has a long-running function call.
In this example, we will define a function that performs a long blocking call. The function will then be called from the main function.
We will call the time.sleep() function which blocks the entire thread, preventing the program from progressing.
```python
# long running task
def work():
    # block the thread for a long time
    time.sleep(5)
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of a program that has a slow function call
import time

# long running task
def work():
    # block the thread for a long time
    time.sleep(5)

# main program
def main():
    # report a message
    print('Main is running')
    # run our task
    work()
    # report a message
    print('Main is done.')

# protect the entry point
if __name__ == '__main__':
    # start the program
    main()
```
Running the program calls the main() function.
The main() function runs and reports a start message and then calls the work() function.
The work() function runs and blocks the main thread with a call to time.sleep() for 5 seconds.
Once the five seconds have elapsed, the work() function resumes and terminates.
The main() function resumes, reports a final message, and terminates.
```
Main is running
Main is done.
```
Next, we can profile the program, and assume we don’t know why the program is slow.
We will use the cProfile module and report with default sorting.
```shell
python -m cProfile program.py
```
If you are unable to run the cProfile module on your system, use the profile module instead. The results will be similar, although the pure-Python profiler adds more overhead.
The cProfile output reports some details about the program.
For example, there were 8 function calls and the program ran for a little over 5 seconds.
The default output from cProfile does not seem helpful in this case; perhaps sorting by cumulative function execution time (cumtime) is not the best approach.
```
Main is running
Main is done.
         8 function calls in 5.005 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.005    5.005 {built-in method builtins.exec}
        1    0.000    0.000    5.005    5.005 program.py:1(<module>)
        1    0.000    0.000    5.005    5.005 program.py:11(main)
        1    0.000    0.000    5.005    5.005 program.py:6(work)
        1    5.005    5.005    5.005    5.005 {built-in method time.sleep}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
One approach we can try is to filter all results by the name of our program, e.g. “program.py“.
This may give insight into the cumulative execution time of each one of our functions.
A simple way to achieve this is to run cProfile and filter the results for the program name via grep on the command line.
For example:
```shell
python -m cProfile program.py | grep program.py
```
Note that this assumes you have access to the grep program on your system. It may not be available on Windows by default.
The results show only the output lines from cProfile that mention our program name.
We can see a hierarchy of cumulative time from the top level (or whole program), to the main() function on line 11 to our work() function on line 6.
This highlights that the problem is probably somewhere in the work() function.
```
        1    0.000    0.000    5.001    5.001 program.py:1(<module>)
        1    0.000    0.000    5.001    5.001 program.py:11(main)
        1    0.000    0.000    5.000    5.000 program.py:6(work)
```
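If grep is not available, the pstats module offers a portable alternative: print_stats() accepts a regular expression that restricts the report to matching lines. The following sketch uses hypothetical work() and main() functions in place of those in program.py:

```python
import cProfile
import io
import pstats

# hypothetical functions standing in for those in program.py
def work():
    return sum(i * i for i in range(100000))

def main():
    work()

# collect profile statistics for a single call to main()
profiler = cProfile.Profile()
profiler.runcall(main)

# print only the report lines that match the regular expression,
# a portable alternative to piping the output through grep
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats('work|main')
print(stream.getvalue())
```

For a real program profiled from the command line, the pattern would typically be the program's filename, e.g. print_stats('program.py').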
Next, let’s try sorting the output of cProfile by the total execution time of each function, e.g. tottime.
```shell
python -m cProfile -s tottime program.py
```
This helps.
We can see that the first function in the list that has the longest overall execution time is the call to time.sleep().
This highlights how we can use the output from cProfile to zero in on long-running functions in our Python programs.
```
Main is running
Main is done.
         8 function calls in 5.005 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.005    5.005    5.005    5.005 {built-in method time.sleep}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    5.005    5.005 {built-in method builtins.exec}
        1    0.000    0.000    5.005    5.005 program.py:11(main)
        1    0.000    0.000    5.005    5.005 program.py:6(work)
        1    0.000    0.000    5.005    5.005 program.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
Next, let’s look at another profiling example, this time with functions that are called many times.
Example of Profiling a Program With a Repeated Function Call
We can explore using the Python profiler on a program that is slow because it repeatedly makes the same function call.
In this case, we can define a function that performs the same mathematical operation many millions of times, e.g. nearly 100 million calls to math.sqrt().
Collectively, this is a CPU-bound task and will be slow, but because it is performed in a list comprehension, it does not look like it.
```python
# long running task
def work():
    # block the main thread for a long time
    data = [math.sqrt(i) for i in range(1, 100000000)]
```
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# example of a program that calls a function repeatedly
import math

# long running task
def work():
    # block the main thread for a long time
    data = [math.sqrt(i) for i in range(1, 100000000)]

# main program
def main():
    # report a message
    print('Main is running')
    # run our task
    work()
    # report a message
    print('Main is done.')

# protect the entry point
if __name__ == '__main__':
    # start the program
    main()
```
Running the program first calls the main() function.
The main() function runs, reports a start message, and then calls the work() function.
The work() function runs and executes the list comprehension, creating a list of nearly 100 million square roots of integers.
This takes a while and prevents the main thread from progressing.
Once the list comprehension is complete, the work() function terminates.
The main() function resumes, reports a final message, and terminates.
```
Main is running
Main is done.
```
Next, let's profile our program and pretend we don't know why it is slow.
We can start off by running cProfile with default sorting.
```shell
python -m cProfile program.py
```
A truncated sample of the output is provided below.
This does not appear helpful in this case.
It reports the internal infrastructure and the first line of our program as having the highest overall cumulative execution time.
```
Main is running
Main is done.
         100000230 function calls in 30.953 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   30.953   30.953 {built-in method builtins.exec}
        1    0.000    0.000   30.953   30.953 program.py:1(<module>)
        1    1.112    1.112   30.952   30.952 program.py:11(main)
        1    0.000    0.000   29.840   29.840 program.py:6(work)
        1   20.268   20.268   29.840   29.840 program.py:8(<listcomp>)
 99999999    9.572    0.000    9.572    0.000 {built-in method math.sqrt}
        1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:1165(_find_and_load)
        1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:1120(_find_and_load_unlocked)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:666(_load_unlocked)
...
```
Next, let’s try sorting results by the total execution time of each function.
```shell
python -m cProfile -s tottime program.py
```
Here, we can see that line 8 of our program has the highest overall execution time at 20.610 seconds.
This is the line that contains the list comprehension, pointing us in the right direction.
The function with the next largest total time is math.sqrt at 9.611 seconds. Excellent.
```
Main is running
Main is done.
         100000230 function calls in 31.315 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   20.610   20.610   30.221   30.221 program.py:8(<listcomp>)
 99999999    9.611    0.000    9.611    0.000 {built-in method math.sqrt}
        1    1.094    1.094   31.315   31.315 program.py:11(main)
        1    0.000    0.000    0.000    0.000 {built-in method _imp.create_dynamic}
        5    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1604(find_spec)
        1    0.000    0.000    0.000    0.000 {built-in method posix.getcwd}
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:128(<listcomp>)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:126(_path_join)
...
```
Let’s try another approach.
In this case, let’s sort the results by the total number of times each function is called, e.g. ncalls.
```shell
python -m cProfile -s ncalls program.py
```
Now we see some striking results.
We can see that the math.sqrt function was called nearly 100 million times, 99,999,999 to be precise.
This is far and away beyond the calls to any other function and the source of our problem.
```
Main is running
Main is done.
         100000230 function calls in 31.477 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 99999999    9.577    0.000    9.577    0.000 {built-in method math.sqrt}
       32    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
       19    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:244(_verbose_message)
       16    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:126(_path_join)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:128(<listcomp>)
        7    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        6    0.000    0.000    0.000    0.000 {method 'rpartition' of 'str' objects}
        6    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        6    0.000    0.000    0.000    0.000 {built-in method _imp.acquire_lock}
...
```
This highlights how we can zero in on the cause of a problem by reviewing the same profile data in different ways, both execution time and number of function calls.
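Because each profiling run of this program takes about 30 seconds, it can be worth saving the statistics once (e.g. via the -o flag) and then re-sorting them offline with pstats. The following sketch generates its own, much smaller profile data for illustration:

```python
import cProfile
import math
import pstats

# hypothetical stand-in for the slow program; the real statistics
# would be saved once with: python -m cProfile -o profile.out program.py
def work():
    return [math.sqrt(i) for i in range(1, 100000)]

# profile a call to the function and save the statistics to file
profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats('profile.out')

# load the saved statistics once, then review them in different ways
# without re-running the (slow) program each time
stats = pstats.Stats('profile.out')
stats.sort_stats('tottime').print_stats(5)   # by total time per function
stats.sort_stats('ncalls').print_stats(5)    # by number of calls
```

This mirrors the workflow above: the same data is viewed sorted by tottime and by ncalls, but the program itself only runs once.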
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python Benchmarking, Jason Brownlee (my book!)
Also, the following Python books have chapters on benchmarking that may be helpful:
- Python Cookbook, 2013. (sections 9.1, 9.10, 9.22, 13.13, and 14.13)
- High Performance Python, 2020. (chapter 2)
Guides
- 4 Ways to Benchmark Python Code
- 5 Ways to Measure Execution Time in Python
- Python Benchmark Comparison Metrics
Benchmarking APIs
- time — Time access and conversions
- timeit — Measure execution time of small code snippets
- The Python Profilers
Takeaways
You now know how to profile Python programs in order to discover why they are slow.
Did I make a mistake? See a typo?
I’m a simple humble human. Correct me, please!
Do you have any additional tips?
I’d love to hear about them!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Photo by Fausto García-Menéndez on Unsplash