Profile Python Programs

November 19, 2023 Python Benchmarking

You can profile slow Python programs to discover functions that occupy most of the runtime and functions that are called disproportionately often.

The profilers built into the Python standard library are helpful in quickly discovering why some Python programs are slow and where to focus program optimization efforts.

In this tutorial, you will discover how to profile Python programs in order to discover why they are slow.

Let's get started.

What is Code Profiling?

Code profiling is a systematic and analytical technique used in software development to evaluate the runtime behavior of a program.

Its primary objective is to identify performance bottlenecks, memory inefficiencies, and other areas for improvement within the code.

Profiling provides valuable insights into how a program executes, allowing developers to optimize it for better speed, efficiency, and resource utilization.

Profiling tools gather data on various aspects of program execution, including:

  1. CPU Usage: Profilers monitor how much time the CPU spends on each function or method within the code. This information helps pinpoint functions that consume excessive CPU resources.
  2. Memory Usage: They track memory allocations, deallocations, and usage patterns to identify memory leaks and inefficient memory management.
  3. Function Call Times: Profilers record the time taken by each function or method, enabling developers to focus on optimizing the most time-consuming parts of the code.
  4. I/O Operations: Profilers can track input and output operations, helping developers optimize file, network, or database interactions.
  5. Call Graphs: Profiling tools generate call graphs that illustrate the flow of function calls, making it clear which functions are called by others and their associated execution times.

By analyzing the data provided by profilers, developers can make informed decisions about where to focus optimization efforts.

This might involve rewriting inefficient code, eliminating unnecessary function calls, reducing memory usage, or improving algorithmic efficiency.

Ultimately, profiling is an essential step in the software development process, ensuring that applications perform efficiently, use resources judiciously, and deliver a better user experience.

Need to Profile Slow Python Programs

Sometimes our Python programs are slow.

We can use profiling to help identify which parts of a Python program are slow so that we can change or fix them to be faster.

In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.

-- Profiling (computer programming), Wikipedia

Profiling Python code is a valuable practice that offers several benefits for developers.

Below are 5 of the best reasons we should profile our Python code:

  1. Performance Optimization: Profiling helps identify performance bottlenecks and resource-intensive parts of your code. By pinpointing these areas, you can focus your optimization efforts to make your code faster and more efficient.
  2. Resource Management: Profiling provides insights into how your code utilizes system resources such as CPU, memory, and I/O. This information is crucial for efficient resource management, preventing memory leaks, and ensuring optimal resource allocation.
  3. Prioritizing Improvements: Profiling data helps you prioritize which parts of your codebase to optimize. Instead of making blind optimizations, you can concentrate your efforts where they will have the most significant impact.
  4. Reducing Overhead: Profiling reveals unnecessary function calls, redundant operations, or inefficient algorithms. By addressing these issues, you can reduce computational overhead, leading to faster and more responsive applications.
  5. Enhancing User Experience: In web applications or other software with user interfaces, profiling can lead to a better user experience. By optimizing code, you can reduce loading times, prevent lag, and provide a smoother and more responsive interaction for users.

Profiling is an essential tool in the software development toolkit, helping developers build more efficient and high-performing applications while optimizing resource usage.

Next, let's consider profiling as it relates to benchmarking.

Profiling is Not Benchmarking

It is common to confuse profiling with benchmarking.

In fact, profiling code is a very different activity from benchmarking Python code.

They are two distinct techniques used in software development to assess and optimize performance, each with its unique goals and methodologies.

Benchmarking is primarily concerned with measuring the time it takes for a specific piece of code or an entire application to complete a given task. Its primary objective is to quantify the speed and efficiency of code execution.

Benchmarking provides answers to questions like: how long does this code take to run, and is implementation A faster than implementation B?

Importantly, benchmarking is commonly used to compare different implementations, versions, or configurations of code, assisting in the selection of the most efficient solution for a given task.
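For example, the standard library's timeit module is a typical benchmarking tool. The sketch below compares two implementations of the same calculation (the expressions chosen here are purely illustrative):

```python
import timeit

# benchmark two implementations of the same calculation
t_sqrt = timeit.timeit("math.sqrt(x)", setup="import math; x = 12345", number=1_000_000)
t_pow = timeit.timeit("x ** 0.5", setup="x = 12345", number=1_000_000)

# report the total time for one million calls of each
print(f"math.sqrt: {t_sqrt:.3f}s, ** 0.5 operator: {t_pow:.3f}s")
```

A benchmark like this tells us which implementation is faster overall, but nothing about where the time goes inside a larger program; that is profiling's job.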

Profiling, in contrast, is focused on understanding how code behaves internally. It seeks to identify which parts of the code consume the most resources, such as CPU time, memory, or I/O operations.

Profiling dives into the nitty-gritty details of code execution, answering questions like: which functions consume the most time, how often is each function called, and where are the hot spots in the code?

Profiling tools provide data at a fine-grained level, revealing specifics like function call counts, time spent in specific functions, and memory usage.

Profiling is an invaluable diagnostic tool used to uncover bottlenecks and pinpoint areas in the code that require optimization.

We can use profiling after benchmarking to identify the areas of code that, if changed, are expected to reduce the overall execution time of the program.

Now that we know about profiling, let's look at how we can use the built-in Python profiler to do so.

How to Use Python Profiler

Python provides an efficient C-based profiler in the cProfile module, built into the Python standard library.

A pure-Python profiler is provided in the profile module that offers the same API as the cProfile module. This can be used on those platforms that do not support the cProfile module.
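Because the two modules share the same interface, one common pattern (a sketch, not a requirement) is to prefer cProfile and fall back to profile automatically:

```python
# prefer the fast C profiler, fall back to the pure-Python version
try:
    import cProfile as profiler_module
except ImportError:
    import profile as profiler_module

# both modules expose the same run() helper
profiler_module.run("sum(range(1_000_000))")
```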

cProfile and profile provide deterministic profiling of Python programs. A profile is a set of statistics that describes how often and for how long various parts of the program executed.

-- The Python Profilers

We can use the Python profiler via the command line interface.

You may recall that we can run a module as a program via the python -m flag.

Therefore, we can apply the Python profiler to a program file named program.py as follows:

python -m cProfile program.py

This will run the program program.py normally. Then once the program is done, profile information will be reported on standard output.

The cProfile program takes a number of command line flags.

The -o flag allows us to specify the file in which to store the output of the profiler.

Storing profile results to file is a good practice.

For example:

python -m cProfile -o profile.out program.py

Note that the data is written in a binary format suitable for the pstats module to read and use.
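For example, the saved file can be loaded and summarized with the pstats module. The sketch below generates a profile file programmatically (standing in for the -o command above, with an arbitrary illustrative workload) and prints the ten most expensive entries:

```python
import cProfile
import pstats

# save profile data to file, as "python -m cProfile -o profile.out program.py" would
cProfile.run("sum(i * i for i in range(100_000))", "profile.out")

# load the binary profile data and report the 10 most expensive entries
stats = pstats.Stats("profile.out")
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```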

We can provide the -s flag to sort the profile results by one of the table headings.

By default, the profiler results are sorted by cumulative time spent on the function and sub-functions, e.g. cumtime.

cumtime: is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.

-- The Python Profilers

A commonly helpful alternative is to sort the results by the number of calls made to each function, e.g. ncalls.

ncalls: for the number of calls.

-- The Python Profilers

For example:

python -m cProfile -s ncalls program.py

This can highlight functions that are called many times and, in turn, take a long overall time.

We might also sort the results by the total time spent on each function call, e.g. tottime.

tottime: for the total time spent in the given function (and excluding time made in calls to sub-functions)

-- The Python Profilers

For example:

python -m cProfile -s tottime program.py

This can highlight individual function calls that take a long time.
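The profiler can also be driven from within a program rather than from the command line. Below is a minimal sketch using cProfile.Profile and pstats, where work() is a hypothetical stand-in for your own slow code:

```python
import cProfile
import pstats

# a hypothetical task to profile
def work():
    return sum(i * i for i in range(100_000))

# collect profile data around the call of interest
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# report the results sorted by total time, like: python -m cProfile -s tottime
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(5)
```

This approach is useful when only one part of a larger program needs to be profiled.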

Now that we know how to use the Python profiler, let's explore how we might use it to identify slow parts of Python programs.

Example of Profiling a Program With a Slow Function Call

We can explore using the Python profiler on a program that is slow because it has a long-running function call.

In this example, we will define a function that performs a long blocking call. The function will then be called from the main function.

We will call the time.sleep() function which blocks the entire thread, preventing the program from progressing.

# long running task
def work():
    # block the thread for a long time
    time.sleep(5)

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of a program that has a slow function call
import time

# long running task
def work():
    # block the thread for a long time
    time.sleep(5)

# main program
def main():
    # report a message
    print('Main is running')
    # run our task
    work()
    # report a message
    print('Main is done.')

# protect the entry point
if __name__ == '__main__':
    # start the program
    main()

Running the program calls the main() function.

The main() function runs and reports a start message and then calls the work() function.

The work() function runs and blocks the main thread with a call to time.sleep() for 5 seconds.

The work() function resumes and terminates.

The main() function resumes, reports a final message, and terminates.

Main is running
Main is done.

Next, we can profile the program, and assume we don't know why the program is slow.

We will use the cProfile module and report with default sorting.

python -m cProfile program.py

If you are unable to run the cProfile program on your system, use the profile module instead. The results will be similar, although the pure-Python profiler adds more overhead and will report slightly different timings.

The cProfile output reports some details about the program.

For example, there were 8 function calls and the program ran for a little over 5 seconds.

The default output from cProfile does not seem helpful in this case; perhaps sorting by cumulative function execution time (cumtime) is not the best approach.

Main is running
Main is done.
         8 function calls in 5.005 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.005    5.005 {built-in method builtins.exec}
        1    0.000    0.000    5.005    5.005 program.py:1(<module>)
        1    0.000    0.000    5.005    5.005 program.py:11(main)
        1    0.000    0.000    5.005    5.005 program.py:6(work)
        1    5.005    5.005    5.005    5.005 {built-in method time.sleep}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

One approach we can try is to filter all results by the name of our program, e.g. "program.py".

This may give insight into the cumulative execution time of each one of our functions.

A simple way to achieve this is to run cProfile and filter the results for the program name via grep on the command line.

For example:

python -m cProfile program.py | grep program.py

Note that this assumes you have access to the grep program on your system. It may not be available on Windows by default.

The results show only the output lines from cProfile that mention our program name.

We can see a hierarchy of cumulative time from the top level (or whole program), to the main() function on line 11, to our work() function on line 6.

This highlights that the problem is probably somewhere in the work() function.

1    0.000    0.000    5.001    5.001 program.py:1(<module>)
1    0.000    0.000    5.001    5.001 program.py:11(main)
1    0.000    0.000    5.000    5.000 program.py:6(work)
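If grep is not available (e.g. on Windows), pstats can apply the same kind of filter: print_stats() accepts a string that is treated as a regular expression and matched against the printed entry names. A self-contained sketch (with an arbitrary illustrative workload in place of program.py):

```python
import cProfile
import pstats

# profile a short task and save the results, as the -o flag would
cProfile.run("sorted(range(100_000), reverse=True)", "profile.out")

# print only the entries whose printed name matches the pattern,
# mirroring: python -m cProfile program.py | grep program.py
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats("sorted")
```

For a real program, the restriction string would be the program's filename, e.g. print_stats("program.py").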

Next, let's try sorting the output of cProfile by the total execution time of each function, e.g. tottime.

python -m cProfile -s tottime program.py

This helps.

We can see that the first function in the list that has the longest overall execution time is the call to time.sleep().

This highlights how we can use the output from cProfile to zero in on long-running functions in our Python programs.

Main is running
Main is done.
         8 function calls in 5.005 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.005    5.005    5.005    5.005 {built-in method time.sleep}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    5.005    5.005 {built-in method builtins.exec}
        1    0.000    0.000    5.005    5.005 program.py:11(main)
        1    0.000    0.000    5.005    5.005 program.py:6(work)
        1    0.000    0.000    5.005    5.005 program.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Next, let's look at another profiling example, this time with functions that are called many times.

Example of Profiling a Program With a Repeated Function Call

We can explore using the Python profiler on a program that is slow because it repeatedly makes the same function call.

In this case, we can define a function that performs the same mathematical operation many millions of times, e.g. nearly 100 million calls to math.sqrt().

Collectively, this is a slow, CPU-bound task, although written as a list comprehension it does not look like one.

# long running task
def work():
    # block the main thread for a long time
    data = [math.sqrt(i) for i in range(1, 100000000)]

Tying this together, the complete example is listed below.

# SuperFastPython.com
# example of a program that calls a function repeatedly
import math

# long running task
def work():
    # block the main thread for a long time
    data = [math.sqrt(i) for i in range(1, 100000000)]

# main program
def main():
    # report a message
    print('Main is running')
    # run our task
    work()
    # report a message
    print('Main is done.')

# protect the entry point
if __name__ == '__main__':
    # start the program
    main()

Running the program calls the main() function.

The main() function runs and reports a start message and then calls the work() function.

The work() function runs and executes the list comprehension, creating a list of nearly 100 million square roots of integers.

This takes a while and prevents the main thread from progressing.

Once the list comprehension is complete, the work() function terminates.

The main() function resumes, reports a final message, and terminates.

Main is running
Main is done.

Next, let's profile our program and pretend we don't know the cause of why the program is slow.

We can start off by running cProfile with default sorting.

python -m cProfile program.py

A truncated sample of the output is provided below.

This does not appear helpful in this case.

It is reporting the internal infrastructure and the first line of our program as having the highest overall cumulative execution time.

Main is running
Main is done.
         100000230 function calls in 30.953 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   30.953   30.953 {built-in method builtins.exec}
        1    0.000    0.000   30.953   30.953 program.py:1(<module>)
        1    1.112    1.112   30.952   30.952 program.py:11(main)
        1    0.000    0.000   29.840   29.840 program.py:6(work)
        1   20.268   20.268   29.840   29.840 program.py:8(<listcomp>)
 99999999    9.572    0.000    9.572    0.000 {built-in method math.sqrt}
        1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:1165(_find_and_load)
        1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:1120(_find_and_load_unlocked)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:666(_load_unlocked)
...

Next, let's try sorting results by the total execution time of each function.

python -m cProfile -s tottime program.py

Here, we can see that line 8 of our program has the highest overall execution time at 20.610 seconds.

This is the line that contains the list comprehension, pointing us in the right direction.

The next function with the largest overall time is math.sqrt at 9.611 seconds. Excellent.

Main is running
Main is done.
         100000230 function calls in 31.315 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   20.610   20.610   30.221   30.221 program.py:8(<listcomp>)
 99999999    9.611    0.000    9.611    0.000 {built-in method math.sqrt}
        1    1.094    1.094   31.315   31.315 program.py:11(main)
        1    0.000    0.000    0.000    0.000 {built-in method _imp.create_dynamic}
        5    0.000    0.000    0.000    0.000 {built-in method posix.stat}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1604(find_spec)
        1    0.000    0.000    0.000    0.000 {built-in method posix.getcwd}
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:128(<listcomp>)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:126(_path_join)
...

Let's try another approach.

In this case, let's sort the results by the total number of times each function is called, e.g. ncalls.

python -m cProfile -s ncalls program.py

Now we see some striking results.

We can see that the math.sqrt function was called nearly 100 million times, 99,999,999 to be precise.

This is far beyond the calls to any other function and is the source of our problem.

Main is running
Main is done.
         100000230 function calls in 31.477 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 99999999    9.577    0.000    9.577    0.000 {built-in method math.sqrt}
       32    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
       19    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:244(_verbose_message)
       16    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:126(_path_join)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:128(<listcomp>)
        7    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        6    0.000    0.000    0.000    0.000 {method 'rpartition' of 'str' objects}
        6    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        6    0.000    0.000    0.000    0.000 {built-in method _imp.acquire_lock}
...

This highlights how we can zero in on the cause of a problem by reviewing the same profile data in different ways, both execution time and number of function calls.
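Having traced the slowdown to tens of millions of math.sqrt() calls, one candidate fix is to eliminate the per-element function call, for example with the ** operator. The sketch below compares the two approaches, using a smaller, hypothetical N of one million so it runs quickly:

```python
import math
import timeit

N = 1_000_000  # smaller than the 100 million used above, so this runs quickly

# original approach: one math.sqrt() call per element
t_calls = timeit.timeit(
    "[math.sqrt(i) for i in range(1, N)]",
    globals={"math": math, "N": N}, number=1)

# candidate fix: the ** operator avoids the per-element function call
t_pow = timeit.timeit(
    "[i ** 0.5 for i in range(1, N)]",
    globals={"N": N}, number=1)

print(f"math.sqrt: {t_calls:.3f}s, ** 0.5 operator: {t_pow:.3f}s")

# sanity check: both approaches produce (effectively) the same values
assert all(math.isclose(math.sqrt(i), i ** 0.5) for i in range(1, 100))
```

Whether a candidate fix like this is actually faster should be confirmed by benchmarking, and then by re-profiling the program.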

Takeaways

You now know how to profile Python programs in order to discover why they are slow.



If you enjoyed this tutorial, you will love my book: Python Benchmarking. It covers everything you need to master the topic with hands-on examples and clear explanations.