Last Updated on October 29, 2022
You can configure the number of worker threads in the ThreadPool class via the “processes” argument.
In this tutorial you will discover how to configure the number of worker threads for the ThreadPool in Python.
Let’s get started.
Need to Configure the Number of Worker Threads
The multiprocessing.pool.ThreadPool in Python provides a pool of reusable threads for executing ad hoc tasks.
A thread pool object which controls a pool of worker threads to which jobs can be submitted.
— multiprocessing — Process-based parallelism
The ThreadPool class extends the Pool class. The Pool class provides a pool of worker processes for process-based concurrency.
Although the ThreadPool class is in the multiprocessing module it offers thread-based concurrency and is best suited to IO-bound tasks, such as reading or writing from sockets or files.
A ThreadPool can be configured when it is created, which will prepare the new threads.
We can issue one-off tasks to the ThreadPool using functions such as apply() or we can apply the same function to an iterable of items using functions such as map().
Results for issued tasks can then be retrieved synchronously, or we can retrieve the result of tasks later by using asynchronous versions of the functions such as apply_async() and map_async().
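As a rough illustration of these four functions, the sketch below issues the same task with each of them. The square() task and the run_examples() helper are hypothetical names used only for this example.

```python
from multiprocessing.pool import ThreadPool


def square(value):
    """Hypothetical task used only for illustration."""
    return value * value


def run_examples():
    # create a thread pool with 4 workers
    with ThreadPool(4) as pool:
        # apply() issues one task and blocks until its result is ready
        single = pool.apply(square, args=(3,))
        # map() applies the function to each item and blocks for all results
        many = pool.map(square, range(5))
        # apply_async() returns an AsyncResult immediately
        async_single = pool.apply_async(square, args=(4,))
        # map_async() does the same for an iterable of items
        async_many = pool.map_async(square, range(5))
        # get() blocks until the asynchronous result is available
        return single, many, async_single.get(), async_many.get()


# protect the entry point
if __name__ == '__main__':
    print(run_examples())
```

The asynchronous results are retrieved with get() inside the with block, because leaving the block shuts the pool down.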
The ThreadPool has a fixed number of worker threads.
It is important to configure the number of worker threads in the pool to an appropriate number for the tasks that need to be executed.
How do you configure the number of worker threads in the ThreadPool?
How to Configure The Number of Worker Threads
We can configure the number of worker threads in the ThreadPool class by setting the “processes” argument in the constructor.
By default this equals the number of logical CPUs in your system.
processes is the number of worker threads to use. If processes is None then the number returned by os.cpu_count() is used.
— multiprocessing — Process-based parallelism
For example, if we had 4 physical CPU cores with hyperthreading, this would mean we would have 8 logical CPU cores and this would be the default number of workers in the thread pool.
We can set the “processes” argument to specify the number of new threads to create and use as workers in the ThreadPool.
For example:
    ...
    # create a thread pool with 4 workers
    pool = multiprocessing.pool.ThreadPool(processes=4)
The “processes” argument is the first argument in the constructor and does not need to be specified by name to be set, for example:
    ...
    # create a thread pool with 4 workers
    pool = multiprocessing.pool.ThreadPool(4)
If we use the context manager to create the thread pool so that it is shut down automatically, we can configure the number of threads in the same manner.
For example:
    ...
    # create a thread pool with 4 workers
    with multiprocessing.pool.ThreadPool(4) as pool:
        # ...
It is common to have more threads than CPUs (physical or logical) in your system.
The reason for this is that threads are used for IO-bound tasks, not CPU-bound tasks. This means that threads are used for tasks that wait for relatively slow resources to respond, like hard drives, DVD drives, printers, network connections, and more.
Therefore, it is not uncommon to have 10s, 100s and even 1,000s of threads in your ThreadPool, depending on your specific needs. It is unusual to have more than a few thousand threads. If you require this many concurrent network connections, then alternative solutions may be preferred, such as asyncio.
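One way to see why this works is to count how many blocking tasks are in flight at the same time. The sketch below is only an illustration: the ConcurrencyTracker class and peak_concurrency() helper are hypothetical, and a sleep stands in for real IO.

```python
import os
import threading
import time
from multiprocessing.pool import ThreadPool


class ConcurrencyTracker:
    """Records the largest number of tasks that were in flight at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0

    def fake_io_task(self, _):
        # count this task as running and update the peak
        with self._lock:
            self._running += 1
            self.peak = max(self.peak, self._running)
        # stand-in for waiting on a socket or file
        time.sleep(0.2)
        with self._lock:
            self._running -= 1


def peak_concurrency(n_workers, n_tasks):
    """Run n_tasks blocking tasks and return the peak number running at once."""
    tracker = ConcurrencyTracker()
    with ThreadPool(n_workers) as pool:
        # chunksize=1 so that each task is handed to a worker individually
        pool.map(tracker.fake_io_task, range(n_tasks), chunksize=1)
    return tracker.peak


# protect the entry point
if __name__ == '__main__':
    print(f'logical CPUs: {os.cpu_count()}')
    print(f'peak concurrent tasks: {peak_concurrency(50, 50)}')
```

Because each task spends its time waiting rather than computing, the peak can be well above the number of logical CPUs reported on the first line.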
Now that we know how to configure the number of worker threads in the ThreadPool, let’s look at some worked examples.
Set the Number of Worker Threads
Let’s explore how to configure the number of worker threads with some worked examples.
Check the Default Number of Workers
First, let’s check how many threads are created by default for the ThreadPool class on your system.
One approach is to report the status of the ThreadPool directly by printing the object.
This will create a string representation of the ThreadPool that includes its current status and the number of worker threads that are running.
For example:
    ...
    # report the status of the thread pool
    print(pool)
Recall that the ThreadPool class extends the Pool class. Looking at the source code for the Pool class, we can see that the number of worker threads chosen by default is stored in the _processes attribute, which we can access and report after a thread pool is created.
For example:
    ...
    # report the number of threads in the pool
    print(pool._processes)
Note, _processes is a protected member and may change in the future.
A final approach we can use is to get and report the total number of active threads.
This can be achieved using the threading.active_count() function and is only effective if the ThreadPool is the only source of threads for the current process.
For example:
    ...
    # report the number of active threads
    active_thread_count = active_count()
    print(active_thread_count)
Tying this together, the example below creates a thread pool with the default number of worker threads and reports the number of worker threads that were created.
    # SuperFastPython.com
    # example of retrieving the default number of workers in the thread pool
    from multiprocessing.pool import ThreadPool
    from threading import active_count
    # protect the entry point
    if __name__ == '__main__':
        # create a thread pool with the default number of workers
        pool = ThreadPool()
        # report the status of the thread pool
        print(pool)
        # report the number of threads in the pool
        print(pool._processes)
        # report the number of active threads
        active_thread_count = active_count()
        print(active_thread_count)
Running the example first creates the ThreadPool, configured with the default number of worker threads.
The status of the thread pool is then reported, showing that it is running and configured with a pool size.
The thread pool attribute for the number of workers is then reported, then the number of active threads is reported.
We can see that both the string representation of the ThreadPool object and the internal member of the object agree regarding the number of worker threads, 8 on my system.
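We can see that there are 12 active threads running. This is 4 more than the number of workers. One of these is the main thread of the program. The remaining 3 are helper threads that the ThreadPool object creates internally (in the CPython implementation) for managing the workers and handling the queues of tasks and results.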
Note, results will differ depending on the number of CPU cores in your system.
    <multiprocessing.pool.ThreadPool state=RUN pool_size=8>
    8
    12
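The thread counts can be checked more directly by comparing threading.active_count() before and after the pool is created, which leaves the main thread out of the difference. The sketch below assumes the ThreadPool is the only thing starting threads at that moment. The threads_added_by_pool() helper is a hypothetical name, and the number of helper threads is an implementation detail that may change between Python versions.

```python
from multiprocessing.pool import ThreadPool
from threading import active_count


def threads_added_by_pool(n_workers):
    """Return how many threads exist only because the pool was created."""
    # count threads before the pool exists (includes the main thread)
    before = active_count()
    pool = ThreadPool(n_workers)
    try:
        # the difference is the workers plus any internal helper threads
        return active_count() - before
    finally:
        # shut the pool down and wait for its threads to finish
        pool.close()
        pool.join()


# protect the entry point
if __name__ == '__main__':
    n_workers = 8
    added = threads_added_by_pool(n_workers)
    print(f'workers requested: {n_workers}')
    print(f'threads added by the pool: {added}')
    print(f'helper threads (the remainder): {added - n_workers}')
```

Measuring the difference rather than the raw count also makes the check usable in programs that already have other threads running.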
How many worker threads are allocated by default on your system?
Let me know in the comments below.
Example of Configuring The Number of Workers
We can specify the number of worker threads directly and this is a good idea in most applications.
The example below demonstrates how to configure many worker threads using the context manager interface for the ThreadPool class.
    # SuperFastPython.com
    # example of configuring the number of worker threads
    from multiprocessing.pool import ThreadPool
    # protect the entry point
    if __name__ == '__main__':
        # create a thread pool with many workers
        with ThreadPool(100) as pool:
            # report the status of the pool
            print(pool)
Running the example configures the thread pool to use 100 worker threads and confirms that it created that many workers.
    <multiprocessing.pool.ThreadPool state=RUN pool_size=100>
Common Questions
This section lists common questions related to the number of worker threads in the ThreadPool class.
Do you have a question about setting the number of threads?
Let me know in the comments and I will do my best to answer it and add it to this section.
What is a CPU and What is a CPU Core?
A central processing unit or simply “processor” or “CPU” is a chip in the computer that executes instructions.
Traditionally, we had one CPU in the computer, perhaps with a math coprocessor.
- CPU: Central processing unit, a chip within the computer for executing instructions.
A core is another name for a physical CPU for executing instructions.
A computer with multiple CPUs is referred to as having multiple cores.
Similarly, a computer chip that has multiple CPUs within it, is referred to as a multi-core processor.
- Multi-core processor: A physical chip with multiple CPUs or cores.
As such, developers often use the terms “CPUs” and “cores” interchangeably. We might even refer to them as “CPU cores”.
Almost all modern computers have multiple cores.
What are Physical CPUs vs Logical CPUs
Modern CPUs typically make use of a technology called hyperthreading.
Hyperthreading does not refer to a program using threads. Instead, it refers to a technology within the CPU cores themselves that allows each physical core or CPU to act as if it were two logical cores or two CPUs.
- Physical Cores: The number of CPU cores provided in the hardware, e.g. the chips.
- Logical Cores: The number of CPU cores after hyperthreading is taken into account.
Hyperthreading provides automatic in-core parallelism that can offer up to a 30% speed-up over CPU cores that do not offer the technology.
As such, when we count CPU cores in a system, we typically count the number of logical CPU cores, not the number of physical CPU cores.
If you know your system uses hyperthreading (it probably does), then you can get the number of physical CPUs in your system by dividing the number of logical CPUs by two.
- Count Physical Cores = Count Logical Cores / 2
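The rule of thumb above can be written as a small calculation. This is only an estimate: it assumes every physical core exposes exactly two logical cores, and the estimate_physical_cores() helper is a hypothetical name, not part of the standard library.

```python
import os


def estimate_physical_cores(logical_cores):
    """Apply the rule of thumb: physical = logical / 2, never less than 1."""
    return max(1, logical_cores // 2)


# protect the entry point
if __name__ == '__main__':
    # os.cpu_count() may return None if the count cannot be determined
    logical = os.cpu_count() or 1
    print(f'logical cores: {logical}')
    print(f'estimated physical cores: {estimate_physical_cores(logical)}')
```

On a system without hyperthreading, the estimate will be half the true number of physical cores, so treat the result as a guide only.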
What is the Default Number of Threads in the ThreadPool?
The default number of threads in the multiprocessing.pool.ThreadPool class is equal to the number of logical CPU cores in your system.
For example:
- Total Number Worker Threads = Number of Logical CPUs in Your System
Where the number of CPUs in your system is determined by Python and will take hyperthreading into account.
For example, if you have two CPU cores each with hyperthreading (which is common), then Python will detect four CPUs in your system.
How Many CPU Cores Do I Have?
There are a number of ways to determine the number of CPU cores in your system.
Some functions include:
- multiprocessing.cpu_count()
- os.cpu_count()
For example:
    ...
    # get the number of logical cpu cores
    n_cores = multiprocessing.cpu_count()
Should The Number of Threads in the Pool Match the Number of CPUs or Cores?
Although the default is the number of logical CPUs, the number of worker threads in the ThreadPool does not need to match the number of CPUs or CPU cores in your system.
You can configure the number of worker threads based on the number of tasks you need to execute, the amount of local system resources you have available (e.g. memory), and the limitations of resources you intend to access within your tasks (e.g. connections to remote servers).
How Many Worker Threads Should I Use?
If you have hundreds of tasks, you should probably set the number of threads to be equal to the number of tasks.
If you have thousands of tasks, you should probably cap the number of threads at 100s or 1,000s.
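These two rules of thumb can be combined into one small helper. The sketch below is only a starting point: the choose_worker_count() name and the default cap of 1,000 are assumptions made for this example and should be tuned for your own system.

```python
from multiprocessing.pool import ThreadPool


def choose_worker_count(n_tasks, cap=1000):
    """One thread per task, capped at an upper limit, and never less than 1."""
    return max(1, min(n_tasks, cap))


def task(item):
    """Hypothetical task used only for illustration."""
    return item * 2


# protect the entry point
if __name__ == '__main__':
    items = list(range(200))
    # hundreds of tasks, so use one worker per task
    n_workers = choose_worker_count(len(items))
    with ThreadPool(n_workers) as pool:
        results = pool.map(task, items)
    print(f'{n_workers} workers processed {len(results)} items')
```

With 200 items the helper returns 200, and with 50,000 items it would return the cap of 1,000.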
If your application is intended to be executed multiple times in the future, you can test different numbers of threads and compare overall execution time, then choose a number of threads that gives approximately the best performance. You may want to mock the task in these tests with a random sleep operation.
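A test like this might look something like the sketch below. The mock_task() and benchmark() names are hypothetical, and the sleep range and candidate worker counts are arbitrary values chosen to keep the example quick to run.

```python
import random
import time
from multiprocessing.pool import ThreadPool


def mock_task(_):
    """Mock task: a random sleep stands in for the real IO-bound work."""
    time.sleep(random.uniform(0.01, 0.05))


def benchmark(worker_counts, n_tasks=100):
    """Return a dict mapping each worker count to elapsed seconds."""
    timings = {}
    for n_workers in worker_counts:
        start = time.perf_counter()
        with ThreadPool(n_workers) as pool:
            pool.map(mock_task, range(n_tasks))
        timings[n_workers] = time.perf_counter() - start
    return timings


# protect the entry point
if __name__ == '__main__':
    results = benchmark([1, 10, 50, 100])
    for n_workers, seconds in results.items():
        print(f'{n_workers:>4} workers: {seconds:.3f} seconds')
    # pick the fastest configuration among those tested
    best = min(results, key=results.get)
    print(f'best of those tested: {best} workers')
```

Timings vary from run to run, so it may be worth repeating each configuration a few times and comparing averages before settling on a number.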
What is the Maximum Number of Worker Threads in the ThreadPool?
There is no maximum number of worker threads in the ThreadPool.
Nevertheless, your system will have an upper limit on the number of threads you can create, based on how much main memory (RAM) you have available.
Before you exceed main memory, you will reach a point of diminishing returns in terms of adding new threads and executing more tasks. This is because your operating system must switch between the threads, called context switching. With too many threads active at once, your program may spend more time context switching than actually executing tasks.
A sensible upper limit for many applications is hundreds of threads to perhaps a few thousand threads. More than a few thousand threads on a modern system may result in too much context switching, depending on your system and on the types of tasks that are being executed.
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python ThreadPool Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- ThreadPool PDF Cheat Sheet
I also recommend specific chapters from the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python ThreadPool: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python Threading: The Complete Guide
Takeaways
You now know how to configure the number of worker threads in the ThreadPool.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.