Last Updated on October 29, 2022
The ThreadPool is a flexible and powerful thread pool for executing ad hoc tasks in an asynchronous manner.
In this tutorial, you will discover a ThreadPool example that you can use as a template for your own project.
Let’s dive in.
How to Scan Ports on a Server One-by-One (slow)
We can connect to other computers by opening a socket; writing programs that do this is called socket programming.
Opening a socket requires both the name or IP address of the server and a port number on which to connect.
For example, when your web browser opens a web page on python.org, it opens a socket connection to that server on port 80 (or port 443 for HTTPS), then uses the HTTP protocol to request and download (GET) an HTML file.
Socket programming or network programming is a lot of fun.
A good first socket programming project is to develop a port scanner.
This is a program that reports all of the open ports on a given server.
A simple way to implement a port scanner is to loop over all the ports you want to test and attempt to make a socket connection on each. If a connection can be made, we disconnect immediately and report that the port on the server is open.
For example, we know that port 80 is open on python.org, but what other ports might be open?
Historically, having many open ports on a server was a security risk, so it is common to lock down a public-facing server and close all non-essential ports to external traffic. This means scanning public servers will likely yield few open ports in the best case or will deny future access in the worst case if the server thinks you’re trying to break in.
As such, although developing a port scanner is a fun socket programming exercise, we must be careful in how we use it and what servers we scan.
Next, let’s look at how we can open a socket connection on a single port.
Open a Socket Connection on a Port
Python provides socket communication in the socket module.
A socket must first be configured in terms of the type of host address and type of socket we will create, then the configured socket can be connected.
You can learn more about the socket module in the Python standard library documentation.
There are many ways to specify a host address, although perhaps the most common is the IP address (IPv4) or the domain name resolved by DNS. We can configure a socket to expect this type of address via the AF_INET constant.
There are also different socket types, the most common being a TCP or stream type socket and a less reliable UDP type socket. We will attempt to open TCP sockets in this case, as they are more commonly used for services like email, web, FTP, and so on. We can configure our socket for TCP using the SOCK_STREAM constant.
We can create and configure our socket as follows:
```python
...
# create and configure the socket
sock = socket(AF_INET, SOCK_STREAM)
```
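To make the two constants concrete, here is a minimal sketch that creates one TCP socket and one UDP socket and prints how each was configured. The helper name describe_socket() is hypothetical and used only for illustration; the UDP socket is shown purely for contrast and is not used in the rest of this tutorial.

```python
from socket import AF_INET, SOCK_DGRAM, SOCK_STREAM, socket

def describe_socket(sock):
    # report the address family and socket type as readable names
    return sock.family.name, sock.type.name

# a TCP (stream) socket for IPv4 addresses, as used in this tutorial
with socket(AF_INET, SOCK_STREAM) as tcp_sock:
    print(describe_socket(tcp_sock))

# a UDP (datagram) socket for IPv4 addresses, shown only for contrast
with socket(AF_INET, SOCK_DGRAM) as udp_sock:
    print(describe_socket(udp_sock))
```

Neither socket is connected to anything yet; creating a socket only configures it.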
We must close our socket once we are finished with it by calling the close() function; for example:
```python
...
# close the socket
sock.close()
```
While working with the socket, an exception may be raised for many reasons, such as an invalid address or a failure to connect. We must ensure that the connection is closed regardless, therefore we can automatically close the socket using the context manager; for example:
```python
...
# create and configure the socket
with socket(AF_INET, SOCK_STREAM) as sock:
    # ...
```
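As a quick check of the claim that the context manager closes the socket even when something goes wrong, the sketch below raises an error inside the with block and then inspects the socket. The function name closed_after_error() and the simulated RuntimeError are assumptions made for the demonstration; it relies on a closed socket reporting a file descriptor of -1.

```python
from socket import AF_INET, SOCK_STREAM, socket

def closed_after_error():
    # keep a reference so the socket can be inspected after the block exits
    sock = socket(AF_INET, SOCK_STREAM)
    try:
        with sock:
            # simulate a failure while the socket is in use
            raise RuntimeError('something went wrong')
    except RuntimeError:
        # the error is expected; we only care about the socket state
        pass
    # a closed socket reports a file descriptor of -1
    return sock.fileno() == -1

print(closed_after_error())
```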
Next, we can further configure the socket before we open a connection.
Specifically, it is a good idea to set a timeout because attempting to open network connections can be slow. We want to give up connecting and raise an exception if a given number of seconds elapses and we still haven’t connected.
This can be achieved by calling the settimeout() function on the socket. In this case, we will use a somewhat aggressive timeout of 3 seconds.
```python
...
# set a timeout of a few seconds
sock.settimeout(3)
```
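A connection timeout is awkward to demonstrate reliably, because it needs a host that silently ignores us. As a rough stand-in, the sketch below applies the same settimeout() mechanism to a different blocking call, accept(), on a local listening socket that nobody connects to. The function name accept_times_out() and the use of the loopback address are assumptions for the demonstration.

```python
from socket import AF_INET, SOCK_STREAM, socket, timeout

def accept_times_out(seconds):
    # a local listening socket that no client will ever connect to
    with socket(AF_INET, SOCK_STREAM) as server:
        # port 0 asks the operating system for any free port
        server.bind(('127.0.0.1', 0))
        server.listen()
        server.settimeout(seconds)
        try:
            # blocks until a client connects or the timeout elapses
            server.accept()
            return False
        except timeout:
            # raised because nothing connected within the timeout
            return True

print(accept_times_out(0.2))
```

The same exception type is raised by connect() when a connection attempt exceeds the timeout set on the socket.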
Finally, we can attempt to make a connection to a server.
This requires a hostname and a port, which we can pair together into a tuple and pass to the connect() function.
For example:
```python
...
# attempt to connect
sock.connect((host, port))
```
If the connection succeeds, we could start sending data to the server and receive it back via this socket using the protocol suggested by the port number. We don’t want to communicate with the server so we will close the connection immediately.
If the connection fails, an exception will be raised indicating that the port is likely not open (or not open to us).
Therefore, we can wrap the attempt to connect in some exception handling.
```python
...
# connecting may fail
try:
    # attempt to connect
    sock.connect((host, port))
    # a successful connection was made
except OSError:
    # ignore the failure, the port is closed to us
    pass
```
Tying this together, the test_port_number() function takes a host name and a port number and returns True if a socket connection can be opened, or False otherwise.
```python
# returns True if a connection can be made, False otherwise
def test_port_number(host, port):
    # create and configure the socket
    with socket(AF_INET, SOCK_STREAM) as sock:
        # set a timeout of a few seconds
        sock.settimeout(3)
        # connecting may fail
        try:
            # attempt to connect
            sock.connect((host, port))
            # a successful connection was made
            return True
        except OSError:
            # ignore the failure
            return False
```
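One way to try this logic without touching a remote server is to test it against a socket on your own machine. The sketch below is an assumption-laden demonstration: it uses a copy of the function under the hypothetical name port_is_open() (so that test runners which collect test_* functions leave it alone), a hypothetical helper check_local_port(), and a throwaway listener on the loopback address.

```python
from socket import AF_INET, SOCK_STREAM, socket

def port_is_open(host, port):
    # same logic as test_port_number(), under a hypothetical name
    with socket(AF_INET, SOCK_STREAM) as sock:
        sock.settimeout(3)
        try:
            sock.connect((host, port))
            return True
        except OSError:
            return False

def check_local_port():
    # start a throwaway listener on the loopback interface
    server = socket(AF_INET, SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0 lets the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    # the port accepts connections while the listener is alive
    while_listening = port_is_open('127.0.0.1', port)
    server.close()
    # once the listener is closed the connection is normally refused
    after_close = port_is_open('127.0.0.1', port)
    return while_listening, after_close

print(check_local_port())
```

Because both checks stay on the local machine, the refused connection is reported almost immediately rather than after the 3 second timeout.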
Next, let’s look at how we can use this function we have developed to scan a range of ports.
Scan a Range of Ports on a Server
We can scan a range of ports on a given host.
Many common internet services are provided on ports between 0 and 1023, often called the well-known ports.
The viable range of ports is 0 to 65535, and you can see a list of the most common port numbers and the services that use them in the file /etc/services on POSIX systems.
Wikipedia also has a page that lists the most common TCP and UDP port numbers.
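The service names can also be looked up from Python. The sketch below uses getservbyport() from the socket module, which consults the system's services database; the wrapper name service_name() is hypothetical, and the names returned (or whether any are found at all) depend on the system it runs on.

```python
from socket import getservbyport

def service_name(port, protocol='tcp'):
    # look up the service registered for a port, or None if unknown
    try:
        return getservbyport(port, protocol)
    except OSError:
        # no entry for this port and protocol on this system
        return None

# report the service names for a few commonly used ports
for port in (22, 25, 80, 443):
    print(port, service_name(port))
```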
We will limit our scanning to the range of 0 to 1023.
To scan a range of ports, we can repeatedly call our test_port_number() function that we developed in the previous section and report any ports that permit a connection as ‘open’.
The port_scan() function below implements this reporting of any open ports that are discovered.
```python
# scan port numbers on a host
def port_scan(host, ports):
    print(f'Scanning {host}...')
    # scan each port number
    for port in ports:
        if test_port_number(host, port):
            print(f'> {host}:{port} open')
```
Finally, we can call this function and specify the host and range of ports.
In this case, we will port scan python.org (out of love for Python, not malicious intent).
```python
...
# define host and port numbers to scan
HOST = 'python.org'
PORTS = range(1024)
# test the ports
port_scan(HOST, PORTS)
```
We would expect that at least port 80 would be open for HTTP connections.
Tying this together, the complete example of port scanning a host in Python is listed below.
```python
# SuperFastPython.com
# scan a range of port numbers on the host one by one
from socket import AF_INET
from socket import SOCK_STREAM
from socket import socket

# returns True if a connection can be made, False otherwise
def test_port_number(host, port):
    # create and configure the socket
    with socket(AF_INET, SOCK_STREAM) as sock:
        # set a timeout of a few seconds
        sock.settimeout(3)
        # connecting may fail
        try:
            # attempt to connect
            sock.connect((host, port))
            # a successful connection was made
            return True
        except OSError:
            # ignore the failure
            return False

# scan port numbers on a host
def port_scan(host, ports):
    print(f'Scanning {host}...')
    # scan each port number
    for port in ports:
        if test_port_number(host, port):
            print(f'> {host}:{port} open')

# protect the entry point
if __name__ == '__main__':
    # define host and port numbers to scan
    host = 'python.org'
    ports = range(1024)
    # test the ports
    port_scan(host, ports)
```
Running the example attempts to make a connection for each port number between 0 and 1023 (one less than 1024) and reports all open ports.
In this case, we can see that port 80 for HTTP is open as expected, and port 443 is also open for HTTPS.
The program works fine, but it is painfully slow.
On my system, it took 235.8 seconds to complete (nearly 4 minutes).
```
Scanning python.org...
> python.org:80 open
> python.org:443 open
```
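A duration like the one reported above could be measured with a small helper such as the one below. This is only a sketch: the names timed() and slow_scan() are hypothetical, and slow_scan() merely sleeps as a stand-in for connection attempts made one after another. It also computes the upper bound on waiting time for the scan in this tutorial, assuming every one of the 1024 attempts ran into the full 3 second timeout.

```python
from time import perf_counter, sleep

def timed(func, *args):
    # call func(*args) and return its result with the elapsed seconds
    start = perf_counter()
    result = func(*args)
    return result, perf_counter() - start

def slow_scan(n_ports, seconds_per_port):
    # stand-in for a sequential scan: each port is checked after the last
    for _ in range(n_ports):
        sleep(seconds_per_port)
    return n_ports

checked, elapsed = timed(slow_scan, 5, 0.05)
print(f'checked {checked} ports in {elapsed:.2f} seconds')

# upper bound if all 1024 attempts waited the full 3 second timeout
worst_case_seconds = 1024 * 3
print(f'worst case: {worst_case_seconds} seconds')
```

The sequential cost grows with the sum of the individual waits, which is why checking ports one by one is slow.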
Next, let’s explore how we might update the example to check ports concurrently using the ThreadPool.
How to Scan Ports Concurrently (fast)
The program for port scanning a server can be adapted to use the ThreadPool with very little change.
The test_port_number() function was already called separately for each port. This can be performed in a separate thread so each port is tested concurrently.
We want to report port numbers in numerical order. This can be achieved by submitting the tasks to the thread pool using the map() function and then iterating the True/False results returned for each port number.
Firstly, we can create the thread pool with one thread per port to be tested.
```python
...
# create the thread pool
with ThreadPool(len(ports)) as pool:
    # ...
```
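To see why one thread per task helps with blocking work, consider the sketch below. It is not a port scanner: the hypothetical blocking_task() simply sleeps as a stand-in for a slow connection attempt, and run_concurrently() is an assumed helper name. With one worker per task, the total time should be close to the duration of a single task rather than the sum of all of them.

```python
from multiprocessing.pool import ThreadPool
from time import perf_counter, sleep

def blocking_task(seconds):
    # stand-in for a slow network call such as a connection attempt
    sleep(seconds)
    return seconds

def run_concurrently(n_tasks, seconds):
    # run n_tasks blocking tasks with one worker thread per task
    start = perf_counter()
    with ThreadPool(n_tasks) as pool:
        pool.map(blocking_task, [seconds] * n_tasks)
    return perf_counter() - start

# ten tasks of 0.2 seconds each would take about 2 seconds one by one
elapsed = run_concurrently(10, 0.2)
print(f'{elapsed:.2f} seconds')
```

One thread per port is a simple choice for about a thousand ports; for much larger ranges a smaller, fixed number of workers may be a more sensible starting point.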
We might issue the tasks to the ThreadPool using the map() method and then iterate the True/False results returned for each port number.
The problem is that the map() method only supports target functions that take a single argument, and test_port_number() takes two.
Therefore, we must use the starmap() method instead.
We can prepare the iterable of arguments for each call to the test_port_number() function using a list comprehension, then call starmap() directly, which will return an iterable of return values once all tasks are complete.
```python
...
# prepare arguments for starmap
args = [(host,p) for p in ports]
# dispatch all tasks
results = pool.starmap(test_port_number, args)
```
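Because the host is the same for every task, an alternative to starmap() is to fix the first argument with functools.partial and then use map(). The sketch below compares the two approaches using a hypothetical two-argument function label() and a made-up host name; it is meant to show that both calls pass the same arguments, not to replace the approach used in this tutorial.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

def label(host, port):
    # hypothetical two-argument task used only for illustration
    return f'{host}:{port}'

def compare(host, ports):
    with ThreadPool(len(ports)) as pool:
        # starmap unpacks each (host, port) tuple into two arguments
        via_starmap = pool.starmap(label, [(host, port) for port in ports])
        # partial fixes the host so map only has to supply the port
        via_map = pool.map(partial(label, host), ports)
    return via_starmap, via_map

print(compare('example.test', range(3)))
```

Both calls return their results in the same order as the ports that were supplied.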
We can then iterate over the return values and report the results.
The problem is that we want to report the return value (open True or False) along with the port number.
This can be achieved using the zip() built-in function which can traverse two or more iterables at once and yield a value from each. In this case, we can zip() our return values and port numbers iterables.
```python
...
# report results in order
for port,is_open in zip(ports,results):
    if is_open:
        print(f'> {host}:{port} open')
```
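The pairing performed by zip() is easy to check on a small, made-up set of results. In the sketch below, the function name open_ports() and the sample values are assumptions chosen for illustration; no network access is involved.

```python
def open_ports(ports, results):
    # pair each port with its True/False result and keep the open ones
    return [port for port, is_open in zip(ports, results) if is_open]

# made-up results for ports 78, 79, 80 and 81
print(open_ports(range(78, 82), [False, False, True, False]))
```

This works because starmap() returns results in the same order as the arguments it was given.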
Tying this together, the complete example is listed below.
```python
# SuperFastPython.com
# scan a range of port numbers on a host concurrently
from socket import AF_INET
from socket import SOCK_STREAM
from socket import socket
from multiprocessing.pool import ThreadPool

# returns True if a connection can be made, False otherwise
def test_port_number(host, port):
    # create and configure the socket
    with socket(AF_INET, SOCK_STREAM) as sock:
        # set a timeout of a few seconds
        sock.settimeout(3)
        # connecting may fail
        try:
            # attempt to connect
            sock.connect((host, port))
            # a successful connection was made
            return True
        except OSError:
            # ignore the failure
            return False

# scan port numbers on a host
def port_scan(host, ports):
    print(f'Scanning {host}...')
    # create the thread pool
    with ThreadPool(len(ports)) as pool:
        # prepare the arguments
        args = [(host,port) for port in ports]
        # dispatch all tasks
        results = pool.starmap(test_port_number, args)
        # report results in order
        for port,is_open in zip(ports,results):
            if is_open:
                print(f'> {host}:{port} open')

# protect the entry point
if __name__ == '__main__':
    # define host and port numbers to scan
    host = 'python.org'
    ports = range(1024)
    # test the ports
    port_scan(host, ports)
```
Running the program attempts to open a socket connection for all ports in the range 0 to 1023 and reports ports 80 and 443 open as before.
In this case, the program is dramatically faster.
On my system, it completed in about 3.1 seconds, compared to the 235.8 seconds for the serial case, which is about 76 times faster.
```
Scanning python.org...
> python.org:80 open
> python.org:443 open
```
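If you would rather experiment without scanning someone else's server, the concurrent approach can be tried against sockets on your own machine. The sketch below is a variation under stated assumptions, not the program above: port_is_open(), find_open_ports() and local_demo() are hypothetical names, the scan returns a list instead of printing, and the "server" is a pair of listening sockets on the loopback address plus one bound socket that never listens (to play the part of a closed port).

```python
from multiprocessing.pool import ThreadPool
from socket import AF_INET, SOCK_STREAM, socket

def port_is_open(host, port, timeout=3):
    # same logic as test_port_number(), under a hypothetical name
    with socket(AF_INET, SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return True
        except OSError:
            return False

def find_open_ports(host, ports):
    # test every port concurrently, one worker thread per port
    ports = list(ports)
    with ThreadPool(len(ports)) as pool:
        results = pool.starmap(port_is_open, [(host, port) for port in ports])
    return [port for port, is_open in zip(ports, results) if is_open]

def local_demo():
    # two listening sockets play the role of open ports
    listeners = [socket(AF_INET, SOCK_STREAM) for _ in range(2)]
    # a socket that is bound but never listens plays a closed port
    idle = socket(AF_INET, SOCK_STREAM)
    try:
        for sock in listeners + [idle]:
            sock.bind(('127.0.0.1', 0))  # let the OS choose free ports
        for sock in listeners:
            sock.listen()
        expected = sorted(sock.getsockname()[1] for sock in listeners)
        candidates = sorted(expected + [idle.getsockname()[1]])
        return find_open_ports('127.0.0.1', candidates), expected
    finally:
        for sock in listeners + [idle]:
            sock.close()

found, expected = local_demo()
print(found == expected)
```

Returning a list rather than printing makes the scan easier to test and to reuse from other code.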
Further Reading
This section provides additional resources that you may find helpful.
Books
- Python ThreadPool Jump-Start, Jason Brownlee (my book!)
- Threading API Interview Questions
- ThreadPool PDF Cheat Sheet
I also recommend specific chapters from the following books:
- Python Cookbook, David Beazley and Brian Jones, 2013.
- See: Chapter 12: Concurrency
- Effective Python, Brett Slatkin, 2019.
- See: Chapter 7: Concurrency and Parallelism
- Python in a Nutshell, Alex Martelli, et al., 2017.
- See: Chapter 14: Threads and Processes
Guides
- Python ThreadPool: The Complete Guide
- Python Multiprocessing Pool: The Complete Guide
- Python ThreadPoolExecutor: The Complete Guide
- Python Threading: The Complete Guide
Takeaways
You now know how to scan ports concurrently with this ThreadPool example.
Do you have any questions about this example?
Ask your question in the comments below and I will do my best to answer.
Photo by Jainam Mehta on Unsplash