A thread pool is a group of pre-instantiated, idle threads that stand ready to be given work. Thread pools are often preferred over instantiating a new thread for each task when there are many short tasks to be done rather than a few long ones.
Suppose you want to download thousands of documents from the internet, but only have resources for downloading 50 at a time. The solution is to use a thread pool, spawning a fixed number of threads that download all the URLs from a queue, 50 at a time.
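Python's standard library ships a ready-made thread pool in multiprocessing.dummy (commonly imported with from multiprocessing.dummy import Pool as ThreadPool). Its downside is that in Python 2.x, it is not possible to exit the program with e.g. a KeyboardInterrupt before the threads have finished all tasks from the queue.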
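For reference, the standard-library pool can be used roughly as follows. This is only a sketch: fetch is a hypothetical stand-in that does no network access, and the example.com URLs are placeholders, so a real download function would have to be substituted.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool


def fetch(url):
    # Hypothetical stand-in for a real download; returns a fake "size".
    return len(url)


# Placeholder URLs, for illustration only
urls = ["https://example.com/doc/%d" % i for i in range(200)]

pool = ThreadPool(50)  # at most 50 tasks run concurrently
try:
    # map() blocks until every URL has been processed and keeps input order
    sizes = pool.map(fetch, urls)
finally:
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the worker threads to exit
```

Because map() only returns once every task is done, the main thread spends the whole run waiting inside that call.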
To get an interruptible thread queue in Python 2.x and 3.x (for use in PDFx), I’ve built this code, inspired by stackoverflow.com/a/7257510. It implements a thread pool which works with Python 2.x and 3.x:
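A minimal sketch of such a pool could look like the following. It is an illustration, not necessarily the exact code used in PDFx: the names Worker, wait_completion and slow_task are assumptions, while ThreadPool, add_task, map and the bounded self.tasks queue follow the description in this post.

```python
import threading
import time

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2


class Worker(threading.Thread):
    """Thread that keeps executing tasks taken from a shared queue."""

    def __init__(self, tasks):
        threading.Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True  # do not keep the interpreter alive on exit
        self.start()

    def run(self):
        while True:
            func, args, kwargs = self.tasks.get()
            try:
                func(*args, **kwargs)
            except Exception as exc:
                # A failing task must not kill the worker thread
                print("Task failed: %r" % (exc,))
            finally:
                # Mark the task as done so that join() can return
                self.tasks.task_done()


class ThreadPool(object):
    """Fixed number of worker threads consuming tasks from a bounded queue."""

    def __init__(self, num_threads):
        # Bounded queue: put() blocks while num_threads tasks are waiting
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kwargs):
        """Queue a single task; blocks while the queue is full."""
        self.tasks.put((func, args, kwargs))

    def map(self, func, args_list):
        """Queue one task per item, each item passed as the only argument."""
        for args in args_list:
            self.add_task(func, args)

    def wait_completion(self):
        """Block until every queued task has been processed."""
        self.tasks.join()


if __name__ == "__main__":
    def slow_task(n):  # hypothetical task, stands in for a download
        time.sleep(0.01)

    pool = ThreadPool(5)
    try:
        pool.map(slow_task, range(20))
        pool.wait_completion()
    except KeyboardInterrupt:
        print("Interrupted, exiting")
```

The workers are daemon threads in this sketch so that they do not keep the interpreter alive once the main thread exits.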
The queue size equals the number of threads (see self.tasks = Queue(num_threads)), so adding tasks with pool.map(..) or pool.add_task(..) blocks until a slot in the queue becomes available.
When you issue a KeyboardInterrupt by pressing Ctrl+C, the current batch of workers will finish and the program quits with the exception at the pool.map(..) step.
If you have suggestions or feedback, let me know via @metachris