Scaling Python Servers with Worker Processes and Socket Duplication

Developing servers that scale is usually quite tricky, even more so with Python and the absence of worker threads which can run on multiple cpu cores [1].

A possible solution are worker processes that duplicate the client’s socket, a technique that allows the workers to processes requests and send responses directly to the client socket. This approach is particularly useful for long lasting connections with more than one request per session.

You can skip the introduction to jump directly to the Python section. [Update: With this approach I was not able to cleanly close all sockets. Be sure to check with lsof.]

Basics: TCP Servers

A tcp server binds to a specific local port and starts listening for incoming connections. When a client connects to this port, the server accepts the connection and waits for incoming data. Data usually arrives in chunks, and the server tries to find the end of a request by looking for delimiter characters or by a specified length (which might be indicated in the first few bytes as in protocol buffers).

Often one request contains a number of values, which may have to be deserialized with a protocol such as json, protocol buffers, avro, thrift or any of the other established or self-invented serialization protocols. After deserializing the incoming bytestream, it can be processed.

Finally the server may or may not respond to the client’s request, and/or close to socket connection at any time.


Now, what happens if 10,000 clients try to connect to your server concurrently and start sending requests and, in particular, how will it impact the response time? The C10K Problem is a classic resource which discusses this exact situation and provides a few general approaches, which boil down to:

  1. One thread, one client
  2. One thread, many clients
  3. Build the server code into the kernel

We usually don’t want to do the latter, and it’s a general advise to avoid running thousands of concurrent threads. Therefore the question becomes how to handle many clients within one thread.

Asynchronous Servers

For one thread to manage multiple clients, the sockets have to be handled in an asynchronous fashion. That is, the server provides the operating system with a list of file descriptors, and receives a notification as soon as any socket has an event (read, write, error) ready to process. Operating system calls that provide this functionality include select, poll, epoll and kqueue. This approach makes it possible to develop scalable single-threaded servers, such as Facebook’s tornado webserver (free software, written in Python).

The critical point is processing time per request. If one request takes 1 ms to process and send a response, a single threaded server will have a limit at 1,000 requests per second.

Distributing the Load

There are various approaches for distributing incoming connections in order to reach a higher number of concurrently processed requests.

Load balancers distribute incoming requests across servers, usually proxying all traffic in both directions.

Reverse proxies allow you to run multiple server process on different local ports, and distribute incoming connections across them. This works very well in particular for short lived connections such as HTTP requests. Well known reverse proxies include Nginx, Varnish and Squid. (Wikipedia).

Socket accept preforking is a technique that allows multiple processes to listen on the same port and, in turn, pick up new incoming connections and handle the client sockets independently. This works by having one process open the server socket and then forking itself, which copies all existing file descriptors for the children to use.

  • man fork: “… The child inherits copies of the parent’s set of open file descriptors.”

Socket accept preforking works very well; popular projects that successfully employ this approach includeGUnicorn, Tornado and Apache.

Worker threads process requests in the background while the socket thread can get back to waiting for events from the operating system. They usually listen on a queue from the main thread and receive the client socket’s file descriptor or also the incoming bytestream, process it and send the response back to the client. Developers need to be very careful with locking issues when accessing shared state/variables.

Naturally, worker threads are one of the best explored ways of distributing the work and having the main thread concentrate on the socket events.


Python restricts multiple threads to one Python interpreter instance, thereby forcing multiple threads to share a single cpu. Context switches between the threads take place at latest after 100 Python instructions. Inside Python the controlling mechanism is called the Global Interpreter Lock (GIL) [1].

In a server that means as soon as one worker thread uses the cpu, it steals processing time from the socket handling thread, which makes a traditional worker thread architecture unfeasible. But there is an alternative:worker processes.


[Update: With this approach I was not able to cleanly close all open sockets. Be sure to check with lsof.]

Duplicating Sockets across Processes

Given the ability to pickle a socket, worker processes can do the same thing as worker threads: receive a client’s socket file descriptor and possibly the bytestream, process it, and send a response without requiring a callback to the main socket handler process.

Sockets can be pickled and reconstructed with an almost hidden, undocumented feature of Python’s multiprocessing module: multiprocessing.reduction.


I discovered this module recently through an inline comment in one of the examples in the multiprocessing docs, and wanted to find out more:

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing.reduction
>>> help(multiprocessing.reduction)

Help on module multiprocessing.reduction in multiprocessing:





# Module to allow connection and socket objects to be transferred
# between processes
# multiprocessing/
# Copyright (c) 2006-2008, R Oudkerk --- see COPYING.txt

Not much information, but at least we know where to look: multiprocessing/

And there it is, starting at line 122:

# Functions to be used for pickling/unpickling objects with handles

def reduce_handle(handle):
if Popen.thread_is_spawning():
return (None, Popen.duplicate_for_child(handle), True)
= duplicate(handle)
('reducing handle %d', handle)
return (_get_listener().address, dup_handle, False)

def rebuild_handle(pickled_data):
, handle, inherited = pickled_data
if inherited:
return handle
('rebuilding handle %d', handle)
= Client(address, authkey=current_process().authkey)
.send((handle, os.getpid()))
= recv_handle(conn)
return new_handle

Example Code

Putting it all together, this is how to use the above functions to share sockets with another process in Python:

# Main process
from multiprocessing.reduction import reduce_handle
= reduce_handle(client_socket.fileno())

# Worker process
from multiprocessing.reduction import rebuild_handle
= pipe.recv()
= rebuild_handle(h)
= socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
.send("hello from the worker process\r\n")


  1. Python, Threads and the Global Interpreter Lock (GIL)
  2. The C10K Problem
  3. Global Interpreter Lock


Thanks to Brian Jones for reading drafts of this post.

Recommended Reading