Web applications often perform slow operations behind the scenes, such as writing to a remote database, logging over a network file system, or sending emails. Handling these slow operations synchronously reduces the responsiveness of the web application and hurts the user experience. Here I compare two non-blocking approaches: one using epoll and one using threading.

Code: surfsnippets/asynchandler

Suppose there is some operation in our web application which slows it down. The question is how to handle this more or less independent operation asynchronously, without blocking the response. I wrote two simple scripts which demonstrate the epoll and threading approaches and compared their benchmarks against a synchronous version. In our case the slow operation is just sleeping for 1 second:

import time

def make_log(recv):
    time.sleep(1)

Blocking Request Handling

The naive approach to request handling is to perform all operations synchronously, ignoring that some of them take much longer to complete than others. Here is the code:

# sync.py

import os
import socket
import threading
import time

PORT        = 8080
HOST        = "127.0.0.1"
SOCK_FLAGS  = socket.AI_PASSIVE | socket.AI_ADDRCONFIG
counter     = 0     # global variable

def get_inet_socket(backlog=128):
    "Blocking socket"
    res     = socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, SOCK_FLAGS)
    af, socktype, proto, canonname, sockaddr = res[0]
    sock    = socket.socket(af, socktype, proto)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(sockaddr)
    sock.listen(backlog)
    return sock


def make_log(recv):
    "Perform logging"
    global counter
    counter  += 1
    print "num = %s" % counter
    print recv
    time.sleep(1)


def main():
    # Create server socket
    isock   = get_inet_socket()

    while True:
        # Get data from the inet client
        conn, addr  = isock.accept()
        recv    = conn.recv(1024)

        # Blocking request handling
        make_log(recv)

        # Respond to the inet client
        conn.send("Doobie Doo")
        conn.close()

    isock.close()

if __name__ == "__main__":
    main()
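
To see the blocking behavior, you can poke the server with a tiny test client. The client below is my own addition (it is not one of the snippets in the repository); against sync.py every request takes about a second, because the response is only sent after make_log() returns, while the asynchronous versions further down answer almost immediately.

# client.py -- tiny test client (my addition, not part of the original snippets)

import socket
import time

start   = time.time()
sock    = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("127.0.0.1", 8080))
sock.send("GET / HTTP/1.0\r\n\r\n")
print sock.recv(1024)                           # "Doobie Doo"
print "took %.2f sec" % (time.time() - start)
sock.close()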

Non-Blocking Request Handling with Epoll

A better approach is to use the epoll system call to process the event. The epoll mechanism is used by the popular Tornado and Nginx web servers. In Tornado, the inet socket is registered with epoll together with a list of callback handlers. When a request comes in, the epoll event is received and dispatched to the corresponding handler.
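
Stripped to its essentials, this register/poll/dispatch pattern fits in a few lines. Below is my own condensed sketch of it (not Tornado's actual code); the file name, port 8081 and the handle_accept helper are made up for illustration.

# epoll_sketch.py -- condensed register/poll/dispatch pattern (illustration only)

import select
import socket
import functools

def handle_accept(listener, fd, ev):
    "Accept a pending connection and hand the data over to the slow operation"
    conn, addr = listener.accept()
    print conn.recv(1024)       # a real handler would call make_log() here
    conn.close()

def main():
    # Non-blocking listening socket (port 8081 is arbitrary for this sketch)
    sock    = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind(("127.0.0.1", 8081))
    sock.listen(128)

    # Register the socket with epoll and map its file descriptor to a handler
    ep       = select.epoll()
    ep.register(sock.fileno(), select.EPOLLIN)
    handlers = {sock.fileno(): functools.partial(handle_accept, sock)}

    while True:
        # Wait for events and dispatch each one to its registered handler
        for fd, ev in ep.poll(3600):
            handlers[fd](fd, ev)

if __name__ == "__main__":
    main()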

In my setup, a web server is already listening on the inet socket, and some web servers already use the epoll mechanism themselves. So instead of registering the inet socket, I use an internal unix socket to talk to the epoll thread. The system works as follows. Initially, two threads are running: the main thread and the epoll thread. The main thread listens for incoming requests and sends the data to a unix socket registered with epoll. The epoll thread (the LoggerThread) receives the data from the unix socket and handles the request asynchronously (in our case it just takes a nap :) ). Here is the code:

# async_epoll.py

import os
import errno
import select
import socket
import functools
import threading
import time

_EPOLLIN    = 0x001
_EPOLLERR   = 0x008
_EPOLLHUP   = 0x010

PORT        = 8080
HOST        = "127.0.0.1"
TIMEOUT     = 3600
SOCK_FLAGS  = socket.AI_PASSIVE | socket.AI_ADDRCONFIG
EPOLL_FLAGS = _EPOLLIN | _EPOLLERR | _EPOLLHUP
SOCK_NAME   = "/tmp/logger.sock"
counter     = 0     # global variable

class LoggerThread(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)


    def run(self):
        sock    = get_server_socket()
        ep      = select.epoll()
        ep.register(sock.fileno(), EPOLL_FLAGS)         # register socket
        handler = functools.partial(conn_ready, sock)   # add handler for the socket

        events      = {}
        while True:
            event_pairs = ep.poll(TIMEOUT)
            events.update(event_pairs)
            while events:
                fd, ev = events.popitem()
                try:
                    handler(fd, ev)
                except (OSError, IOError), e:
                    if e.args[0] == errno.EPIPE:
                        pass


def handle_connection(conn, address):
    "Handles connection"
    make_log(conn.recv(1024))


def conn_ready(sock, fd, ev):
    while True:
        try:
            conn, address = sock.accept()
        except socket.error, e:
            if e.args[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        conn.setblocking(0)
        handle_connection(conn, address)


# Unix socket
def get_server_socket(backlog=128):
    "Server for unix socket which listens for connections"
    sock    = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    try:
        os.unlink(SOCK_NAME)    # Clean up socket
    except:
        pass
    sock.bind(SOCK_NAME)
    sock.listen(backlog)
    return sock


def get_client_socket():
    "Client for unix socket"
    sock    = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    return sock


# Inet socket
def get_inet_socket(backlog=128):
    "Blocking inet socket"
    res     = socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, SOCK_FLAGS)
    af, socktype, proto, canonname, sockaddr = res[0]
    sock    = socket.socket(af, socktype, proto)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(sockaddr)
    sock.listen(backlog)
    return sock


def make_log(recv):
    "Perform logging"
    global counter
    counter  += 1
    print "counter = %s" % counter
    print recv
    time.sleep(1)


def main():
    # Create Logger thread
    t   = LoggerThread()
    t.setDaemon(True)
    t.start()

    # Create server socket
    isock   = get_inet_socket()

    while True:
        # Get data from the inet client
        conn, addr  = isock.accept()
        recv    = conn.recv(1024)

        # Send received data to socket
        sock    = get_client_socket()
        sock.connect(SOCK_NAME)
        sock.send(recv)
        sock.close()

        # Respond to the inet client
        conn.send("Doobie Doo")
        conn.close()

    isock.close()

    # Wait for the thread
    t.join()


if __name__ == "__main__":
    main()

Non-Blocking Request Handling with Threads

Another approach to non-blocking request handling is to use threads. The idea is sketched in the post “Threading in Django”. When a request comes in, a new thread is created to handle it. This method is used by the Apache web server and requires more memory than event-driven web servers. Here is the code:

# async_thread.py

import os
import socket
import threading
import time

PORT        = 8080
HOST        = "127.0.0.1"
SOCK_FLAGS  = socket.AI_PASSIVE | socket.AI_ADDRCONFIG
counter     = 0     # global variable

class LoggerThread(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self._recv  = None


    def set_recv(self, recv):
        self._recv  = recv

    def run(self):
        make_log(self._recv)


# Inet socket
def get_inet_socket(backlog=128):
    "Blocking socket"
    res     = socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, SOCK_FLAGS)
    af, socktype, proto, canonname, sockaddr = res[0]
    sock    = socket.socket(af, socktype, proto)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(sockaddr)
    sock.listen(backlog)
    return sock


def make_log(recv):
    "Perform logging"
    global counter
    counter  += 1
    print "counter = %s" % counter
    print recv
    time.sleep(1)


def main():
    # Create server socket
    isock   = get_inet_socket()

    while True:
        # Get data from the inet client
        conn, addr  = isock.accept()
        recv    = conn.recv(1024)

        # Create Logger thread
        t   = LoggerThread()
        t.set_recv(recv)
        t.setDaemon(True)
        t.start()

        # Respond to the inet client
        conn.send("Doobie Doo")
        conn.close()

    isock.close()


if __name__ == '__main__':
    main()

Benchmarks

Now we can benchmark the synchronous, asynchronous-epoll and asynchronous-threads approaches. For benchmarking I used the popular ApacheBench (ab) tool. I start one of the scripts in one terminal and run ab in another:

[terminal 1]$ python async_epoll.py
[terminal 2]$ ab -n 100 -c 10 http://localhost:8080/

The benchmark result for request handling with epoll looks like the following. Here we get about 12345 req/sec.

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   0.008 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1000 bytes
HTML transferred:       0 bytes
Requests per second:    12345.68 [#/sec] (mean)
Time per request:       0.810 [ms] (mean)
Time per request:       0.081 [ms] (mean, across all concurrent requests)
Transfer rate:          120.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     0    0   0.2      0       1
Waiting:        0    0   0.1      0       1
Total:          0    1   0.2      1       1

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      1
  99%      1
 100%      1 (longest request)

Performing similar benchmarks for the other scripts, we can build the following table:

          No Handling     Synchronous   Asynchronous Epoll   Asynchronous Threads
Script    nohandling.py   sync.py       async_epoll.py       async_thread.py
Req/sec   10000           1             12000                3500
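
The nohandling.py script is not listed anywhere above; it is the baseline that accepts the connection, prints the received data and responds without doing any logging work at all. The listing below is my own reconstruction of such a baseline, not the original script.

# nohandling.py -- assumed baseline: accept, print, respond, no make_log() call

import socket

PORT = 8080
HOST = "127.0.0.1"

def main():
    sock    = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((HOST, PORT))
    sock.listen(128)

    while True:
        # Get data from the inet client and respond right away
        conn, addr = sock.accept()
        recv    = conn.recv(1024)
        print recv              # still writes to the terminal (see the note below)
        conn.send("Doobie Doo")
        conn.close()

if __name__ == "__main__":
    main()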

Looking at the table, we see that synchronous request handling gives by far the worst req/sec. The best throughput is achieved by asynchronous request handling with epoll. Although requests are not blocked in this method, there are a few disadvantages: a) the number of concurrent requests is limited to about 130, and b) it normally takes longer until all the requests are actually processed. Point a) can be fixed by writing the request handler more carefully, reaching about 1000 concurrent requests as Tornado does. Asynchronous handling with threads gives about 4 times lower throughput than the epoll method, but all requests are processed much sooner and the number of concurrent requests can be higher. The asynchronous epoll method even outperforms the version without any request handling, because the latter still prints the received data to the terminal.

If you need the best throughput and don’t care much about when the requests get handled, go with the asynchronous epoll method. If you want reasonable performance and do care about when the requests get handled, asynchronous threads are the better approach. In either case, blocking request handling is not a solution.