Serving small static files: which server to use?

Update 1 (Mar 29, 2011): Logging and compression now disabled for all servers
Update 2 (Apr 14, 2011): Added results for Nginx 1.0

Introduction

The goal of this benchmark is to compare several web servers and caching servers regarding their ability to serve small static files. Contrary to my previous benchmark, where each server was tested using its default settings, the servers are optimized this time. I consider only the most performant open source servers, namely Varnish Cache, Nginx, Lighttpd and Apache Traffic Server, as well as G-WAN (free, but not open source), as it was the clear winner of the previous test.

Setup

The following versions of the software are used for this benchmark:

  • Nginx: 0.7.67-3ubuntu1 (64 bit)
  • Varnish:  2.1.3-7ubuntu0.1 (64 bit)
  • G-WAN: 2.1.20 (32 bit)
  • Lighttpd: 1.4.26-3ubuntu2 (64 bit)
  • Apache Traffic Server: 2.1.7-unstable (64 bit)

All tests are performed on an ASUS U30JC (Intel Core i3-370M @ 2.4 GHz, 5400 rpm hard drive, 4 GB DDR3 1066 MHz memory) running Ubuntu 10.10 64 bit (kernel 2.6.35).

Benchmark setup

  • HTTP Keep-Alives: enabled
  • TCP/IP settings: OS default
  • Server settings: optimized (see the configuration of each server below)
  • Concurrency: from 0 to 1’000, step 10
  • Requests: 1’000’000
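
As a rough illustration of the concurrency sweep listed above, the following sketch builds one ApacheBench command line per concurrency level. It is only an assumption about how such a loop could look, not the actual client (a C wrapper around ab, linked in the Client section). The URL, the helper name `ab_commands` and the choice to start at 10 rather than 0 (ab needs at least one concurrent client) are illustrative, as is the assumption that the request count applies to every step.

```python
def ab_commands(url, requests=1_000_000, start=10, stop=1000, step=10):
    """Build one ApacheBench command per concurrency level (hypothetical helper)."""
    commands = []
    for concurrency in range(start, stop + 1, step):
        # -n: total number of requests, -c: concurrent clients, -k: HTTP Keep-Alive
        commands.append(
            ["ab", "-n", str(requests), "-c", str(concurrency), "-k", url]
        )
    return commands


if __name__ == "__main__":
    # Print the first few commands instead of running them.
    for cmd in ab_commands("http://127.0.0.1/100.html")[:3]:
        print(" ".join(cmd))
```

The commands are only built, not executed, so the sketch can be inspected safely before a long benchmark run is started.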

The following 100-byte file is used as static content: /var/www/100.html

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
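
A file like this can be generated with a few lines of Python. This is a minimal sketch: the helper name `write_static_file` is made up for the example, and a temporary directory stands in for /var/www so that it runs without root privileges.

```python
import os
import tempfile


def write_static_file(path, size=100, fill=b"X"):
    """Write `size` filler bytes to `path` and return the resulting file size."""
    with open(path, "wb") as f:
        f.write(fill * size)
    return os.path.getsize(path)


if __name__ == "__main__":
    # The temporary directory is an assumption; the benchmark uses /var/www.
    with tempfile.TemporaryDirectory() as docroot:
        target = os.path.join(docroot, "100.html")
        print(target, write_static_file(target), "bytes")
```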

I had to increase the local port range (because of the TIME_WAIT status of the TCP ports), so I’ve edited /etc/sysctl.conf:

net.ipv4.ip_local_port_range = 1024 65535
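
To see why the wider range helps, here is a small back-of-the-envelope sketch. It assumes that a closed connection keeps its local port in TIME_WAIT for 60 seconds and that the previous range was 32768 to 61000 (the usual Linux default for kernels of that generation, not a value measured on the test machine). The function names are illustrative.

```python
def ephemeral_ports(low, high):
    """Number of local ports in the inclusive range [low, high]."""
    return high - low + 1


def max_new_connections_per_second(low, high, time_wait_seconds=60):
    """Rough upper bound on new connections/s to a single destination
    before every local port is stuck in TIME_WAIT."""
    return ephemeral_ports(low, high) / time_wait_seconds


if __name__ == "__main__":
    ranges = {"assumed default": (32768, 61000), "widened": (1024, 65535)}
    for label, (low, high) in ranges.items():
        ports = ephemeral_ports(low, high)
        rate = max_new_connections_per_second(low, high)
        print(f"{label}: {ports} ports, about {rate:.0f} new connections/s")
```

With Keep-Alive enabled most requests reuse an existing connection, so this bound only concerns newly opened connections.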

Disclaimer

Doing a correct benchmark is clearly not an easy task. There are many pitfalls (TCP/IP stack, OS settings, the client itself, …) that may corrupt the results, and there is always the risk of comparing apples with oranges (e.g. benchmarking the TCP/IP stack instead of the server itself).

In this benchmark, every server is tested using optimized settings. However, the OS has not been optimized. If you have comments, improvements, ideas or tips regarding the optimization of a server, please feel free to contact me; I’m always open to improving and learning new things.

Client

The client (available here: http://gwan.ch/source/ab.c.txt) relies on ApacheBench (ab). The client as well as the web server tested are hosted on the same computer.

I’ve evaluated Funkload, but even in a distributed setting, it was not able to saturate the servers. I’ve also considered using httperf, but because I had to wait 60 seconds between two tests (due to the TIME_WAIT status of the TCP ports), I finally used ApacheBench. Using the latter also makes it possible to compare the performance of a server with its default settings against its performance with optimized settings.

Varnish

Configuration:

Based on advice from a Varnish developer and from the Varnish web site, I use the following configuration:

thread_pool_add_delay = 2
thread_pools = 4
thread_pool_min = 200
thread_pool_max = 4000
cli_timeout = 25
session_linger = 100
malloc = 1G

The relevant part of /etc/default/varnish:

DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -p thread_pool_add_delay=2 \
             -p thread_pools=4 \
             -p thread_pool_min=200 \
             -p thread_pool_max=4000 \
             -p cli_timeout=25 \
             -p session_linger=100 \
             -s malloc,1G"

Finally, as my hard drive is quite slow, I mounted the varnish folder as a tmpfs:

sudo mount -t tmpfs -o size=512M tmpfs /var/lib/varnish

Results:

Let’s start with the output of varnishstat -1:

client_conn            504312       359.20 Client connections accepted
client_drop                 0         0.00 Connection dropped, no sess/wrk
client_req           20443932     14561.21 Client requests received
cache_hit            20443919     14561.20 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss                 13         0.01 Cache misses
backend_conn               12         0.01 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                0         0.00 Backend conn. too many
backend_fail                0         0.00 Backend conn. failures
backend_reuse               1         0.00 Backend conn. reuses
backend_toolate            11         0.01 Backend conn. was closed
backend_recycle            13         0.01 Backend conn. recycles
backend_unused              0         0.00 Backend conn. unused
fetch_head                  0         0.00 Fetch head
fetch_length               13         0.01 Fetch with Length
fetch_chunked               0         0.00 Fetch chunked
fetch_eof                   0         0.00 Fetch EOF
fetch_bad                   0         0.00 Fetch had bad headers
fetch_close                 0         0.00 Fetch wanted close
fetch_oldhttp               0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                  0         0.00 Fetch zero len
fetch_failed                0         0.00 Fetch failed
n_sess_mem               1689          .   N struct sess_mem
n_sess                    682          .   N struct sess
n_object                    0          .   N struct object
n_vampireobject             0          .   N unresurrected objects
n_objectcore              986          .   N struct objectcore
n_objecthead              987          .   N struct objecthead
n_smf                       0          .   N struct smf
n_smf_frag                  0          .   N small free smf
n_smf_large                 0          .   N large free smf
n_vbe_conn                  1          .   N struct vbe_conn
n_wrk                     986          .   N worker threads
n_wrk_create              986         0.70 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max                   0         0.00 N worker threads limited
n_wrk_queue                 0         0.00 N queued work requests
n_wrk_overflow           2710         1.93 N overflowed work requests
n_wrk_drop                  0         0.00 N dropped work requests
n_backend                   1          .   N backends
n_expired                  13          .   N expired objects
n_lru_nuked                 0          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved               554          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                  5622         4.00 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite           20394256     14525.82 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                 504312       359.20 Total Sessions
s_req                20443932     14561.21 Total Requests
s_pipe                      0         0.00 Total pipe
s_pass                      0         0.00 Total pass
s_fetch                    13         0.01 Total fetch
s_hdrbytes         5971552835   4253242.76 Total header bytes
s_bodybytes        2044393490   1456120.72 Total body bytes
sess_closed            490612       349.44 Session Closed
sess_pipeline               0         0.00 Session Pipeline
sess_readahead              0         0.00 Session Read Ahead
sess_linger          20443932     14561.21 Session Linger
sess_herd               14205        10.12 Session herd
shm_records         696623209    496170.38 SHM records
shm_writes           21973040     15650.31 SHM writes
shm_flushes                 0         0.00 SHM flushes due to overflow
shm_cont               234697       167.16 SHM MTX contention
shm_cycles                201         0.14 SHM cycles through buffer
sm_nreq                     0         0.00 allocator requests
sm_nobj                     0          .   outstanding allocations
sm_balloc                   0          .   bytes allocated
sm_bfree                    0          .   bytes free
sma_nreq                   26         0.02 SMA allocator requests
sma_nobj                    0          .   SMA outstanding allocations
sma_nbytes                  0          .   SMA outstanding bytes
sma_balloc              11121          .   SMA bytes allocated
sma_bfree               11121          .   SMA bytes free
sms_nreq                    0         0.00 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes                  0          .   SMS outstanding bytes
sms_balloc                  0          .   SMS bytes allocated
sms_bfree                   0          .   SMS bytes freed
backend_req                13         0.01 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                     1          .   N total active purges
n_purge_add                 1         0.00 N new purges added
n_purge_retire              0         0.00 N old purges deleted
n_purge_obj_test            0         0.00 N objects tested
n_purge_re_test             0         0.00 N regexps tested against
n_purge_dups                0         0.00 N duplicate purges removed
hcb_nolock           20417076     14542.08 HCB Lookups without lock
hcb_lock                    3         0.00 HCB Lookups with lock
hcb_insert                  3         0.00 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)
accept_fail                 0         0.00 Accept failures
client_drop_late            0         0.00 Connection dropped late
uptime                   1404         1.00 Client uptime
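
A few derived figures make this output easier to read. The sketch below parses lines in the `varnishstat -1` layout shown above (name, counter value, per-second rate, description) and computes the cache hit ratio, the number of requests per accepted connection and the average request rate. The parser is deliberately simplistic, and the sample is a subset of the counters copied from the output above.

```python
def parse_varnishstat(text):
    """Map each counter name to its integer value (second column)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


SAMPLE = """\
client_conn            504312       359.20 Client connections accepted
client_req           20443932     14561.21 Client requests received
cache_hit            20443919     14561.20 Cache hits
cache_miss                 13         0.01 Cache misses
uptime                   1404         1.00 Client uptime
"""

if __name__ == "__main__":
    c = parse_varnishstat(SAMPLE)
    hit_ratio = c["cache_hit"] / (c["cache_hit"] + c["cache_miss"])
    print(f"hit ratio: {hit_ratio:.6%}")
    print(f"requests per connection: {c['client_req'] / c['client_conn']:.1f}")
    print(f"average requests/s: {c['client_req'] / c['uptime']:.0f}")
```

Only 13 of the roughly 20 million requests missed the cache, so practically every request was served from memory, and each accepted connection carried about 40 requests, which is what one would expect with Keep-Alive enabled.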

Nginx

Configuration:

The main optimization for Nginx is to adapt the number of worker processes to the number of cores of your server. In my case, I have a dual-core processor (seen as 4 virtual cores by the OS), and therefore I changed the value of worker_processes. Based on the useful comments of Igor Sysoev, I also turned off logging and compression, and added open_file_cache:

File /etc/nginx/nginx.conf

user www-data;
worker_processes  2; # or 4
error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;
events {
    worker_connections  1024;
    # multi_accept on;
}
http {
    include       /etc/nginx/mime.types;
    access_log	off;
    open_file_cache max=1000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    sendfile        off;
    #tcp_nopush     on;
    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;
    gzip  off;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

File /etc/nginx/sites-enabled/default

server {
        listen   80; ## listen for ipv4
        server_name  localhost;
        access_log  off;
        location / {
                root   /var/www;
                index  index.html index.htm;
        }
}
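
To choose the worker_processes value discussed above, the core count can be read from the OS. The sketch below is only a convenience: the helper name is made up, and `os.cpu_count()` reports logical CPUs, so on the test machine it would presumably return 4 rather than 2.

```python
import os


def worker_processes_directive(cores=None):
    """Return an nginx `worker_processes` line for the given (or detected) core count."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return f"worker_processes  {cores};"


if __name__ == "__main__":
    print(worker_processes_directive())   # logical cores seen by the OS
    print(worker_processes_directive(2))  # physical cores of the test machine
```

Whether the physical or the logical core count gives better throughput is exactly what the runs with 2 and 4 workers are meant to show.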

Nginx 1.0 was compiled by hand with the following configuration:

./configure --conf-path=/etc/nginx/nginx.conf \
            --error-log-path=/var/log/nginx/error.log \
            --http-client-body-temp-path=/var/lib/nginx/body \
            --http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
            --http-log-path=/var/log/nginx/access.log \
            --http-proxy-temp-path=/var/lib/nginx/proxy \
            --lock-path=/var/lock/nginx.lock \
            --pid-path=/var/run/nginx.pid \
            --with-ipv6 \
            --without-http_charset_module \
            --without-http_ssi_module \
            --without-http_userid_module \
            --without-http_autoindex_module \
            --without-http_rewrite_module \
            --without-http_limit_zone_module \
            --without-http_limit_req_module

Results:

Let’s start with 2 worker processes on Nginx 0.7.67

Now with 4 worker processes on Nginx 0.7.67:

Let’s compare with 2 worker processes on Nginx 1.0

Now with 4 worker processes on Nginx 1.0:


Lighttpd

Configuration:

The optimizations for Lighttpd are based on the official documentation as well as a blog entry.

File /etc/lighttpd/lighttpd.conf

server.document-root       = "/var/www/"
server.upload-dirs = ( "/var/cache/lighttpd/uploads" )
server.errorlog            = "/var/log/lighttpd/error.log"
index-file.names           = ( "index.php", "index.html",
                               "index.htm", "default.htm",
                               "index.lighttpd.html" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )
server.pid-file            = "/var/run/lighttpd.pid"
dir-listing.encoding        = "utf-8"
server.dir-listing          = "enable"
server.username            = "www-data"
server.groupname           = "www-data"
compress.cache-dir          = "/var/cache/lighttpd/compress/"
compress.filetype           = ("text/plain", "text/html", "application/x-javascript", "text/css")
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
include_shell "/usr/share/lighttpd/include-conf-enabled.pl"
# Performance tuning
server.max-fds = 10000
server.event-handler = "linux-sysepoll"
server.network-backend = "linux-sendfile"
server.stat-cache-engine = "fam"
server.use-noatime = "enable"
server.max-worker = 2 # or 4

Results:

Let’s start with 2 worker processes:


Now with 4 worker processes:

Apache Traffic Server

Configuration:

According to the documentation, Traffic Server does not require special optimizations.

File /usr/local/etc/trafficserver/storage.config

var/trafficserver 256M

The file /usr/local/etc/trafficserver/records.config contains a lot of properties. Just check that the following entries are configured to support the reverse proxy mode, and disable logging:

CONFIG proxy.config.reverse_proxy.enabled INT 1
CONFIG proxy.config.proxy_name STRING nico-laptop # <your hostname>
CONFIG proxy.config.log.logging_enabled INT 0

Let’s configure the mapping in the file /usr/local/etc/trafficserver/remap.config:

map          http://127.0.0.1:8080/      http://127.0.0.1/

Nginx is used as the backend server on port 80/tcp.
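
Conceptually, the remap rule rewrites the front-end URL prefix into the backend URL prefix. The following sketch mimics that for the rule above with a plain prefix match. It is a simplification for illustration and not how Traffic Server implements its matching; both function names are hypothetical.

```python
def parse_map_rule(line):
    """Split a remap.config 'map' line into (from_url, to_url)."""
    keyword, from_url, to_url = line.split()
    if keyword != "map":
        raise ValueError(f"unsupported rule type: {keyword}")
    return from_url, to_url


def remap(url, rule):
    """Rewrite `url` if it starts with the rule's source prefix, else return None."""
    from_url, to_url = rule
    if url.startswith(from_url):
        return to_url + url[len(from_url):]
    return None


if __name__ == "__main__":
    rule = parse_map_rule("map          http://127.0.0.1:8080/      http://127.0.0.1/")
    # A request to the proxy on port 8080 is forwarded to Nginx on port 80.
    print(remap("http://127.0.0.1:8080/100.html", rule))
```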

Results:

G-WAN

Configuration:

Apart from running G-WAN with root privileges, nothing has to be done to further optimize the performance:

sudo ./gwan

Results:

Discussion

Minimum RPS


Average RPS

Maximum RPS

CPU Usage

Memory Usage


Conclusion

G-WAN again seems to perform a lot better than the other servers. Nginx always performs slightly better than Lighttpd, while Apache Traffic Server is very similar to Lighttpd in terms of performance. Finally, Varnish Cache serves only half as many requests as the others. Surprisingly, there are quite few differences between the results with optimized settings and those with the default settings.

Regarding the resources used by each server, Nginx is the winner in terms of memory usage, as the amount of memory does not increase with the number of concurrent clients. G-WAN requires half as much CPU as the other servers.

Again, keep in mind that this benchmark compares only the servers locally (no networking is involved), and therefore the results might be misleading.


126 thoughts on “Serving small static files: which server to use?”

  1. Pingback: Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN « Spoot!

  2. Pingback: Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN « Spoot!

  3. Hi nbonvin,

    Great benchmark write up!
    Regarding the G-WAN resource usage graph, concurrent clients >600, CPU usage drops from 20% down to around what looks to be 1% or 0%, is this correct?
    If so, G-WAN is one impressive little web/app server, all that’s left for improvement is lowering the memory usage and then it will be perfect. :-)

    Regards,
    Alex.

    • Hi,
      Regarding the resources usage, I have monitored the servers 10 seconds before and after the benchmark. G-WAN has completed the benchmark in 583 seconds; so, from 583 to 593, the CPU usage is very close to zero.
      On the resources usage plots, you can therefore also see the time spent per server to serve the same amount of requests.

      Edit: the label on the x-axis should be time (in seconds), and not concurrent users. I’ll fix it asap. This is fixed.

  4. Hi nbonvin,

    Maybe you forgot to mention in your conclusion section that G-WAN just needs “sudo ./gwan” to get the highest RPS and the lowest CPU usage. Compared to the others (e.g. Varnish), it is the easiest.

    -shahih

  5. Thank you for the hard work and for the quality of the test (your tests put to shame the ones I did in the past – both in completeness and accuracy – so I can only congratulate you).

    There is just a typo in the legend of the last charts’ horizontal axis: it should be “Time (seconds)” instead of “Concurrent clients”. This should be corrected to prevent confusion.

    I sincerely believe that everybody will benefit from this comparative benchmark, end-users and authors. As one of the authors, when I see how Nginx is superbly sparing memory, I certainly know where I should look at improving my work for the next version!

    One picture is worth one thousand words.

    • Thanks for your comment.
      Yes, there is a typo on the last 2 plots. I’ll fix it asap. This is fixed.

  6. Could you
    1) disable gzip in nginx test
    2) and disable sendfile in nginx test ?
    BTW, do all these servers write access log ?
    If you need a benchmark-edition configuration, nginx also has open_file_cache:
    open_file_cache max=1000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    to save 3 syscalls per request for frequently requested files.

    • Thanks a lot for your comment and the optimization tips. I’ll redo the tests in the next days.
      Good point, I have to check if every server has the same log verbosity, as it may play a role !
      Just one remark: the goal here is not to achieve the best performance at any price; the setup should also be usable in practice (e.g. in a CDN). Therefore, logs are useful, as well as gzip. But I should make sure that each server provides the same functionality.

      Thanks for your input !

        • nginx version: nginx/0.7.67
          TLS SNI support enabled
          configure arguments: --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-log-path=/var/log/nginx/access.log --http-proxy-temp-path=/var/lib/nginx/proxy --lock-path=/var/lock/nginx.lock --pid-path=/var/run/nginx.pid --with-debug --with-http_dav_module --with-http_flv_module --with-http_geoip_module --with-http_gzip_static_module --with-http_realip_module --with-http_stub_status_module --with-http_ssl_module --with-http_sub_module --with-ipv6 --with-mail --with-mail_ssl_module --add-module=/build/buildd/nginx-0.7.67/modules/nginx-upstream-fair

          • Could you rebuild nginx using
            ./configure
            --conf-path=/etc/nginx/nginx.conf
            --error-log-path=/var/log/nginx/error.log
            --http-client-body-temp-path=/var/lib/nginx/body
            --http-fastcgi-temp-path=/var/lib/nginx/fastcgi
            --http-log-path=/var/log/nginx/access.log
            --http-proxy-temp-path=/var/lib/nginx/proxy
            --lock-path=/var/lock/nginx.lock
            --pid-path=/var/run/nginx.pid
            --with-ipv6
            --without-http_charset_module
            --without-http_ssi_module
            --without-http_userid_module
            --without-http_autoindex_module
            --without-http_rewrite_module
            --without-http_limit_zone_module
            --without-http_limit_req_module

    • This microbenchmark bears no relation to real-life CDN usage: tens of thousands of connections, many files of different sizes whose aggregate size does not fit in a computer’s physical memory, and slow clients.

    • Igor,

      Even with many slow clients, low CPU usage and small latency are key points.

      This independent review of several server architectures has the merit of showing that there is no perfect solution today. And it shows what could (should?) be done in each solution to head in the right direction (ideally, doing everything right).

      I see this work as an inspiring invitation for all of us to improve (rather than as sterile criticism), and I am grateful Nicolas had the idea of doing it, and the courage to execute all those tests (to make progress, we need others to tell us where we fall short).

      “Computing Science would benefit from more frequent analysis, critique, particularly self-critique. After all, thorough self-critique is the hallmark of any subject claiming to be a science.”
      (Niklaus Wirth, “Good Ideas, Through the Looking Glass”, 2005)

      • I agree that micro-benchmarks are useful, but only if you really understand what you are testing and how you are testing.

        • Igor,

          The test bed was the same for all servers. So any pitfall injected by the test was also the same for all. Localhost waives network bandwidth and latency issues (the bottleneck is then the kernel), and this helps to compare each server’s user-mode code.

          Now, if you know a better testing tool: Nicolas has already made this request several times (like me, including to PHK, the Varnish author) and so far nobody has pointed to any client program capable (Perl & Co. are not fast enough) of testing a range of concurrency levels without requiring days of benchmarks (e.g. httperf lingering) and myriads of client machines.

          This industry needs a decent client test that we can all agree to use in order to let us ALL make progress.

          Pierre.

          • Pierre,

            I do not know a good testing tool, so this is one of the reasons why I do not run any benchmarks. As to testing via localhost, I believe this is a useless thing.

            Pierre, I found out only today that you are the G-WAN author. So I downloaded G-WAN and ran it under strace to see the syscalls the server makes, and found that there are some superfluous syscalls, for example 2 stat64()s and 2 empty read()s:

            [pid 2571] stat64("/home/is/gwan/0.0.0.0_8080/#0.0.0.0/www/100.html", {st_mode=S_IFREG|0644, st_size=100, ...}) = 0
            [pid 2571] stat64("/home/is/gwan/0.0.0.0_8080/#0.0.0.0/www/100.html", {st_mode=S_IFREG|0644, st_size=100, ...}) = 0
            [pid 2571] open("/home/is/gwan/0.0.0.0_8080/#0.0.0.0/www/100.html", O_RDONLY) = 9
            [pid 2571] fstat64(9, {st_mode=S_IFREG|0644, st_size=100, ...}) = 0
            [pid 2571] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xfffffffff7734000
            [pid 2571] read(9, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 100
            [pid 2571] read(9, ""..., 4096) = 0
            [pid 2571] read(9, ""..., 4096) = 0
            [pid 2571] close(9) = 0
            [pid 2571] munmap(0xf7734000, 4096) = 0

            mmap()/munmap() also seem superfluous to me. There are superfluous syscalls in other places as well.
            For the same request nginx makes fewer syscalls, so if G-WAN handles more requests than nginx, it seems that G-WAN’s userland code does almost nothing (BTW, it cannot return a 304 response). You say that G-WAN configuration is very simple: it’s just a file tree. Could you tell me how to disable keepalive, sendfile or gzipping inside the tree? How to restrict access to some files? How to define a custom index file, say “index.htm”?

            Excuse me, but I believe the current version is good for benchmarks, but not for production.

            • Igor,

              We discussed the point by email, but you do not even seem to remember me. And I have often praised Nginx on G-WAN’s forum:

              http://forum.gwan.com/index.php?p=/discussion/comment/1035/#Comment_1035
              http://forum.gwan.com/index.php?p=/discussion/comment/1272/#Comment_1272
              http://forum.gwan.com/index.php?p=/discussion/comment/1372/#Comment_1372

              So excuse me if I see your criticism of G-WAN as a bit thin.

              Regarding the syscalls, it is obvious that the difference between G-WAN and Nginx is not to be found there (we all use the SAME syscalls).

              G-WAN is faster than Nginx only because G-WAN’s user-mode code is faster.

              Now, regarding the “feature” criticism, I could also claim that G-WAN does more than Nginx because it supports native C scripts (or runs any general-purpose C source code file [not G-WAN related], or links with static or dynamic libraries, or offers in-memory GIF I/O, Area/Bar/Dot/Line/Pie/Ring Charts, frame-buffer primitives, JSON (de)serialization, on-the-fly CSS/JS/HTML reduction, CSS Data URIs, compression, crypto, HW and SW random numbers, hashing, checksums, etc.), or because everything is made automatic instead of forcing users to learn a proprietary configuration file syntax (search Google to see how many people your choice of syntax for disabling log files has confused).

              As you can see, if we compare Nginx and G-WAN features, Nginx might not win. This is because G-WAN is an “Application server” while Nginx is a “Web server”.

              You can’t disable HTTP Keep-Alives in G-WAN. If you don’t want Keep-Alives then run ApacheBench without the -k option (and G-WAN will *automatically* make the switch).

              Status 304 is indeed supported by G-WAN’s handling of “If-Modified-Since” and “If-None-Match” conditional requests.

              Gzip (and deflate) are supported by default on files > 100 bytes in size. It just does not make sense to compress smaller files, so why bother users?

              HTTP BASIC and DIGEST (Nginx does not support both…) Authorizations were added to G-WAN a few months ago but have not been tested yet (you could read the timeline to save me from all these tedious explanations).

              You can't define a custom index (other than by using a C script) but I have heard that Varnish explained that C source code was the best way to write configuration files. "index.html"-only does not seem a dramatic lack of features to me – nor any of the other features that you (incorrectly) claimed G-WAN does not support.

              “Excuse me, but I believe the current version is good for benchmarks, but not for production.”

              I am not surprised that we have different opinions. They obviously lead to a different idea of what matters in a server.

              And, as a result, G-WAN is much faster than Nginx.

              Pierre.

              • Pierre,

                sorry, I did not remember you. I found in my archive that we had a discussion in 2009. In 2003-2006 I remembered most of the people I had discussions with, but their number has since become much larger and I now remember only a small part of them.

                As to syscalls, I meant that nginx makes fewer syscalls than G-WAN: for example, nginx does not set SO_KEEPALIVE at all and sets TCP_NODELAY and SO_LINGER only under some conditions.
                BTW, with “setsockopt(SOL_SOCKET, SO_LINGER, {onoff=1, linger=0})” on a non-keepalive connection, you risk dropping the small last part of a large file while sending it to a slow client.

                As to features, I showed only some of them. Usually system administrators want to change a lot of them, and if you want to change them in G-WAN, you have to write C scripts. Although the Varnish developers and you think that C is the best way to configure a server, I believe most system administrators do not agree with you; however, I do not want to convince you. If you think that a directory tree with C scripts is better than a proprietary configuration file syntax, OK.

                nginx supports HTTP Basic Authentication (ngx_http_auth_basic_module) from the first public version. And does not support Digest.

                As to 304:

                $ cat q
                GET /100.html HTTP/1.0
                If-Modified-Since: Thu, 31 Mar 2011 19:12:00 GMT

                $ nc localhost 8080 <q
                HTTP/1.0 200 OK
                Date: Fri, 01 Apr 2011 20:12:44 GMT
                Server: G-WAN/2.1.20
                Last-Modified: Thu, 31 Mar 2011 19:12:00 GMT
                ETag: 1daa9b9d-4d94d200-63
                Vary: Accept-Encoding
                Accept-Ranges: bytes
                Cache-Control: public
                Expires: Sat, 02 Apr 2011 20:12:32 GMT
                Content-Type: text/html
                Content-Length: 99
                Connection: close

                XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

                • Igor,

                  Again, your analysis is too superficial.

                  “With setsockopt(SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}) for non-keepalive connection you have chance to drop small last part of large file while sending to a slow client.”

                  That’s why G-WAN actively polls connections before closing them, just to make sure that the other side has finished.

                  “Although Varnish developers and you think that C is the best way to configure, I believe the most system administrators do not agree with you.”

                  System administrators are not programmers, so it is natural for them to use tools that offer a long list of push-button-features, like a Web server’s configuration files – just because they would not be able to write the code to do what they want.

                  G-WAN, on the other hand, is an Application server. A tool for C programmers. And programmers sincerely HATE to learn sterile things like arbitrary (and proprietary) conventions (each Web server seems to take pride in making its syntax as obscure, as difficult to understand and as different from all the others as possible, sometimes to sell training and consulting services).

                  But G-WAN goes a step higher, trying hard to make everything *automatic* when this is possible (whether this is for HTTP options or programming tasks).

                  A good example is how G-WAN turns blocking BSD socket calls into asynchronous calls *transparently*. It makes writing Handlers (the equivalent of Nginx modules) immensely easier. This also lets G-WAN re-use existing network libraries *without any change*.

                  G-WAN aims at making things as simple as possible because I have no goal to sell training nor (basic usage) consulting. Programmers already have a job. They don’t need to fight against pointless complexity in the tools they have to use in their daily activities.

                  “nginx supports HTTP Basic Authentication (ngx_http_auth_basic_module) but not Digest.”

                  That’s precisely what I wrote. Except that this is “HTTP Authorization” rather than “Authentication”. The difference matters because “Authorization” is (much) weaker from a cryptographic point of view. In G-WAN’s manual I wrote that I believed that was the reason why you did not implement Digest (which was just a way for MSFT and Verisign, the Digest standard authors, to blatantly promote the SSL certificates scam that they have built and that they sell).

                  “As to 304:”

                  This is because you test a tiny file. G-WAN estimates that processing HEAD requests or conditional requests for tiny files does not make sense (the payload stays within a few TCP packets) so it just dumps the whole file, after the HTTP headers.

                  This test will work as expected on a larger file, like:
                  http://gwan.com/archives/gwan_linux.pdf

                  Igor, I can restore HEAD requests and conditional requests in G-WAN for tiny files, but these extra tests will NOT make G-WAN TWICE AS SLOW (like Nginx is).

                  G-WAN is faster than Nginx only because its *request parsing* and *response building* code is much faster than Nginx’s.

                  And this is precisely what a tiny file benchmark on a modest laptop on localhost illustrates.

                  There is no strong CPU to hide user-mode code inefficiency, nor high-performances NICs and switches to make network latency/bandwidth magically disappear in a specific benchmark made for PR headlines. With localhost, the only bottlenecks are the OS kernel and the Web server user-mode code (40/60 for the slow servers, 90/10 or less for G-WAN, see by yourself).

                  Everybody can duplicate such a test on localhost (without tens of thousands of US dollars AND lengthy NIC tuning explanations) – and this is precisely the value of this comparison.

                  • I think G-WAN won because the framework is simple, just accepting request + only parsing simple GET request + dispatch to handler + dispatch to servlet. Compare with other that must deal with plugin-in approach and processing more complex http request. Also G-WAN aggressively cache everything, that’s why the memory consumption is increasing, it’s problematic if someone flooding it with long http request. I think it’s not apple-to-apple.

                    • Atmo,

                      Please stop trashing this blog. You have posted 5 “replies” in a row to spread FUD (Fear, Uncertainty and Doubt) about G-WAN after your account at forum.gwan.com has been closed for the same reason.

                      If G-WAN was caching the same file repeatedly at each request then there would be no benefit in the first place. Stop saying stupid things, please.

                      Pierre.

    • Hi,
      Following your advice, I’ve disabled logging and gzip, and added open_file_cache. This makes a huge difference :-)
      At low concurrency (< 300 concurrent clients), Nginx can now handle 1.5 to 2 times more requests per second. Impressive !

      Again, thanks a lot for your input and your time !

  7. Interesting. Couple of questions:

    1) Are all servers setup as proxies? I’m guessing it’s not?

    2) Are all doing the same things as far as disk I/O is concerned? As someone else already suggested, logging and log verbosity could be a significant factor.

    3) How is “ab” not becoming the bottleneck? When I run your test, my “ab” process on the box always ends up at 100%. It’s quite surprising that this still allows for such huge differences in throughput.

    4) CPU usage (resources) is that excluding the CPU used by the “ab” process? Like I mentioned, one entire CPU is “lost” to the ab process when I run this test.

    5) I ran your test, similar configs except I turned logging off (my disk is slow), using an AMD Phenom(tm) II X4 940 and got these numbers on the last test (1000 conns):

    ##### 1000,45924,46485,46920 #####
    Score:4858971,4966171,5122965 Time:1066 second(s) [00:17:46]

    (contrary to your tests, I don’t see a whole lot difference between say 200 conns and 1000 conns).

    6) Latency. This is probably more interesting than “raw” throughput, do you have any information about latency when running your tests? I used a souped up version of http_load, and running at around 48,000 QPS with 1,000 client, I measure an average latency from Apache TS of 0.16ms.

    I suspect one reason the differences are so huge is because each run is fairly short (a few seconds), and the ramp-up time creating connections gets more and more noticeable. Usually when I benchmark something, I let it run for at least 60s, preferably 5 minutes. That would be quite a pain with this test though :).

    As you will also find out, each server will need slight tweaking for their needs. I understand you want to compare out of the box experience, which is fine, but out-of-the-box means different things for different servers. As an example, there are fairly verbose Via: strings generated by Apache Traffic Server by default, to help debugging and analysis.

    Cheers,

    – Leif

    • Hi,

      First, thanks for your detailed comment :-)
      Here are some answers to your questions:

      1) Not all servers are proxies, so I cannot configure them exactly in the same way. But, when a server is configured in proxy mode, only a small number of requests hit the upstream server. So, I think the comparison is fair.

      2) Yes, I will redo all tests, and make sure that:
      – they all use the same level of log verbosity (i.e. I will probably disable logging)
      – they do not use gzip (as G-WAN and lighttpd do not compress files of 100 bytes by default)

      3) ab never ate 100% of CPU in my tests …

      4) The resources usage (CPU and Memory) are only for the processes of the server that is tested, not the whole system

      5) I’m not sure I understand this point. In my tests, the performance of almost all servers is quite constant between 200 and 1000 concurrent users

      6) Yes, I should definitely provide this important metric in my next benchmarks !

      Regarding the duration of the benchmark, yes, it would probably make sense to let it run for a longer period, but as you mentioned, it will be really painful … maybe in the future :-)

      Actually, in this benchmark, the servers are running with optimized settings, even if some of them are maybe not yet optimized enough :-) But for sure, I need to turn off logging and gzip for all of them.

      BTW, do you have any tips for configuring Traffic Server, as it seems that you have some experience with it ?

      Thanks !

    • Hi,
      Without logging, Traffic server is now almost as capable as Lighttpd. :-)
      Again, thanks a lot for your comments !

      • Oops, I guess I should have read your replies before commenting on this again :). Thanks for the efforts, as you can tell from all the comments, people are very interested in these kinds of comparisons.

        Cheers,

        — leif

  8. woops posted a request for this today and its there already

    anyhow, another good bench would be non-static serving such as php (e.g. a default wordpress install with 3 posts or so)

    • And the author violate the LGPL license by not disclose it in their documentation, he try do hide the fact. I feel same with you, not very secure using the closed source, the speed of open source solution is already on accepted range, before it saturated the bandwidth/cost.

      • Atmo,

        This accusation is false and you know it because I already responded to it on forum.gwan.com several months ago:

        http://forum.gwan.com/index.php?p=/discussion/comment/1037/#Comment_1037

        Therefore, if you post this same erroneous accusation here, this is only to trash the reputation of G-WAN.

        You have posted 5 “replies” in a row on this blog to spread FUD (Fear, Uncertainty and Doubt) about G-WAN after your account at forum.gwan.com has been closed for the same reason.

        Please stop. What you are doing is called “libel” and unless you are rich enough to buy the “Justice” dept., this is severely punished under the law.

        Pierre.

  9. Thanks alot for doing these tests, very interesting :)

    One question: how come nginx basically has a flat memory usage, whereas all the others have a linearly rising memory usage? Is this indicative of some sort of memory leak, or how should one interpret that, or is it simply a function of a steadily rising number of connected clients? I mean, would the memory usage keep rising if you let the test run longer? If that happened it could be very bad for a production webserver :)

    Also, if I might ask the author’s completely subjective opinion: based on your testing, which of these webservers do you feel had the best combination of performance and ease of use?

    • I can’t say how nginx manages to have an almost flat memory usage, but I can explain why others use more memory as concurrency grows.

      The more you have concurrent connections, the more memory you have to allocate to store the context of each connection (client IP address, protocol version, HTTP header values… all this information is stored to answer each client request, like conditional requests based on data or time ranges, or the ones that ask a given encoding).

      This is not a memory leak issue because memory is re-used by other connections if the concurrency level stays constant (or decreases). In the test above, as concurrency grows constantly, so does the memory usage.

      Usually, web servers free this memory after a delay (kind of slow garbage collection).

      For the “best combination of performance and ease of use”, my vote goes to G-Wan (no configuration and unmatched speed).

  10. Hi nbonvin,

    Facebook’s hiphop for PHP contains an inbuilt web server in it. https://github.com/facebook/hiphop-php
    You could give it a go and benchmark it with the 100 byte html static file in it. This will tell us how good the web server in hip hop really is.
    I haven’t used it, so I can’t offer any tips on how to compile and run it.
    Make sure log files and compression are disabled.

    Cheers,
    Alex.

  11. Running a similar bench with a cgi (such as PHP5) would be very interesting as it’s not available elsewhere. Running the servers that can use it, of course. I’d see something like:
    (with php-fpm where applicable, apc always on)

    lighttpd
    nginx
    apache-prefork
    apache-event
    g-wan
    cherokee

    the reason is because the above is the MOST vastly used configuration EVERYWHERE compared to static file server.

    i think the most used configuration atm is apache-prefork + eventually nginx as front proxy. But it would be interesting to see them all with PHP without front proxy. (all in “optimized” settings)

    • “a similar bench with a cgi (such as PHP5) would be very interesting as its not available elsewhere”

      Plugging PHP behind Apache, Lighttpd or Nginx makes sense: they lack the script engine feature. But plugging PHP behind an application server which already supports a script engine, like G-WAN, is a strange request. Would you ask Glassfish and Tomcat which support Java to use PHP or .Net? Or Zend to support Java?

      No. And nobody would compare Glassfish or Tomcat to Nginx because their authors would say that as application servers do many more things than HTTP servers they cannot be faster than HTTP servers (indeed, Glassfish and Tomcat are much slower than Nginx).

      So, when we see an application server, G-WAN, which is faster than all the web servers, this is really a premiere.

      And when the Java, PHP and .Net script engines have been compared, G-WAN was, again, the fastest server:

      http://trustleap.com/en_loans.html

      The next version of G-WAN will offer an SCGI handler sample. So, even if the idea is odd, people will be able to see what using PHP from G-WAN can deliver.

      • He asked for a benchmark, not an explanation. I want to use WordPress, and I want to see GWAN’s performance. Is it too much to ask for?

        • “Bob”, (Bob’s blog is hidden behind a domain name anonymizer, a common trend for the pointlessly offensive posts on this page)

          As explained in the explanation above that you dislike, PHP lovers will be able to use PHP from G-WAN in its next version.

          So, all you have to do to test PHP is to wait for this version to be shipped. There is no precise date available at the moment.

          If you are demanding a PHP benchmark, then I invite you to consider how much time such a test takes. G-WAN is provided for free. Nicolas’ test was made for free. If your interest goes to PHP, maybe you will sacrifice the time needed to satisfy your curiosity by testing PHP.

          Pierre.

  12. the setup should also be usable in practise (e.g. in a CDN)

    Besides serving high loads, a major factor for a CDN is serving quickly. Companies choose a CDN for delivering content fast.
    It would be nice for the comparison if you could also run some “web page load time” tests using the external pingdom full page test service. Not under load, but under no load as a CDN has loads of servers that usually have average loads around 10-20% before extra servers are added.

    In this test the fastest result you can get is interesting.

    PS1 A ReloadEvery add on might be handy for Firefox, to restart the test every 1 minute.
    PS2 As results are highly OS dependent, use the same OS for each web server being tested.
    Some test results for a 12K html file on different Os/Web server combos:
    1. Apache 2.2.17 on 10.5 Xserve: 380msec
    2. Apache 2.2.15 on 10.6 Mini+SSD+1Gbps: 440msec
    3. IIS/6.0: 130 msec, for the same html file, running in a VM on the same hardware as #1
    4. Apache 2.2.3 on CentOS: 70 msec (on a different box)

    • To save time you could run these tests in parallel: 1. every web server configured to run on a different port and 2. starting each new web server test at an 8-9 second interval (for 7 different web servers) in a new tab in the web browser.

      • @Pro Backup,

        > every web server configured to run on a different port

        If you are running all the Web servers in parallel then you are giving an unfair advantage to the first started because it had access to all the free and unfragmented memory it needs while subsequently executed programs will have to deal with the rest.

        And I am not talking of the state of the TCP/IP stack, or the kernel queues. Ideally, each benchmark test should be run on a newly booted machine.

          • Virtualization is another hardware abstraction layer on the top of the OS kernel (which, to avoid more bugs, additional critical security holes and further loss of performances, is the only abstraction layer that we should be running on any given machine).

            So, instead of having the OS kernel as the bottleneck (like on a normal machine), then you have a (much) slower ‘virtual machine’ as the new bottleneck.

            And it is not only slower – it also has a completely different performance profile because everything is encapsulated with new code (for example, memory allocation is well-known to be atrociously damaged by virtualization, even further than other tasks).

            Unsurprisingly, if the speed is limited to 30km/h, then a car will not ‘run faster’ than a bicycle.

            Beware what you are testing. See this link for more details:
            http://gwan.ch/en_apachebench_httperf.html

    • Interesting ! Thanks for sharing !

      Yes, the response time is an important metric, and I will keep in mind your “web page load time” benchmark. First of all, I need to find the best tools to run such a benchmark. Do you have any suggestion ?

  13. Update 1 (Mar 29, 2011): Logging and compression now disabled for all servers

    Thanks to Google cache the difference is visible. Only ‘nginx’ and ‘apache traffic server’ seem updated.
    Apache traffic server
    min from 32.5K rps to 35K rps
    max from 45K rps to 52.5K rps
    Nginx – 4 worker processes
    min from 30K rps to 35K rps
    max from 50K rps to 75K rps

    • Yes, that’s correct. The other servers were already tested without logging and compression (some servers don’t compress file of 100 bytes).

      • Hi,

        for security I redone all benchmark case on different computer running server
        (single and double core, gentoo and fedora 14) and client running on another computer
        (as I used to do for benchmark involving networking).

        As benchmark case (other than HTTP Keep-Alives yes/no) I choice 3 file for static content:

        99.html ( 99 byte)
        1000.html (1000 byte)
        WebSocketMain.swf (180K byte)

        and the usual dynamic content: Hello {name}

        userver_tcp is the winner in all case for almost all level of concurrency.

        I publish the raw data here (https://github.com/stefanocasazza/ULib/tree/master/doc/benchmark).

        Cheers,
        Stefano

        • Stefano,

          Seriously. Your fastest benchmark (gwan_99_keepalive.csv, 99 bytes, with HTTP keep-alives enabled) is showing that G-WAN hits the wall at a mere 20,553 RPS.

          This is *nearly 7 TIMES* slower than Nicolas’ benchmark on a modest i3 laptop.

          Whatever the testing conditions (software, hardware, memory, network) that you have chosen, this is looking to me as yet another “narrow” case which has little to do with the real-life conditions of use of G-WAN.

          You know by experience that all my tests are done in good faith, on relevant machines and software, and this makes it possible for others to duplicate the tests – a condition that I find valuable.

          I just would like to see you return the favor, one day.

          Pierre.

          • Hi,
            the testing conditions are what is available in my office’s work. I don’t know if all your tests are done in good faith, maybe. I wait for others to try for himself.

            Stefano

            • Stefano,

              I still have those emails where John and you recognize really borderline practices in the benchmarking area – after I chased you along dozens of exchanges.

              Insinuating that it was the other way around, or that you just “don’t know” if I am acting in good faith is pathetic – just like your claim that your test machine, the “only machine at the office” is relevant for this G-WAN / ULib comparison:

              https://github.com/stefanocasazza/ULib/blob/master/doc/benchmark/linuxinfo.txt

              That’s a Pentium 4 CPU @ 2.8GHz, a CPU introduced in year 2000 (11 years ago) and replaced in 2005 (6 years ago).

              For the record, the PENTIUM 4 is the LAST SINGLE-CORE INTEL CPU.
              The P4 successor, introduced in 2005, was a dual-Core CPU.

              And you are testing G-WAN, a Web server DESIGNED TO SCALE ON MULTI-CORE SYSTEMS on an old Pentium 4 (the last single-Core in existence).

              A pure coincidence.

              • Please stop with your insinuations, the machine with I work is old but I have used also for the benchmark another machine (dual core with fedora 14)
                Linux giallo 2.6.34.8-68.fc13.x86_64 #1 SMP Thu Feb 17 15:03:58 UTC 2011
                Two AMD Unknown 1000MHz processors, 3990.22 total bogomips, 134217726M RAM
                System library 2.12.2

                The benchmark data on this machine were very similar, I choice to publish the data for my computer because I have for this the plain control whereas for the other I have to share with my colleagues.

                • Stefano,

                  You are using a Pentium 4 to develop and test ULib. That’s perfectly right. There is nothing wrong with that.

                  What is wrong is to:

                  1/ test G-WAN (a parallelised app) on an 11-year-old (single-Core) P4
                  2/ not make that clear (people can’t guess what is not obvious)
                  3/ claim ULib was “faster than G-WAN” – how much faster? Let’s see:

                  https://github.com/stefanocasazza/ULib/blob/master/doc/benchmark/gwan/gwan_99_keepalive.csv
                  https://github.com/stefanocasazza/ULib/blob/master/doc/benchmark/userver_tcp/userver_tcp_99_keepalive.csv

                  G-WAN                                 ULib
                  --------------------------------      --------------------------------
                      1    4,954    4,968    4,988          1    5,576    5,629    5,663
                     10   22,630   22,829   23,093         10   22,726   23,002   23,150
                     20   22,402   22,508   22,699         20   22,502   22,651   22,758
                     30   21,590   21,701   21,898         30   21,574   21,727   21,894
                     //       //       //       //         //       //       //       //
                    980   14,155   14,205   14,252        980   14,790   14,968   15,054
                    990   14,157   14,215   14,270        990   14,914   14,987   15,037
                   1000   14,126   14,206   14,258       1000   14,886   14,983   15,037

                  G-WAN: min:1689699 … avg:1705856 … max:1723946 … Time:1121 second(s) [00:18:41]
                  ULib: min:1740334 … avg:1756148 … max:1767638 … Time:1121 second(s) [00:18:41]

                  I would not dare to say that such a tiny difference makes one server faster than another (because such a small difference can easily be waved by tuning the code or the options for any given system).
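To put a number on "such a tiny difference", the gap between the two summary lines above can be expressed as a percentage. The snippet is just a worked calculation on the published scores; the helper name is made up for illustration.

```python
# Scores copied from the two summary lines above.
gwan = {"min": 1689699, "avg": 1705856, "max": 1723946}
ulib = {"min": 1740334, "avg": 1756148, "max": 1767638}


def relative_gap(a, b):
    """Return how much larger b is than a, as a percentage of a."""
    return (b - a) / a * 100.0


for key in ("min", "avg", "max"):
    print(f"{key}: ULib is {relative_gap(gwan[key], ulib[key]):.1f}% ahead")
```

On these figures the gap stays at roughly 3% or less for each of the three scores.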

                  To make G-WAN run (really much) faster than ULib on a Pentium 4, I just would have to remove G-WAN’s multi-Core queuing system (the now famous Lorenz-Waterwheel), or use the TCP_DEFER_ACCEPT option that you used to get a boost the last time I checked your code.

                  But G-WAN has been designed for multi-Core systems (we already had this discussion in year 2009) and G-WAN does not use TCP_DEFER_ACCEPT because it prevents a server from cutting quickly enough established connections that do not send data – making your server vulnerable to the kind of massive DoS attacks that gwan.ch had to go through these last two years.

                  Stefano. Look at the charts published by Nicolas. There is an obvious difference between G-WAN and all the others (all tested on a modern multi-Core CPU and a modern 64-bit OS).

                  This is what allows people to claim that G-WAN is faster.

                  When tested on a multi-Core system ULib is MUCH SLOWER than G-WAN:
                  http://forum.gwan.com/index.php?p=/discussion/239/g-wan-vs.-ulib-stefano-is-back/

                  When tested on a single-Core system ULib is *only slightly faster* than a G-WAN version *configured for multi-Core*.

                  How fair is that?
                  Isn’t it a better and more accurate description of the facts?

                  That’s all the difference between G-WAN and ULib.
                  And this difference has a name: legitimacy.

        • Atmo, (alias “Smith Wolfgang” and “Paula”)

          I can easily recognize you because you make the same grammatical errors.

          In the discussion that has been closed to prevent you from using alias accounts to trash the site further, I explained in detail why you have been banned from forum.gwan.com and I can only regret that you now trash this blog.

          At least, now we know for sure that your intentions are to harm in the first place.

          Like for Alvaro Lopez Ortega (the author of “Cherokee” who followed the same sneaky path) I had to close your account used to spread FUD (Fear, Uncertainty and Doubt) on forum.gwan.com after your favorite Web server (Ulib) performance claims were proven groundless.

          And like for Cherokee I had to close the discussion about your Web server (Ulib) to avoid more garbage postings.

          My only hope at this point is that your name is really Atmo and not Stefano Casazza (the author of Ulib)… who shares with you this same (poor) mastery of the Shakespearean language.

          Pierre (G-WAN’s author)

  14. Pingback: Cheatsheet: 2011 04.01 ~ 04.10 - gOODiDEA.NET

  15. Hi,
    The test looks great. Thanks for your great effort.
    I am wondering: if you factor in SSL, how would the servers perform (HTTPS requests)? Will the memory/CPU usage pattern still be the same?

  16. Hi Zhuqy!

    Varnish does not support SSL – and Poul-Henning explained why in February 2011:

    http://www.varnish-cache.org/docs/trunk/phk/ssl.html

    Tasteful excerpts:

    “You can kiss the highly optimized delivery path in Varnish goodby for SSL”.

    “OpenSSL [...] is 340.722 lines of code, 9 times larger than the Varnish source code, 27 times larger than each of Zlib or JEmalloc.

    This should give you some indication of how insanely complex the canonical implementation of SSL is.

    Second, it is not exactly the best source-code in the world. Even if I have no idea what it does, there are many aspect of it that scares me.”

    “Would I be able to write a better stand-alone SSL proxy process than the many which already exists ?

    Probably not, unless I also write my own SSL implementation library, including support for hardware crypto engines and the works.

    That is not one of the things I dreamt about doing as a kid and if I dream about it now I call it a nightmare.”

    That divides the world in two categories:

    – those who use OpenSSL

    – those who don’t want to use OpenSSL.

    Oh, I forgot those who make their own SSL implementation. But we have still to see the same person write a safe SSL library and a Web server worth benchmarking…

  17. Why did you skip Cherokee? It was tested in the previous benchmark and was the closest rival to G-WAN.

    • Running these benchmarks takes time, and I’ve decided to limit the candidates to the best open source servers and the overall best one (G-WAN in this case). So, there is no particular reason why I’ve skipped Cherokee. In fact, I was mainly interested in open source proxies, especially Nginx, Varnish and ATS.

      • Cherokee supports both: proxy and reverse proxy.
        One more thing: you tested web server, not proxy performance.

  18. Being biased here (I personally prefer ATS, but run both ATS and nginx in a production environment as reverse proxies), I must say that using a single computer for both the client and the server is the biggest reason for skewed results. The second is fetching only a single file with 600 concurrent connections.
    I am sure that the data is off for all of Nginx, Varnish and Traffic Server.

    In a real world environment (i.e. production) my company is able to serve well over 30k concurrent connections, pushing over 1Gbps on Traffic Server, using less than 20% of the CPU (it also has a few hundred GB of cache on SSDs; all files < 10MB). The hardware is a dual Xeon X3430 with 16GB of RAM.

    We were easily able to double what we could get on nginx with double the hardware (however minus some of the advanced features such as gzip from the proxy server).

    • @Billy

      As the authors of each server have participated in this benchmark by tuning each solution to its best capabilities, there is little hope that your company (a simple user of those technologies) could do better.

      The test done here compared very objective points like RPS, CPU and memory usage (without the usual tricks used to boost the commercial PR announcements like 10 Gb/s switches and NICs, tuned drivers, tuned kernels, $100,000 server machines, etc.).

      The merit of this test is that:

      – vendors (like Varnish Software AS) carefully avoid to make comparisons.

      – the test environment used here can be duplicated by everybody.

      – every server compared here had to cope with the same environment.

      So, exception made of the blatant advertising of your services, your post does not bring any value.

  19. I wasn’t aware that the authors of ATS had tuned your configs. But if I may suggest, the following settings would probably make your benchmark (under these conditions) better for ATS v2.1.7:

    CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1.0
    CONFIG proxy.config.net.sock_send_buffer_size_in INT 0

    In addition, to make sure things are cached properly (and not getting evicted, I can’t tell what your headers from origin says) I’d add:

    CONFIG proxy.config.cache.ram_cache.size INT 32MB
    CONFIG proxy.config.http.cache.required_headers INT 0

    I don’t know if you can rerun your entire test, but I’d be curious to hear if any of these changes made any improvements at all in your benchmark.

    Cheers,

    — leif

  20. I forgot to say, I still think it’s a bit sketchy to only benchmark for 1s, I’m wondering if what that really is measuring is how fast the server can accept connections. Also note that -t 1 implies -n 50000, so at the most you are doing 50,000 QPS (i.e. not even enough to exhaust 1s worth of traffic for most of these servers, at least not on my cheap quad core AMD).

  21. -t 1 implies -n 50000, so at the most you are doing 50,000 QPS

    “At most”? It looks like you did not read the charts:

    Varnish:   28,000 QPS   (that's less than 50,000 QPS)
    ATS:       60,000 QPS   (that's more than 50,000 QPS)
    Nginx:     80,000 QPS   (that's more than 50,000 QPS)
    G-WAN:    140,000 QPS   (that's more than 50,000 QPS)
    
    
    And ATS consumes 65% of the CPU resources to serve 60k QPS while 
    G-WAN consumes 20% of the CPU resources to serve 140k QPS.
    
    If ATS saturated the CPU resources, it would process 35% more QPS (a mere 81,000 QPS).
    
    It seems that this test can teach you a thing or two, finally. BTW, I am not the author of the test, I just read Nicolas' blog carefully (unlike you).
    • Well, I had a typo, I meant 50k requests, not QPS. Which means the test can actually finish in less than 1s. But, read the man page for ab, if you give it the option -t 1 (which the wrapper does, look at the code), it will override his -n 1000000 option with a -n 50000. So, in the g-wan test (assuming it really is doing 140k QPS), it will only run for roughly 1/3 of a second. Such short tests are really not statistically safe IMO.
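The effect of the implied request cap can be worked out for the rates quoted in this thread. This is only a back-of-the-envelope sketch: it assumes each run stops at 50,000 requests or at the 1 second time limit, whichever comes first, and it uses the approximate peak rates cited above rather than measured values. The function and constant names are illustrative.

```python
AB_IMPLIED_REQUESTS = 50_000   # request cap that "ab -t" implies, as noted above
TIME_LIMIT_S = 1.0             # the wrapper's "-t 1"

# Approximate peak rates quoted in this thread (requests per second).
RATES = {"Varnish": 28_000, "ATS": 60_000, "Nginx": 80_000, "G-WAN": 140_000}


def run_duration(rps, request_cap=AB_IMPLIED_REQUESTS, time_limit=TIME_LIMIT_S):
    """Seconds a single ab run lasts: it ends at the request cap or at the
    time limit, whichever is reached first."""
    return min(time_limit, request_cap / rps)


for server, rps in RATES.items():
    seconds = run_duration(rps)
    print(f"{server:8s} {rps:>7,} req/s -> {seconds:.2f} s, "
          f"{round(rps * seconds):,} requests")
```

Under these assumptions only the slowest server is measured for the full second; the faster a server is, the shorter its sample.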

      • And, replying to myself, the other thing I don’t quite understand is how the “ab” process is not becoming a bottleneck. If I run the ab command as the wrapper uses it, but remove the -t 1 option (so that it actually runs for 1,000,000 requests and not 50k), ab quickly becomes a CPU hog. On my box, I see this from top:

        PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        2927 nobody 20 0 515m 92m 4644 R 211.2 2.3 1:05.82 traffic_server
        2947 leif 20 0 97972 35m 2052 R 77.8 0.9 0:11.18 ab

        and that’s running at around 61,000 requests per second. So, assuming ab’s CPU usage is linear, for it to handle 140,000 QPS, it would consume ~1.8 CPU (180%). But, that’s not possible as far as I know, ab is single threaded. Of course, it’s quite possible G-wan is doing something that makes ab use less CPU (fewer headers perhaps?), but not sure that would make such a big difference in CPU usage.
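The extrapolation in the paragraph above is a simple proportion. The sketch below reproduces it under the same assumption the comment makes, namely that ab's CPU usage grows linearly with the request rate; the function name is illustrative.

```python
def extrapolate_cpu(measured_cpu_percent, measured_rps, target_rps):
    """Scale a measured CPU share linearly to another request rate."""
    return measured_cpu_percent * target_rps / measured_rps


# Figures from the top output above: ab at 77.8% CPU while driving ~61,000 req/s.
needed = extrapolate_cpu(77.8, 61_000, 140_000)
print(f"ab would need about {needed:.0f}% CPU, i.e. {needed / 100:.1f} cores")
```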

        • In the g-wan test (assuming it really is doing 140k QPS), it will only run for roughly 1/3 of a second. Such short tests are really not statistically safe IMO.

          The question that you should ask yourself is how come G-WAN is the only one able to satisfy ab in less than one second.

          The answer is clear: higher speed.

        • I confirm that when running the tests for more than 1s, ab is clearly the bottleneck. Moreover, the greater the value of -t, the worse the issue regarding ports in TIME_WAIT status.

          So, do you think that this micro-benchmark is biased because of that ? Without using several machines as benchmarking clients, what do you suggest to perform reliable local benchmarks ?

          Thanks for your detailed and helpful comments !

          • @Nicolas,

            I have tested several benchmarking tools and none is as fast as it should be (i.e.: they are all slower than the web servers that they are supposed to put on their knees).

            Rather than trying to find an explanation in an obscure conspiracy (amply backed by the obvious unwillingness of the Web server authors to issue any real comparative benchmark), I believe that this is due to the fact that those test tools (like ab) were written at a time when Web servers were much slower (Apache, IIS 5…). “Performances” and “scalability” had another meaning.

            Today, what is clearly needed is a simple and fast *epoll-based* client. epoll will deliver the needed concurrency without having to resort to multi-threading (leaving enough free CPU cycles for the Web servers and the system to exist).

            The problem is that recent test frameworks were written by people who have no clue about how to write efficient code (and I am not even talking about the Perl or Python scripts that claim to benchmark a C application…).

            This gap remains. And I don’t see many candidates able and willing to fill it.

          • About TIME_WAIT, you can minimize / remove this by letting the kernel reuse ports earlier. On Linux, e.g.

            net.ipv4.tcp_tw_reuse = 1
            net.ipv4.tcp_tw_recycle = 1

            I wouldn’t recommend this on production boxes, but it seems fine on lab / test systems (I do it all the time). Also, some kernels insist on having conn-tracking enabled (always), so you might want to make sure you’re not triggering that (not sure if it would over the loopback interface, but it would over the NIC). I posted a little blog entry about how to disable it when it’s built into the kernel (see http://www.ogre.com/node/372).
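To check which values are actually in effect, each sysctl key maps to a file under /proc/sys, with the dots replaced by slashes. The sketch below reads the two settings mentioned above plus the port range from the article; the helper names `sysctl_path` and `read_sysctl` are hypothetical. Note that `net.ipv4.tcp_tw_recycle` was removed in Linux 4.12, so on recent kernels the file simply does not exist and the helper returns None.

```python
from pathlib import Path


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root) / key.replace(".", "/")


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if the key is absent."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in (
        "net.ipv4.ip_local_port_range",
        "net.ipv4.tcp_tw_reuse",
        "net.ipv4.tcp_tw_recycle",  # absent on kernels 4.12 and later
    ):
        print(f"{key} = {read_sysctl(key)}")
```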

            As for client performance tools, there really is no good answer; the best I’ve used is a version of http_load that I had to hack up to get performance to be reasonable. Using it, I can get around 80,000 requests / second (over the network) from one client. This still wouldn’t be enough to satisfy many of these proxy servers in a networked setup, so what I typically end up doing is running 2-3 instances of http_load. At some point, I should submit my changes for http_load to ACME…

            In your case, testing over localhost, you might want to try running two instances of ab, and see if that makes any difference. Since you have a nice wrapper over ab already, it might not be too bad to do that, and then merge the result outputs. It’s still unfortunate that the server and load client would compete for the CPU, but it is what it is.
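A minimal sketch of that idea: launch several ab processes at the same time and add up the "Requests per second" figure each one reports. The function names are hypothetical, and this is not the wrapper used for the benchmark. It assumes ab is on the PATH and uses only its `-k`, `-n` and `-c` options. Summing the rates is only an approximation, valid when the instances overlap for most of their run.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# ab reports a line such as: "Requests per second:    1234.56 [#/sec] (mean)"
RPS_LINE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def parse_rps(ab_output):
    """Extract the mean requests-per-second figure from ab's report."""
    match = RPS_LINE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line in ab output")
    return float(match.group(1))


def run_ab(url, requests, concurrency):
    """Run one ab instance with keep-alive and return its text report."""
    cmd = ["ab", "-k", "-n", str(requests), "-c", str(concurrency), url]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


def combined_rps(url, requests, concurrency, instances=2):
    """Run several ab instances concurrently and sum their reported rates."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        reports = list(
            pool.map(lambda _: run_ab(url, requests, concurrency), range(instances))
        )
    return sum(parse_rps(report) for report in reports)
```

For the setup in this article, a call might look like `combined_rps("http://127.0.0.1/100.html", 500000, 100)`, where the URL and the numbers are placeholders to adapt.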

            Finally, for “time”, I’d run for at least 5s I think, or ~500,000 requests. It seems your original idea of running 1,000,000 requests was reasonable, it’s just that the -t 1 option screws with that. Also, it’s annoying that e.g. “-t 5” would not work, because ab would insist on reducing that to -n 50000 anyway :/.
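The run lengths discussed above follow from dividing the request count by the throughput. A small sketch of that arithmetic, assuming a constant rate for the whole run and using the 140,000 requests/second figure quoted in this thread (the function name is hypothetical):

```python
def run_seconds(requests, requests_per_second):
    """Duration of a run, assuming a constant throughput."""
    return requests / requests_per_second


for n in (50_000, 500_000, 1_000_000):
    print(f"{n:>9,} requests at 140,000 req/s: {run_seconds(n, 140_000):.2f} s")
```

With ab capped at 50,000 requests this gives about 0.36 s, which matches the "roughly 1/3 of a second" estimate; 1,000,000 requests would take about 7.14 s at the same rate.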

            For the record, I have no doubt that G-WAN is fast, I’ve never argued about that.

            Cheers,

            — Leif

            • > I can get around 80,000 requests / second (over the network) from one client

              One single instance of ab lets G-WAN process 220,000+ requests / second (over 1 Gb LAN) from one client.

              So ab does not look that bad in comparison to your tool… as long as ab is not waiting for the server to reply.

              > the g-wan test (assuming it really is doing 140k QPS), it will only run for roughly 1/3 of a second. Such short tests are really not statistically safe IMO.

              That’s why there are ten rounds for each ab shot. Those ten rounds give a range (min, avg, max) and you will notice (if one day you read the results) that this range is similar for all the servers (proving that the statistical relevance of the results is the same whether a server took 10 seconds or 1/3 of a second to process the same load).
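For readers who want to compute such a range themselves, here is a minimal sketch. It assumes the per-round rates have already been collected into a list; the function name `summarize` and the relative "spread" measure are illustrative choices, not part of the benchmark scripts.

```python
from statistics import mean


def summarize(rounds):
    """Return min, avg, max and the relative spread of per-round rates."""
    low, avg, high = min(rounds), mean(rounds), max(rounds)
    return {"min": low, "avg": avg, "max": high, "spread": (high - low) / avg}
```

Comparing the `spread` value between servers is one way to check whether their ranges are of similar width.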

              > it’s quite possible G-wan is doing something that makes ab use less CPU (fewer headers perhaps?)

              If you checked your assumptions (instead of raising inaccurate issues) you would have seen that G-WAN is issuing *more* HTTP Headers than the other servers compared there.

              Why is it so difficult to accept the facts for what they are?

              – G-WAN’s code uses less CPU than all the other servers.
              – G-WAN’s code processes more Requests per Second.

    • thttpd only runs one process afaik? Doing a quick test, it seems much slower than any of the other servers benchmarked here. Using my epoll-enabled http_load over the network, my quad-core AMD running thttpd does roughly 30,000 QPS. But it’s only using 1 CPU.

      Using the “ab” tool wrapper as used by all other tests in this benchmark, lighttpd gets about 20k QPS on this box (but, I think my box is slightly faster than the box used above).

  22. Pingback: LinXs 2011-05-04 | Maxim's blog

  23. Hi All,
    Nice benchmark. Unfortunately, the author of G-WAN doesn’t respect others: he violates the LGPL license by using libtcc without ever disclosing it in the documentation. I don’t feel very secure dealing with closed-source software written by people who do not respect the LGPL license. I’ve read that ULib is also working on a C servlet container (the ability to execute C scripts directly); it’s open source and the speed is amazing.

    • Atmo, (alias “Smith Wolfgang” and “Paula”)

      You have posted 5 “replies” in a row on this blog to spread FUD (Fear, Uncertainty and Doubt) about G-WAN after your account at forum.gwan.com was closed for the same reason.

      This accusation is false and you know it because I already responded to it on forum.gwan.com several months ago:

      http://forum.gwan.com/index.php?p=/discussion/comment/1037/#Comment_1037

      Therefore, if you post this same erroneous accusation here, this is only to trash the reputation of G-WAN by resorting to plain lies.

      Please stop. What you are doing is called “libel” and unless you are rich enough to buy the “Justice” dept., this is severely punished under the law – whatever the country.

      I would not be surprised to discover that “Atmo” (the Ulib fan disgruntled by the G-WAN vs Ulib benchmarks published on forum.gwan.com) is in fact the author of Ulib. Alvaro Lopez Ortega (the author of Cherokee) had the same reaction (going as far as posting insults on forum.gwan.com after he censored G-WAN on Wikipedia).

      Nice people, really. I hope for you that you have other joys in life because G-WAN will only improve in speed and features, widening the gap even further.

      Pierre.

    • Brian, (alias “Atmo”, “Alex”, “Sneider Hoff”, “Smith Wolfgang”, “Paula”, “Brian” and “Currie” for today’s posts only)

      While Nicolas’s test is about a static 100-byte file, the bug (a timer issue) you are referring to was for dynamic contents.

      Why not invest your time on Ulib (your favorite project) to make it work better?

      Then, you would not feel obliged to trash G-WAN as you did on G-WAN’s forum before being banned yesterday and on this blog (your 8th post today – and the day is not finished).

      Pierre (G-WAN’s author).

  24. Pierre, you are polluting this blog. Please write technical arguments here, not angry ones. Thanks.

  25. Two more Aliases for “Atmo”-the-vandal:

    “Bob”, alias “Alex”, “Brian”, “Atmo”, “Sneider Hoff”, “Smith Wolfgang”, “Paula”, “Brian” and “Currie”.

    Atmo, please stop this stupid FUD campaign against G-WAN. None of your posts make sense and are just showing how unfair and determined to harm you are.

    Pierre.

  26. 1)
    > The servers are optimized, contrary to my previous benchmark, where each server was tested using its default settings.

    2)
    > Except running G-WAN with root privileges, nothing has to be done to further optimize the performance

    3)
    > G-WAN: 2.1.20 (32 bit)

    Claiming “optimized” and then claiming “nothing has to be done to further optimize the performance”, and yet G-WAN is the only one being run as 32-bit. It’s known to be much faster as 64-bit, so why intentionally run it as 32-bit?

  27. Hello SpiralOfHope,

    > Claiming “optimized” and then claiming “nothing has to be done to further optimize the performance”

    I don’t see why optimization “by-design” (without any configuration effort) is a bad thing.

    And in this area, G-WAN is both faster and using less CPU than the others, so the legitimacy of Nicolas’ “claim” looks genuine to me.

    > As Web servers run faster in 64bit, why intentionally run G-WAN in 32bit?

    Running a 64-bit OS makes a world of difference: this is clearly faster, even with 32-bit processes.

    But my recent tests with a 64-bit version of G-WAN (which I will release publicly for the benefit of all), show that, despite more CPU registers (in 64-bit, function calls use registers instead of the stack), it is not clear that G-WAN 64-bit will run much faster (while CPU usage might slightly benefit).

    G-WAN 64-bit’s main advantage will rather be the unlimited access to RAM, which makes sense with the to-be-released ultra-fast G-WAN Key-Value Store.

    Under Windows and Linux, a 32-bit process works on a 32-bit and 64-bit OS.
    This is not the case of a 64-bit process which requires a 64-bit OS.
    That’s why I started with a 32-bit version of G-WAN (and Nicolas did not have the choice for his test as I did not release a 64-bit G-WAN so far).

    To address your question further, my recent efforts in the memory consumption area (Nginx is the clear reference in Nicolas’ test and I wanted at least to match this level of efficiency) have had a more significant impact on G-WAN’s performance (better locality, fewer syscalls) than a 64-bit recompilation.

    Of course, this is not a definitive position because I only focussed recently on the potential of the 64-bit architecture. Further experimentations in this specific area might help G-WAN to make progress in the future.

    Pierre.

  28. The main problem I have with this blog post is its title: “Serving small static files: which server to use ?”.
    Reading the title, one could be misled into believing that the results of this highly limited benchmark should help you choose between these servers. I think the title should be something like “Synthetic comparison of serving one file from localhost with various servers.”

    In the real world very few people serve the same file to thousands of localhost clients. And just hinting that this microbenchmark has anything to do with real-world performance doesn’t reflect too well on the author.

    As such the microbenchmark does produce a couple of pretty graphs, but I would be very very careful before using it as any sort of foundation for choosing between the servers present in the test.

    • Dear Denis,

      Wow, the Nordic FUD-machine in action (again). Disclaimer: the worst performing of all in this test is… Varnish – a product marketed by… “Linpro AS” a company from Oslo, Norway… just like Denis (http://denis.no/ in… Norway).

      Instead of criticising the title of Nicolas’ comparative benchmark, why not do your own test with the criteria that you feel are relevant (several small files instead of one)?

      I invited Varnish to do this when they denied this test’s value (despite having heavily participated in writing the configuration file used here, and having requested several tests in a row with different options)… but they never followed up (preferring to spread FUD in RSS feeds about G-WAN).

      Like for “Varnish AS”, my feeling reading your post is that its unfair tone “doesn’t reflect too well on the author”.

      Pierre. (G-WAN’s author)

      • Pierre,
        Do you contest my point that this microbenchmark is mostly irrelevant for choosing a lightweight webserver or not? I’m a bit uncertain, as your comment is mostly an attack on Varnish / me, not an argument against my point.

  29. Denis,

    As long as the only serious criticism you bring is about the TITLE of this benchmark, forgive me for considering your post as mere pollution.

    If you want to get my recognition (and that of the busy guys out there trying to make better software) then refrain from using FUD and start contributing USEFUL data.

    I praised Nicolas’ initiative because it was VERY INFORMATIVE for all – including for me: I would have never imagined that a server could, as Nginx does, so brutally dominate the field in terms of memory usage.

    Thanks to Nicolas’ test, I took action and my new development version of G-WAN is now using LESS MEMORY than Nginx (in addition to being faster and using less CPU resources).

    Denis, let’s say that you are honest. In that case, you will follow my invitation to make a new “more relevant” benchmark comparing the top performers listed here (and others if you wish).

    And if your test has a nice TITLE then I will also congratulate you about it.

    As long as you decline this fair offer to deliver something of value, your comments are just relying on groundless claims (the definition of FUD)… making me consider that your behavior “doesn’t reflect too well on its author”.

    Pierre.

  30. Pierre,

    Thanks for all the work you guys put in!

    We are currently running a Php/MySql stack using the Zend Framework/Server on Linux/Apache/RHEL. We have been having some major issues recently with pages being served up very slowly and mostly it seems that our js, css and images look like the culprit of holding up the pages. We are thinning down and consolidating files right now and looking for a direction to move for a good cache strategy for these static files that seem to be causing the largest issues on our site. The guys at rackspace recommended taking a look at Varnish, but I came across a lot of benchmarks, like the one here, that point to G-WAN being a better direction possibly. One of the posts says that your next build will be better for Php sites. Does that mean that it currently won’t help? Would I be better off using something else for now until the next release?

    Thanks,
    Pete

    • Pete,

      This blog is Nicolas’ blog, not mine. You should rather post these questions to:
      http://forum.gwan.com

      Regarding your specific needs, on top of being even faster, more scalable and now using LESS MEMORY than Nginx, the new version of G-WAN will provide:

      – a much faster cache (the feature you are looking for);

      – SCGI and Reverse-Proxy Handler samples (to connect with PHP, etc.);

      – the fastest (“wait-free”: non-blocking and never delaying) Key-Value store on Earth (each [ASCII or binary] key and value can be up to 4 GB in length).

      So, either you have the skills to connect PHP and G-WAN today – and you can start using today’s version of G-WAN or you will wait a bit more to get your hands on the most efficient way to boost any Web server, ever.

      Side note: the design of your site is pretty good, congratulations!

      Pierre.

  31. Pingback: Beracah’s Blog: {0…t-1, t, t+1…∞} » Tweaking Scooty Puff Sr., The Doombringer to 83,000 requests per second

    • YouOnThisBlog,

      As you obviously skipped (or purposely ignored) my previous discussions about the relevance of localhost tests, your rant already qualifies you as a Troll.

      For the lazy ones (unable to search this same page), here it is again:
      ——————————————————————————————————-
      G-WAN is faster than Nginx only because its *request parsing* and *response building* code is much faster than Nginx’s.

      And this is precisely what a tiny file benchmark on a modest laptop on localhost illustrates.

      There is no strong CPU to hide user-mode code inefficiency, nor high-performances NICs and switches to make network latency/bandwidth magically disappear in a specific benchmark made for PR headlines. With localhost, the only bottlenecks are the OS kernel and the Web server user-mode code (40/60 for the slow servers, 90/10 or less for G-WAN, see by yourself).

      Everybody can duplicate such a test on localhost (without tens of thousands of US dollars AND lengthy NIC tuning explanations) – and this is precisely the value of this comparison.
      ——————————————————————————————————-
      The next time you try to shame a programmer, make sure that you have a (preferably valid) technical argument.

      Pierre.

  32. Pingback: Serving small static files: which server to use ? | Brent Sordyl's blog

  33. Hello,
    thanks for your great article.
    Unicorn could be a good candidate too (optionally coupled with Nginx).
    Regards,
    Luc

  34. Nicolas, thanks for your research efforts! Very interesting read. I am about to decide on a reverse proxy for our hosting platform (35M hits/day, 250 servers). I was originally opting for Varnish but will look into G-WAN now. Pierre, I would humbly advise you to steer clear of heated flame wars such as the ones on this blog and on your own forum. Not taking the bait gives a more professional impression to casual readers such as me.

    • Willem,

      > Not taking the bait gives a more professional impression

      I am not sure that a one-way side of the story is desirable – especially when it is exclusively made of lies.

      My efforts, that you consider as ‘not professional’, find their justification in the value of giving users access to technical arguments while the trolls use (groundless) statements designed to exploit people’s ignorance.

      By enlarging the number of those in a position to understand the facts without assistance, we defeat the very purpose of the trolls (something that I also find desirable).

      Pierre.

    • 35M hits/day across 250 servers is 140,000 hits per server per day, i.e. fewer than 2 requests per second per server. That is little traffic per server, and any of the servers in these tests should serve you well, unless you want to reduce the number of server machines by using a faster web server.
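The division above, spelled out as a short sketch. It assumes the hits are spread evenly across the servers and over the day, so it gives an average rather than a peak rate; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_server_rate(hits_per_day, servers):
    """Return (hits per server per day, average requests per second per server)."""
    per_server_per_day = hits_per_day / servers
    return per_server_per_day, per_server_per_day / SECONDS_PER_DAY


daily, rps = per_server_rate(35_000_000, 250)
print(f"{daily:,.0f} hits/server/day = {rps:.2f} req/s on average")
# prints: 140,000 hits/server/day = 1.62 req/s on average
```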

      • @Salam: it also depends on the type of ‘hits’. If that’s heavy PHP processing or DB querying then one server will not suffice.

        Also, the 140k RPS number is for localhost. On a gigabit LAN, it would be roughly double.

  35. Pingback: Sysadmin Sunday 34 « Boxed Ice Blog

    • > The Apache Foundation claims ATS delivers “more than of 200,000 requests per second”

      These claims are made without any details: LAN or localhost, type of machine(s) (laptop, desktop, server), number of CPUs, number of NICs, number of clients, which client program, etc.

      – The test made on this page was on localhost with a modest i3 laptop.

      – On a gigabit LAN, you can get almost twice as much Requests Per Second.

      As you can see, 200,000 RPS is far from reaching the top contenders.

      It is becoming ridiculous to see all the servers, mainstream or not, making their PR announcements on this blog page, really.

  36. Obviously, none of the tested servers used 100% of the CPU to serve more requests per second. Why? What is the bottleneck? Network bandwidth is not the bottleneck, since the test runs on a single machine; neither is RAM, since plenty of it was unused. Disk I/O should not be the bottleneck either, since the file must be cached in RAM by the OS when it is requested that many times. So, again, what is the bottleneck that prevents the tested servers, especially G-WAN, which makes the best use of the CPU, from using more of the CPU to serve more requests per second?

    • > what is the bottleneck

      On his site, the author of G-WAN says that this is the Operating System kernel.
      Kernels use locks and are notoriously lagging behind the multi-Core trend.

      And this would be why G-WAN Windows is much slower than G-WAN Linux:
      (the Windows kernel is much slower than the Linux kernel – despite the bloat)

      http://gwan.com/en_linux.html

      It means that kernels will have to improve in the future to make progress on new CPUs.
      It also means that only the Web servers able to tap into parallelism will make progress (as long as the kernel is the bottleneck for a given Web server program then this program is optimal).

  37. G-WAN claims that it transparently turns all blocking I/O calls into non-blocking calls. How does it do that?
    – I don’t think G-WAN uses the LD_PRELOAD trick, because that cannot intercept function calls made inside a shared library.
    – I don’t think G-WAN does syscall interception, because that relies on ptrace, which needs at least two processes, and G-WAN is definitely a single process.

    So how can it be done? Let’s analyze how the C servlet is handled: use the TCC library to compile the C servlet, inject some functions via tcc_add_symbol(), get the main() function via tcc_get_symbol(), and call main() to run it.

    For example, to intercept the “recv” call, it seems we would just need to call tcc_add_symbol(s, “recv”, my_own_recv), but that has never worked!
    Any other possibility? Maybe, if we FORCE TCC to read the shared library into memory, because the default behaviour of TCC is NOT to read the shared library but simply to dlopen() it when compiling to memory. By forcing it to read the shared library, perhaps the dynamic symbols of the ELF object can be altered.
    What I mean by “FORCE” is modifying TCC to call “tcc_load_dl” instead of “dlopen”, then carefully reading the symbol table to do the interception. This technique may fail on newer libc versions that bypass the system call stubs for many of their internal routines.
    Any other possibility? Maybe some kernel/ELF guru will offer other tricks. But if I modify TCC (or copy-paste some of its code) and statically link it with my program, do I have to release my source code?

    • Atmo,

      May I suggest that you post G-WAN development-related questions to… http://forum.gwan.com instead of on this general-purpose blog, which belongs to Nicolas?

      People don’t ask questions about Nginx’s development here. So why do you do it?

      Answer:

      “Atmo” got a new alias “John” (in addition to “Bob”, alias “Alex”, “Brian”, “Atmo”, “Sneider Hoff”, “Smith Wolfgang”, “Paula”, “Brian” and “Currie”). He always makes the same typos and grammatical errors, which let us spot him from kilometers away.

      Pursuing his FUD campaign against G-WAN, Atmo is also clearly engaged in reverse-engineering G-WAN to make something similar (good luck)… but he clearly lacks the programming skills to make any significant progress in this matter – hence his obvious frustration, false accusations, etc.

      Atmo, rather than posting your poison here again and again, learn system architectures and programming. Your life will be easier then and you will no longer have to hate others.

      Tip: G-WAN’s non-blocking (async.) client BSD socket calls also work transparently for third-party libraries (like the DNS system library) – proof that the TCC library does not do it, nor does it require any modification. This has the advantage of being portable, unlike your nonsense constructions-that-do-not-work.

      Please, for the last time, spare this blog your venomous comments.

      Pierre.

      • Hi Pierre,

        I understand you are the author of G-WAN. I understand that you would like to defend it, but please do not be so hostile while you defend it. It appears as a shouting match in the above comments, except with letters instead of sounds. Critics will always be there, but you must maintain composure.

        I am curious why G-WAN isn’t open source though. Perhaps the BSD/Apache/GPL licence may suit it well.
