Nginx is Central's new friend


October 29, 2008 By Brian Fox

The load on Central has been increasing steadily over the past few months. Coincidence or not, it started shortly after Maven: The Definitive Guide was translated into Chinese. First, httpd was running out of connections; then, as we raised those limits, the load on the system got out of hand. Most recently we were seeing 15-minute load averages above 200 on a regular basis. To solve this and provide more redundancy, we started the process of installing a load balancer to share the load with a second machine. On a tip from James, we looked into Nginx (“engine x”), which is supposed to be a high-performance HTTP server. The central repository is currently not doing anything dynamic: the files are just sitting there on disk, updated occasionally via rsync. This means we were not leveraging the flexibility of httpd mods, and httpd's overhead was killing us.
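To make the "static files on disk" point concrete, here is a minimal sketch of what an Nginx configuration for this kind of workload might look like. It is not the configuration running on Central: the hostname, the document root, and all tuning numbers are assumptions chosen for illustration, and only standard Nginx directives are used.

```nginx
# Illustrative only: paths, hostname, and numbers below are assumptions.
worker_processes  4;              # e.g. one worker per core on a quad-core machine

events {
    worker_connections  1024;     # per-worker connection cap; tune to observed traffic
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile           on;        # let the kernel copy file data straight to the socket
    tcp_nopush         on;        # send headers and start of file together when using sendfile
    keepalive_timeout  30;

    server {
        listen       80;          # a different port (say 8080) works while testing beside another server
        server_name  repo.example.org;   # hypothetical hostname

        root  /srv/maven2;        # hypothetical directory kept up to date by rsync

        location / {
            autoindex  on;        # serve directory listings for browsing
        }
    }
}
```

Nothing here is dynamic: every request maps to a file or directory under the root, which is the case where a lightweight server has the least work to do.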

To check it out, we installed Nginx in parallel with httpd and, once everything looked good, swapped the ports to put it into production. How did it work out? Simply amazing. Since the swap, the load has been averaging around 0.30. This is a quad-core machine, so that means it went from being 28x overloaded to using a third of ONE CPU. Being suspicious, I checked the bandwidth logs and response times. The average response time is 8x faster (no doubt because there are 3 CPUs sipping Mai Tais) and the total throughput shows no discernible change. So with Nginx we’re now serving the same bandwidth that was swamping the CPUs under httpd, using next to no CPU.

During the ongoing process of tracking the load on Central, we found a surprising number of large organizations out there with very high connection usage. Please, if you’re using Maven in an enterprise, do us both a favor and install a repository manager such as Nexus. The entire community will benefit because the Central repo will be able to serve the needed artifacts in a timely fashion. You will benefit because it will give you centralized control over your artifacts, less external bandwidth use, less dependency on an external connection, indexing, searching, and storage of your internal artifacts, and a ton of other functionality.