Can Nexus Scale?


April 20, 2012 By Tim O'Brien

We’re often asked by customers to prove that Nexus can scale to meet the demands of thousands, and sometimes tens of thousands, of developers. Fortunately, we don’t have to stand up an expensive set of machines for a proof of concept, because the world’s largest collection of active open source projects is hosted on a single instance of Nexus Professional running at http://oss.sonatype.org. This instance isn’t just proof that Nexus Professional can scale; it is also a public instance that you can model your own instance after.

If you are looking for an estimate of the hardware required to support your instance of Nexus, this post will detail the configuration and specifications of the Nexus OSS repository instance. This instance is the largest known deployment of a repository manager in active use.

Performance of Nexus OSSRH

Nexus OSSRH serves on the order of 1,400 to 2,500 requests per minute. What drives this level of activity? First, the instance serves as a snapshot repository for many open source projects, and the list of projects hosted on OSSRH is a long one. When we examine the logs for oss.sonatype.org, we regularly see thousands of unique IP addresses every day, and oss.sonatype.org is involved in the CI builds of a number of OSS projects. This means that at any given time OSSRH is supporting many simultaneous CI builds, and over the course of a given day we’re serving artifacts to thousands of developers.
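To put the request rate in more familiar terms, here is a minimal sketch that converts the per-minute figures quoted above into average requests per second. The function name is ours, and the result is only an average: real traffic is bursty, so peak rates will be higher than these numbers suggest.

```python
def per_second(requests_per_minute: float) -> float:
    """Convert an average per-minute request rate to a per-second rate."""
    return requests_per_minute / 60.0


# Figures quoted in this post: 1,400 to 2,500 requests per minute.
low = per_second(1400)
high = per_second(2500)

print(f"{low:.1f} to {high:.1f} requests per second on average")
```

That works out to roughly 23 to 42 requests per second on average, sustained around the clock.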

OSSRH approximates the performance characteristics required for the largest development efforts in the world: multiple geographic locations, 24/7 uptime requirements, and very high performance standards. This service has to stay up. If OSSRH were to become unavailable, you would hear an immediate outcry from every affected OSS developer. Just choose a day and search Twitter for projects announcing that they’ve pushed artifacts to oss.sonatype.org, and you’ll see that every day has several critical releases.

When a customer asks us to prove that Nexus Professional scales, we don’t have to stop and set up a contrived performance test. We support this level of activity every single day. All we need to do is point them at OSSRH.

Nexus OSSRH Specifications

We’ve established that OSSRH is at the center of a large amount of active OSS development. It serves between 1,400 and 2,500 requests per minute, and it is a mission-critical resource. It would be reasonable to expect that this service runs on a cluster of machines distributed throughout the world to minimize latency. Think again. This is a single VM with modest specifications, running at Contegix and constantly monitored by New Relic.

Our standard setup for all managed forges is:

  • 2 CPUs
  • 3GB RAM
  • 400GB disk (this is completely dependent on your repository contents)
  • RHEL 5.6 x64 (Contegix, our managed hosting service, recommends using this OS)
  • Java 1.6 x64 with 1GB Heap* (see correction below)
  • The virtual disk is located on a SAN connected with iSCSI over 1GbE

If you are supporting a global-scale network of thousands of developers, the hardware cost for this Nexus instance is a “drop in the bucket”. One instance of Nexus Professional would easily fit on a very modest VM or, on a service like Amazon EC2, on an m1.large instance with space to grow. (The only thing you might need to spend on is the disk requirement. For OSSRH, we have a six-disk RAID 50 approach described below.)

Scaling Nexus: I/O Requirements, Network, and Disk

Under heavy load, increasing the number of CPUs and the amount of RAM may help, but often the gating factor is either disk I/O or the network. We do not recommend using NFS to mount a virtual disk for the working folder, because many customers have had trouble with locking and corrupted indexes. iSCSI is working very well for us on oss.sonatype.org, and it also works for many of our flagship customers.

Over the course of a day, the system typically needs to scale up in terms of network and I/O. Nexus “sings” under heavy load because we have made numerous code-level optimizations to ensure that we’re making effective use of caching to reduce round trips to disk. For I/O performance, we recommend a redundant solution that maximizes disk spindles while maintaining fault tolerance. We use RAID 50 in our SAN. RAID 50 combines the straight block-level striping of RAID 0 with the distributed parity of RAID 5: it is a RAID 0 array striped across RAID 5 elements. This approach emphasizes both performance and extreme reliability, and it requires at least 6 drives.
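To make the RAID 50 arithmetic concrete, here is a minimal sketch that computes how many drives’ worth of capacity remain usable once each RAID 5 element gives up one drive’s worth of space to parity. The function name, the equal-sized groups, and the 1 TB drive size are assumptions for illustration only; they are not a description of the actual OSSRH SAN layout.

```python
def raid50_usable_drives(total_drives: int, raid5_groups: int = 2) -> int:
    """Return the number of drives' worth of usable capacity in a RAID 50 array.

    Assumes the drives are split evenly across the RAID 5 groups and that
    each group loses one drive's worth of capacity to distributed parity.
    """
    if raid5_groups < 2:
        raise ValueError("RAID 50 stripes across at least two RAID 5 groups")
    if total_drives % raid5_groups != 0:
        raise ValueError("drives must divide evenly across the RAID 5 groups")
    drives_per_group = total_drives // raid5_groups
    if drives_per_group < 3:
        raise ValueError("each RAID 5 group needs at least three drives")
    return raid5_groups * (drives_per_group - 1)


# The minimum configuration: 6 drives as two RAID 5 groups of 3.
usable = raid50_usable_drives(6)
drive_size_tb = 1  # hypothetical drive size, for illustration only
print(f"{usable} of 6 drives usable, {usable * drive_size_tb} TB with 1 TB drives")
```

The minimum of 6 drives follows from the same reasoning: RAID 5 needs at least three drives per group, and RAID 0 needs at least two groups to stripe across.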

If You Need Scale, Try Nexus Pro

Sonatype designed Nexus to meet the demands of the OSS community from the beginning. We’ve been supporting global-scale OSS communities for years, and we’ve integrated the lessons learned from supporting active OSS development into Nexus Professional. If you need to scale, try Nexus Professional today.

Correction from Mike Hansen: With 2.0 we upped that to 2GB, at least on OSSRH.  But that pretty much just provides some extra headroom…  Actually, IIRC, the reason we went to 2GB was because we were battling memory consumption with some repository indexes that had not been optimized (i.e. the index optimization task had not been run for a very long time).