Today's Maven users have two solid choices when it comes to repository managers: Sonatype's Nexus and JFrog's Artifactory. While we are convinced that Nexus is the better choice, we're especially happy to see that there is competition in this market. Competition leads to more efficient markets for software and more "accountability" to the end-user. Nexus and Artifactory have much more in common than not, but we think the differences are important to understand as they have dramatic impacts on performance and scalability. In this post, I contrast some of the design decisions made in the construction of Artifactory with the design decisions we made when developing Nexus.
Contrast #1: Network: WebDAV vs. REST
The first major difference is that Artifactory uses Jackrabbit as a WebDAV implementation for artifact uploads. Nexus implements a simple, lightweight HTTP PUT via Restlet instead. We had a WebDAV implementation in early alpha releases but found it to be far too heavy and slow. Switching to a simple REST call improved our performance and significantly decreased the memory footprint. A profile of the memory use in Artifactory that I ran on previous versions showed that the majority of the object creation and memory allocation is related to Jackrabbit. Sure, you can't mount a Nexus repository as a web folder using WebDAV, but is that really what you need a repository manager to do, or would you rather it be blazingly fast doing Maven builds? It is possible to use the lightweight wagon against Artifactory (http vs dav:http), but in our opinion the choice of Jackrabbit adds overhead that isn't needed.
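To make "a simple HTTP PUT" concrete, here is a minimal Python sketch of what an upload amounts to on the wire. It is an illustration only, not Nexus or Restlet code: the function names, the `repo.example.com` URL, and the content type are assumptions made for the example.

```python
import urllib.request


def build_upload_request(repo_url: str, artifact_path: str, payload: bytes) -> urllib.request.Request:
    """Build a plain HTTP PUT request that uploads one artifact.

    repo_url and artifact_path are hypothetical; a real server defines its own URL scheme.
    """
    url = repo_url.rstrip("/") + "/" + artifact_path.lstrip("/")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def upload(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example (not executed here, since it needs a reachable server):
#   request = build_upload_request(
#       "http://repo.example.com/releases",
#       "com/example/demo/1.0/demo-1.0.jar",
#       jar_bytes,
#   )
#   status = upload(request)
```

One request, one body, one status code: there is no WebDAV property model or locking protocol for the server to implement.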
Contrast #2: Storage: Relational Database vs. Filesystem
The second major difference is that Nexus deliberately chooses to use a regular Maven 2 repository layout to store the data on disk. Doing this effectively isn't always easy and we've had many discussions about it amongst the team, but I hold fast to this approach for several reasons:
- It makes importing and exporting repositories a no-brainer. To import, copy the data into the correct folder under the Nexus work folder. To export, copy it back out.
- The incremental nature of file changes in a Maven 2 repository layout makes it extremely well suited to incremental backups to tape or another archival medium.
- Nexus also keeps its metadata (not to be confused with the maven-metadata) separate from the artifacts, and the data is rebuilt on the fly if it's missing. If you are unlucky and have some hardware or disk error, you will likely only get one file corrupted, not the entire repository.
- Having the metadata separate means Nexus upgrades don't have to touch any data in the repository folder. Upgrades and rollbacks of the system can happen as fast as you can stop one instance and start the next.
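As a rough illustration of why a plain Maven 2 layout makes import and export trivial, the sketch below maps coordinates to a path in that layout and "imports" a repository with an ordinary recursive copy. The function names and directories are hypothetical, and the path rule covers release artifacts only (timestamped snapshots are named differently).

```python
import shutil
from pathlib import Path


def maven2_path(group_id, artifact_id, version, extension="jar", classifier=None):
    """Return the relative path of a release artifact in the Maven 2 layout."""
    file_name = f"{artifact_id}-{version}"
    if classifier:
        file_name += f"-{classifier}"
    file_name += f".{extension}"
    # Dots in the groupId become directory separators.
    return "/".join([*group_id.split("."), artifact_id, version, file_name])


def import_repository(source: Path, storage: Path) -> int:
    """Copy an existing repository tree into the storage folder.

    Returns the number of files under storage afterwards. Requires Python 3.8+
    for dirs_exist_ok.
    """
    shutil.copytree(source, storage, dirs_exist_ok=True)
    return sum(1 for p in storage.rglob("*") if p.is_file())
```

For example, `maven2_path("org.apache.maven", "maven-core", "2.0.9")` gives `org/apache/maven/maven-core/2.0.9/maven-core-2.0.9.jar`. Export is the same copy in the other direction.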
Artifactory takes the polar opposite approach and stores the metadata and the artifacts themselves in a huge database. Their stated reason is transactional behavior. But using a database doesn't by itself guarantee transactionality, and it certainly isn't the only way to get transactional behavior.
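One well-known way to get all-or-nothing behavior on a plain filesystem is to write to a temporary file and then rename it into place. The sketch below shows that general technique; it is not a description of how Nexus is implemented, and `atomic_store` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def atomic_store(target: Path, payload: bytes) -> None:
    """Write payload so that readers see either the old file or the new one, never a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file into place in a single step.
        os.replace(tmp_name, target)
    except BaseException:
        # On any failure, remove the partial temp file and leave the target untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process dies mid-upload, the worst case is a stray temporary file, not a half-written artifact.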
In order to use a database, Artifactory needs to have import and export tools. The imports and exports of this data are reported to take a significant amount of time (http://issues.jfrog.org/jira/browse/RTFACT-317). Some upgrades require a full dump and re-import of the database, taking large systems offline for a significant amount of time. Also, what happens if you need to tweak or repair a file in the system? Break out your DBA books and go to town. How about incremental backups? Would you be happy if a single disk error made your entire repository garbage?
We feel strongly that introducing a repository manager into your system shouldn't require a DBA to manage the data. Reliable backups can be performed with Nexus using tools such as robocopy or rsync and a simple script, and transactions can be obtained with much less overhead. In fact, with the Staging plugin, Nexus is able to turn an entire multi-module build into a single transaction. There are ways to implement "transactional" interactions in a piece of software without having to throw *everything* into a database. We think that loading the entire contents of a repository into Jackrabbit and modelling the repository in a relational database introduces much more complexity into the problem than is necessary.
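To show how little a file-based backup script has to do, here is a pure-Python stand-in for the kind of incremental copy that rsync or robocopy performs. It is a simplified sketch, not an rsync invocation: it compares only file size and modification time, and the function name and directories are hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(storage: Path, backup: Path) -> list[str]:
    """Copy only new or changed files from storage to backup; return their relative paths."""
    copied = []
    for source_file in sorted(storage.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(storage)
        target = backup / relative
        if target.exists():
            src_stat = source_file.stat()
            dst_stat = target.stat()
            # Treat a file as unchanged when the size matches and the source is not newer.
            if (src_stat.st_size == dst_stat.st_size
                    and int(src_stat.st_mtime) <= int(dst_stat.st_mtime)):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copy2 preserves the modification time
        copied.append(relative.as_posix())
    return copied
```

Because each deployed artifact is its own file, a second run after a deploy copies only the newly added files.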
Contrast #3: Storage Size
It has also been reported that the indexes and metadata introduced by Artifactory can double or triple the size of a repo. See this thread for real examples. Perhaps that's manageable on a 1 GB repo, but how about something like Central at 60+ GB? Nexus uses the Nexus-indexer (Artifactory uses it as well to provide search capability), which is just a Lucene index. We can provide indexes of Central that are only 30 MB, not double the size of the repo itself. Note that the Nexus indexes also include cross references of the Java classes contained in the jars. Once again, we think bringing a relational database into this problem is an unnecessarily complicating factor.
A preliminary test import of a 116 MB release repo took 5 minutes and the resulting data size was 323 MB (2.78x the original size). Extrapolating that out to a 4 GB repository gets you about 3 hours of import and a total data size of 11 GB. Sure, disks are cheap these days, but tripling the size of your data still has many long-term ramifications when you consider backups, replication, etc.
Note: The import I ran failed 3 times on data from Central because of overly strict checking, and I had to prune or repair the files just to get it to import. Fortunately that didn't happen midway through a 3 hour import... which leads me to the next point.
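The extrapolation is simple arithmetic, sketched here in Python. It assumes that import time and data size scale linearly with repository size and that 1 GB is 1024 MB; both are assumptions of the example, and the function name is made up.

```python
def extrapolate_import(sample_mb, sample_minutes, sample_result_mb, target_gb):
    """Scale a small test import linearly up to a larger repository."""
    growth = sample_result_mb / sample_mb          # stored size relative to original size
    minutes_per_mb = sample_minutes / sample_mb    # import speed observed in the sample
    target_mb = target_gb * 1024
    return {
        "growth_factor": growth,
        "import_hours": minutes_per_mb * target_mb / 60,
        "result_gb": target_gb * growth,
    }


estimate = extrapolate_import(sample_mb=116, sample_minutes=5, sample_result_mb=323, target_gb=4)
```

With the figures from the test import, `estimate` comes out at a growth factor of about 2.78, a little under 3 hours of import time, and roughly 11 GB of stored data.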
Contrast #4: Nexus Doesn't Interfere
We believe that Nexus shouldn't interfere with your builds. We all know that the data in remote repos like Central can be incomplete. However, if Maven is able to use it, then we make sure Nexus won't get in the way. Artifactory, as a feature, proactively blocks any data that isn't parseable as it comes through. This means you may have a build that works without Artifactory and breaks with it because it refuses to proxy (and apparently import) any files it doesn't like. Nexus will report the problem to the admins to deal with, but won't cause Maven to blow up for the developer. Nexus favors stability over correctness for proxy repositories.
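The difference between the two policies can be sketched in a few lines of Python. This is neither product's actual code: the function names, the `.pom` check, and the XML well-formedness test are simplifications invented for the example.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("proxy-sketch")


def is_parseable_xml(content: bytes) -> bool:
    """Return True if content is well-formed XML."""
    try:
        ET.fromstring(content)
        return True
    except ET.ParseError:
        return False


def strict_proxy(path: str, content: bytes) -> bytes:
    """Strict policy: refuse to serve a POM that does not parse."""
    if path.endswith(".pom") and not is_parseable_xml(content):
        raise ValueError(f"rejected unparseable file: {path}")
    return content


def lenient_proxy(path: str, content: bytes, problems: list) -> bytes:
    """Lenient policy: record the problem for the admins, but serve the file anyway."""
    if path.endswith(".pom") and not is_parseable_xml(content):
        problems.append(path)
        log.warning("unparseable file served as-is: %s", path)
    return content
```

Under the strict policy the developer's build fails on the bad file; under the lenient one the build gets the same bytes it would have fetched from the remote repository directly, and the path lands in the `problems` list for someone to follow up on.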
Download Nexus Today
Nexus is available as an open source project for free. There is also a Pro version that includes additional commercial-level functionality and professional support for only $2995 per server. While the Open Source version is capable and popular, Nexus Professional adds some new features that are targeted at Enterprise Users: staging, procurement, and LDAP integration.