Nexus: Improving Maven Central and Supporting the Maven Ecosystem

January 12, 2010 By Jason van Zyl


Nexus is more than just a repository manager. It is a project that has been developed on the same underlying infrastructure as Maven, and it has forced us to think about the different ways in which the components that comprise Maven can be integrated with other, more complex systems. It is a critical step toward a more mature Maven ecosystem, one which starts to encompass much more than just software builds. You can think of Nexus as the second major project to emerge from the Maven ecosystem - an ecosystem which includes commercial interests as well as open source volunteers and community participants.

Sonatype is focused on improving the foundational infrastructure which will allow us to improve the quality of artifacts and their accompanying metadata in Maven Central and Maven repositories around the world. A lot of this is not especially glamorous work and though many people complain about the state of some of the Maven repositories, very few take action. Here are some of the things Sonatype is doing with Nexus to improve the state of the Maven ecosystem and expand its scope.

Improving the Quality of Public Maven Repositories

Any effort to improve the quality of the Central Maven repository needs to begin with the major feeder repositories. For years, we have been giving rsync access to many of the organizations with large feeder repositories like Apache and Codehaus. When we started this effort, we were optimistic that these organizations would take care of their Maven repositories. We thought that repository maintainers and projects would make sure that all artifacts were signed and that all POMs contained a bare minimum of useful elements such as "license" and "description". With hundreds of projects pushing artifacts into their respective repositories on a daily basis, it has become obvious that without some mechanism to guarantee quality, without a well-defined process, the Central Maven repository will contain artifacts and metadata of questionable status. While the vast majority of artifacts have appropriate PGP signatures and metadata, the fact that a minority of (often very popular) artifacts lack proper dependency definitions or license elements means that we see a steady stream of complaints about the quality of the repository from our users.

These problems range from one project publishing another project's artifacts because they weren't in Maven Central, to incorrect or missing metadata, missing javadocs or source JARs, and invalid transitive closures. Whatever the problems are, the overall issue stems from the lack of process surrounding how artifacts are published to our feeder repositories, and how these artifacts are then certified to be published to the Central Maven repository.
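To make the metadata problem concrete, here is a minimal sketch of the kind of check that could flag a POM missing the bare-minimum elements. It is an illustration only, not how Nexus implements its rules: the function name `missing_pom_elements` and the choice of required elements (taken from the "license" and "description" examples above) are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace declared by Maven 4.0.0 POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Assumed minimum set of elements for this sketch; a real rule set
# would likely check more (SCM information, project URL, and so on).
REQUIRED_ELEMENTS = {
    "description": "m:description",
    "license": "m:licenses/m:license",
}


def missing_pom_elements(pom_xml: str) -> List[str]:
    """Return the names of required elements that are absent or empty."""
    root = ET.fromstring(pom_xml)
    missing = []
    for name, path in REQUIRED_ELEMENTS.items():
        element = root.find(path, POM_NS)
        if element is None:
            missing.append(name)
        elif len(element) == 0 and not (element.text or "").strip():
            # Present, but with no child elements and no text.
            missing.append(name)
    return missing


# Illustrative POM that has a description but no license.
EXAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0</version>
  <description>A demo library</description>
</project>"""

print(missing_pom_elements(EXAMPLE_POM))  # prints ['license']
```

A check like this only says whether an element exists, not whether its content is accurate, which is why process matters as much as tooling.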

Sonatype's answer to this problem has been to provide tools that can automate the process of repository maintenance for Central's largest customers - the main feeder repositories. We've provided a solution to the largest of the feeder repositories which allows them to use the Nexus Staging Suite to validate that all release artifacts contain the bare minimum of metadata, javadoc, and a valid PGP signature. As Ate Douma wrote in Monday's post about using the Nexus Staging Suite to support the Apache Portals project, we've supplied a solution that reduces the amount of error-prone, manual work associated with software releases. We're going to continue to find ways to address the needs of large open source enterprises with Nexus by providing open source projects and organizations with a free license and free support. Here is a list of some of the organizations with large feeder repositories that we are supporting directly:

  • The Apache Software Foundation
  • Codehaus
  • Alfresco
  • ExoPlatform
  • Glassfish
  • Open QA
  • Scala-Tools

With all of these organizations and projects using the Nexus Staging Suite, we are confident that the quality and reliability of artifacts and metadata will increase over time. The PGP signatures provide the security assurances organizations need, and the sources, javadocs, and correct POM information, such as SCM information, make for a better developer experience in the IDE. In M2Eclipse, for example, javadocs and sources can be dynamically retrieved as required, and binary dependencies can be materialized to source from SCM coordinates. This makes contributing patches to a dependent project an order of magnitude easier.

We are also fortunate to be working closely with Atlassian. We have Atlassian Crowd support in Nexus, so Nexus is an ideal fit for Atlassian and for organizations that make use of Atlassian's compelling products. You can find Atlassian's Nexus instance here. It's just another validation point for us that Atlassian sees fit to use our products as part of their daily development. We have a lot of respect for the Atlassian folks, and we've standardized our entire development environment on tools like JIRA, Greenhopper, and Confluence.

Decreasing the Time to Reach Maven Central

Many users and projects have complained that it can take a while to get their artifacts into Maven Central, so we're starting to focus on reducing the time to reach Central. The biggest obstacle to automating the process of publishing artifacts to Central is one of quality. Artifacts would reach Central faster if there were a better way to enforce a minimal set of quality standards. The legacy process involved projects uploading "bundles" to a JIRA project and repository maintainers manually inspecting the bundles to see if they complied with a set of standards.

In the previous section, I described how Nexus is already powering the major feeder repositories for the Central Maven repository. What we offer projects is described here. Basically, we will help you clean up and migrate your project's Maven repository to our hosted infrastructure and help you set up your project POMs to use Nexus' Staging. Projects are set up with the default staging rules, which ensure the presence of PGP signatures, javadocs, sources, and a POM which contains decent information. We provide all the instructions for the setup on the Maven side, so there is little work you have to do in order to take advantage of this service. We are also providing a mechanism by which the standard bundle uploads, typically done via JIRA, can be handled by Nexus. Internally within Nexus, the uploaded bundle is exploded and placed in a staging repository, as would happen if you performed the release against Nexus from your Maven build. We then apply the same staging rules to ensure quality.
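As a rough illustration of what a staging rule looks for, the sketch below checks a set of staged file names for the POM, the main JAR, the sources and javadoc JARs, and a detached `.asc` signature next to each. This is not the Nexus implementation: the function name `staging_violations` is hypothetical, the check assumes a JAR-packaged project using the standard `artifactId-version` file naming, and it only tests that signature files are present rather than verifying them.

```python
from typing import Iterable, List


def staging_violations(artifact_id: str, version: str,
                       files: Iterable[str]) -> List[str]:
    """List the problems found in a set of staged file names."""
    present = set(files)
    base = f"{artifact_id}-{version}"
    # Files this sketch assumes every JAR-packaged release should have.
    required = [
        f"{base}.pom",
        f"{base}.jar",
        f"{base}-sources.jar",
        f"{base}-javadoc.jar",
    ]
    problems = []
    for name in required:
        if name not in present:
            problems.append(f"missing {name}")
        elif f"{name}.asc" not in present:
            # Presence check only; the signature itself is not verified here.
            problems.append(f"missing signature {name}.asc")
    return problems


# Illustrative staged release with no javadoc JAR.
staged = [
    "demo-1.0.pom", "demo-1.0.pom.asc",
    "demo-1.0.jar", "demo-1.0.jar.asc",
    "demo-1.0-sources.jar", "demo-1.0-sources.jar.asc",
]
print(staging_violations("demo", "1.0", staged))
# prints ['missing demo-1.0-javadoc.jar']
```

Because the check looks only at what was staged, it behaves the same whether the files arrived from a Maven release or from an exploded bundle upload.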

We have already started rewarding projects with good Maven releases by automating their synchronization with Maven Central. Over time we are going to start enforcing standards for security and quality of metadata. If you have proper project metadata, PGP signatures, javadocs, and sources, your artifacts will fly into Central as quickly as possible. Projects that submit poor Maven releases are going to have a more difficult time getting artifacts into Maven Central. We'll start to enforce these standards gradually and we'll give the community time to adapt. Any tool that creates bundles for deployment to Central is capable of producing these artifacts. The staging rules don't care how the releases were constructed, so use whatever tools you like; you'll just have to pass a minimal set of requirements to make it into the Central repository. Over time this should greatly improve the quality of Maven Central.

Providing Metadata about Repositories

For a long time we have been providing a way, through Nexus, and the stand-alone Nexus Indexer, to produce the Maven repository index. The repository index contains information about all the artifacts in the given repository including class file information and project identifiers. It is primarily used as part of IDE integration to help developers find dependencies based on artifact coordinates or class references, like import statements. Using m2eclipse, all you need to do to add a new dependency is type the name of a class into your code and search the index for matching dependencies. You can also search the repository index for all of the available Maven archetypes when you are creating a new Eclipse project. Both of these IDE use cases are possible because of the standard repository index format that was defined as part of the Nexus project.
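The class-to-dependency lookup described above can be pictured with a toy in-memory index. This is only a sketch of the idea: the actual Nexus index has its own on-disk format, and the names `build_class_index` and `Coordinates` as well as the sample data are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (groupId, artifactId, version)
Coordinates = Tuple[str, str, str]


def build_class_index(
    artifacts: Dict[Coordinates, Iterable[str]]
) -> Dict[str, List[Coordinates]]:
    """Map both simple and fully qualified class names to artifact coordinates."""
    index = defaultdict(list)
    for coords, class_names in artifacts.items():
        for fqcn in class_names:
            simple = fqcn.rsplit(".", 1)[-1]
            # A set avoids a double entry when the class has no package.
            for key in {fqcn, simple}:
                index[key].append(coords)
    return dict(index)


# Illustrative repository contents.
artifacts = {
    ("commons-lang", "commons-lang", "2.4"): [
        "org.apache.commons.lang.StringUtils",
    ],
    ("junit", "junit", "4.7"): [
        "org.junit.Assert",
        "org.junit.Test",
    ],
}

index = build_class_index(artifacts)
print(index.get("StringUtils", []))
# prints [('commons-lang', 'commons-lang', '2.4')]
```

An IDE can run this kind of lookup when you type an unresolved class name, then offer the matching coordinates as a dependency to add.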

The Nexus index format also stores information about the presence of PGP signatures, javadocs, sources, and checksums. Using Nexus and other tools that can read the Nexus index format, you can aggregate multiple repository indexes together and perform quick searches across multiple public repositories (as well as your own hosted repositories). The Nexus indices from the OSS repositories around the world are proving to be a critical resource. We can tell because the index is the most requested item from Maven Central: index downloads amounted to 28TB of transfer last month.
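Aggregating indexes from several repositories amounts to merging their records by coordinates and remembering where each record came from. The sketch below shows that idea with a simplified record type; `IndexRecord`, `merge_indexes`, `find_signed`, and the repository names are all hypothetical and do not reflect the real index schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Coordinates = Tuple[str, str, str]  # (groupId, artifactId, version)


@dataclass(frozen=True)
class IndexRecord:
    group_id: str
    artifact_id: str
    version: str
    has_signature: bool
    has_sources: bool
    has_javadoc: bool


def merge_indexes(
    indexes: Dict[str, Iterable[IndexRecord]]
) -> Dict[Coordinates, List[Tuple[str, IndexRecord]]]:
    """Group records from every repository under their coordinates."""
    merged: Dict[Coordinates, List[Tuple[str, IndexRecord]]] = {}
    for repo_name, records in indexes.items():
        for record in records:
            key = (record.group_id, record.artifact_id, record.version)
            merged.setdefault(key, []).append((repo_name, record))
    return merged


def find_signed(merged, artifact_id: str) -> List[Tuple[str, Coordinates]]:
    """Return (repository, coordinates) pairs for signed copies of an artifact."""
    hits = []
    for key, entries in merged.items():
        if key[1] != artifact_id:
            continue
        for repo_name, record in entries:
            if record.has_signature:
                hits.append((repo_name, key))
    return hits


# Illustrative indexes for one public and one hosted repository.
indexes = {
    "central": [IndexRecord("org.example", "demo", "1.0", True, True, True)],
    "internal": [
        IndexRecord("org.example", "demo", "1.0", False, True, False),
        IndexRecord("org.example", "demo", "1.1", True, True, True),
    ],
}

merged = merge_indexes(indexes)
print(find_signed(merged, "demo"))
# prints [('central', ('org.example', 'demo', '1.0')), ('internal', ('org.example', 'demo', '1.1'))]
```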

Polyglot Component Repositories

The explosion of language choices has transformed the way most developers approach problem solving. Just four years ago it would have been normal to walk into a corporation and see Java at all levels of the stack. In 2010, most businesses have started to incorporate multiple languages and hybrid architectures into enterprise systems. A system designed today might use Ruby on Rails (running under JRuby) to power a web site which interacts with a set of services coded in Java. Services like Twitter rely on a foundation of Scala code executing on the JVM while other portions of the Twitter architecture use different languages and different technologies where they are most appropriate.

We are a polyglot industry, and while our software enterprises are run on a mixture of different languages, our development tools and build technologies are often locked into a single language or a single technology. A Ruby application is built using Rake, a Scala application is built using Sake or the Maven Scala plugin, and Java or OSGi applications are built using Maven. A Ruby library might generate and consume gems while a Java application might generate and consume JAR files. There is currently no single, consolidated "Tower of Babel" to help developers translate between different types of software artifacts. We've put a lot of effort into making the foundation of Nexus as agnostic as possible about what it is storing, and, because of this, we're moving to add support for even more artifact types.

Nexus currently supports the two OSGi formats of P2 and OBR and we are just finishing our first draft of RubyGems support. Polyglot Maven will drive us toward a Polyglot Nexus. As part of the work we're doing on Polyglot Maven we may find that different scripting language implementations have slightly different requirements for dependency management or provisioning runtimes, and we'll be ready for that.

Where do we go from here?

Next we're thinking about ways to make statistics for a given project's artifacts available to the project's developers. We have already implemented user signup in Nexus and we are currently working on project signup as well. What this means is that projects can register with a given groupId, or set of groupIds, and optionally be provisioned a repository which can be operated by a set of users. Once a project registers, we will know what slice, or slices, of the statistics they need to see. Our initial thought is that project statistics, such as the number of downloads, should only be made available to the public with the permission of each individual project. Brian and I, along with Greg Luck and Dain Sundstrom, have been working on a simple statistics mechanism that we hope to provide to projects early this year.


Written by Jason van Zyl

Jason is a co-founder and the former CTO of Sonatype.