A Simple Reminder for Maven/Gradle/Ivy Users: Proxy Central

January 31, 2012 By Tim O'Brien

Over the past few years, I’ve talked with hundreds of people about build tools and repository management, and it continues to surprise me how many of them don’t realize where their artifacts come from. When you run a build and the JARs you need simply show up alongside all of their dependencies, it feels like magic to most people. If you know how it works, it’s obvious that running a repository manager is the right thing to do. This post is a reminder to everyone using build tools that rely on Central: take the time to proxy Central with a repository manager.

“Wait, that’s how Central works?”

There’s something so automatic about dependency management in Maven that it often takes people a few months to understand exactly where those JAR files are coming from.
In an 8-hour Maven class, I get to dependencies in the third hour. After I describe Central, what it was like before Central, how metadata is stored in a repository alongside binaries, how transitive dependencies work, and so on, it all falls into place. People realize that this simple thing they’ve grown accustomed to is only easy because of a ten-year effort to refine the model, the creation of a support structure for source forges at places like Oracle and Google, and a constant investment in infrastructure.
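For readers who haven’t looked inside a repository, here is a minimal sketch of how a Maven 2-style repository layout maps coordinates to paths: the groupId’s dots become directories, followed by artifactId and version directories, and the POM (which carries the dependency metadata) sits right next to the binary. The coordinates in the example are only illustrative, and the sketch ignores classifiers and snapshot timestamps.

```python
def artifact_path(group_id: str, artifact_id: str, version: str,
                  packaging: str = "jar") -> str:
    """Repository-relative path of an artifact file in the Maven 2 layout."""
    group_dir = group_id.replace(".", "/")
    return f"{group_dir}/{artifact_id}/{version}/{artifact_id}-{version}.{packaging}"


def pom_path(group_id: str, artifact_id: str, version: str) -> str:
    """The POM, which lists the artifact's dependencies, sits beside the binary."""
    return artifact_path(group_id, artifact_id, version, packaging="pom")


def metadata_path(group_id: str, artifact_id: str) -> str:
    """maven-metadata.xml (the list of available versions) lives one level up."""
    return f"{group_id.replace('.', '/')}/{artifact_id}/maven-metadata.xml"


if __name__ == "__main__":
    coords = ("org.apache.commons", "commons-lang3", "3.1")
    for path in (artifact_path(*coords), pom_path(*coords),
                 metadata_path(*coords[:2])):
        print(path)
```

Because every path is derived from the coordinates alone, any client (Maven, Ivy, Gradle, or a repository manager) can compute where to fetch a file without a central index.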

On the one hand, it’s a great success that Central is, for the most part, an invisible utility that supports developers.  On the other hand, it’s the kind of thing that people can start to take for granted very easily.

For example, a few months ago I spoke to someone who worked in an environment disconnected from the internet for security reasons.   This individual was talking about how limiting it was to have to download JARs from open source projects manually and assemble them in a project.   His words were: “It’s like programming Java in 2001 again.”

How can you help?

Imagine millions of developers spread all over the world: different time zones, different applications, but they all hit the same service, Central. Some regions have more developers than others, so we certainly see peaks in usage throughout the day, but in general Central is serving thousands of files to developers around the world at any given moment.

Maybe someone just installed Maven for the first time, or maybe they blew away a local repository; with numbers like these, we see a world with a constant appetite for artifacts. This isn’t a problem for Central, and I’m not writing this because Central is falling down on the job. Central can handle it, but it certainly isn’t the most efficient way to support millions of developers. It isn’t a good use of network bandwidth, and it isn’t a good use of energy to cart the same static JARs around over and over again when the solution is so easy.

If everyone who used a build tool that interacts with Central adopted a repository manager such as Nexus, we’d have a faster, more responsive system. Central’s maintainers would spend less time dealing with the occasional runaway build and more time and resources on increasing the availability and functionality of this essential service.
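To see why a proxy helps, here is a toy model of a caching proxy. It is not how Nexus is actually implemented; it only shows the principle. The first request for a given artifact path goes upstream to Central, and every later request from any developer or CI server on the team is served from the local copy. The `fetch_from_central` function is a hypothetical stand-in for an HTTP download.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of a repository manager proxying a remote repository.

    Release artifacts never change once published, so the first download
    can be stored and served to every later client.
    """

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self.upstream_requests = 0  # how many times we actually hit the remote

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            self.upstream_requests += 1
            self._cache[path] = self._fetch_upstream(path)
        return self._cache[path]


def fetch_from_central(path: str) -> bytes:
    # Hypothetical stand-in for an HTTP GET against Central.
    return f"contents of {path}".encode()


if __name__ == "__main__":
    proxy = CachingProxy(fetch_from_central)
    jar = "org/apache/commons/commons-lang3/3.1/commons-lang3-3.1.jar"
    # Fifty developers (or fifty CI builds) ask for the same JAR...
    for _ in range(50):
        proxy.get(jar)
    # ...but the upstream repository sees only one request.
    print(proxy.upstream_requests)
```

A real repository manager also has to deal with snapshots and repository metadata, which do change, so it rechecks those periodically; this sketch models only immutable releases.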

Broken Builds

The other factor playing into this is that Maven downloads a release only once: after the first download, the artifact is cached in the local repository. It isn’t as if these build tools keep returning to Central to download the same release artifacts over and over again.

Well… actually, that isn’t always true. We’ve seen some Hudson installations configured to delete the local repository before every build, which places a high load on Central. Imagine a build that downloads 50 MB of dependencies and runs once every 5 minutes. That’s one build consuming ~14 GB a day, never mind the time wasted downloading static artifacts.
While these broken builds are the exception, they do still show up from time to time.  Central can handle the load, but imagine 1000 of these broken builds running continuously and you can see the challenge.
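The arithmetic behind those numbers is easy to check. A build every 5 minutes runs 288 times a day, and 288 × 50 MB is 14,400 MB, or about 14 GB (using 1 GB = 1,000 MB). The small helper below is just a calculator for this scenario; the function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def daily_download_mb(dependency_mb: float, interval_minutes: float,
                      builds: int = 1) -> float:
    """Megabytes per day pulled from Central by builds that wipe the local repo."""
    runs_per_day = MINUTES_PER_DAY / interval_minutes  # 1440 / 5 = 288
    return dependency_mb * runs_per_day * builds


if __name__ == "__main__":
    one = daily_download_mb(50, 5)
    fleet = daily_download_mb(50, 5, builds=1000)
    print(f"one build:   {one / 1000:.1f} GB/day")
    print(f"1000 builds: {fleet / 1_000_000:.1f} TB/day")
```

Scaled to 1000 such builds, that comes to roughly 14.4 TB a day of the same static files.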

A Simple Reminder: Please Proxy Central

We’re constantly watching the performance of the system and making sure it stays up and running for an entire world of developers. If you use a build tool that hits Central, whether it is Buildr, Maven, Gradle, or Ivy, you can help us by running a Nexus instance.

Even if all of your builds work perfectly, running a local Nexus instance helps preserve Central as a free, public resource, and it will lead to faster, more responsive builds.

  • Jon

    Interesting post.

    We have specified our Nexus server as the only repository in our pom file (for the reasons specified in your post).  This works for all *but* requests to Central, which continue to use repo1 even though it isn’t specified anywhere.

    I assumed this was the default behavior, although it seems I’m missing a trick. Any pointers?


    • sonatype

      Good question. This is a common problem, and it comes from the way Maven defines Central in the Super POM. Because you can’t remove a repository definition inherited from the Super POM, you have to define a mirror in your Maven settings; doing this in the POM alone won’t do the trick.
      Here’s a page in the Nexus book with a settings.xml snippet you can copy into ~/.m2/settings.xml: http://www.sonatype.com/books/nexus-book/reference/maven-sect-single-group.html
      This is a Maven quirk that you’ll have to do regardless of which repository manager you use. I really wish someone would make this easier, but it’s difficult to change some of these file formats because it affects millions of developers.
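      If you’d rather generate the file than edit it by hand, here is a minimal sketch that builds a settings.xml containing a single mirror with mirrorOf set to "*", so every repository Maven would contact, including the Central definition from the Super POM, is routed through your repository manager. The mirror id and URL are hypothetical placeholders for your own Nexus group, and the snippet in the Nexus book may include more than this (for example, profiles for snapshots).

```python
import xml.etree.ElementTree as ET


def mirror_settings(mirror_url: str, mirror_id: str = "nexus") -> str:
    """Build a minimal Maven settings.xml that sends every repository
    request, including Central from the Super POM, to one mirror."""
    settings = ET.Element("settings")
    mirrors = ET.SubElement(settings, "mirrors")
    mirror = ET.SubElement(mirrors, "mirror")
    ET.SubElement(mirror, "id").text = mirror_id
    # "*" means: mirror every repository Maven would otherwise contact.
    ET.SubElement(mirror, "mirrorOf").text = "*"
    ET.SubElement(mirror, "url").text = mirror_url
    ET.indent(settings)  # pretty-print; requires Python 3.9+
    return ET.tostring(settings, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical Nexus group URL; replace with your own instance's address.
    print(mirror_settings("http://nexus.example.com:8081/nexus/content/groups/public"))
```

      The script only prints the XML. If you already have a ~/.m2/settings.xml, merge the mirrors section into it rather than overwriting the file.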