Benefits of a Repository Manager: Part I

August 4, 2010 By Tim O'Brien

Whenever I speak to someone doing Java development, I always ask if they are using a repository manager. Repository managers are still an emerging technology, but I’ve noticed a consistent trend: more and more developers view a repository manager as an essential part of development infrastructure. This certainly wasn’t the case just two years ago. I think the big motivator behind this trend is that the quality and stability of Maven Central have improved remarkably, thanks to the efforts of Brian Fox and others focused on making the service more reliable.

Another reason we’ve seen more adoption is that most developers now understand the benefits of using a tool like Maven for automatic dependency management. In 2005, it was common to see projects store binary JARs alongside their source code. In 2010, you rely on the repository and the metadata it contains. If you use a library like Guice, you add a dependency on the artifact and let your build tool take care of the details. To do otherwise would be to commit yourself to manually updating JARs and retesting dependencies each time a new version of an external library is released.
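
Here is one concrete example of "the details" the build tool handles. Maven locates an artifact by its coordinates (groupId, artifactId, version), and those coordinates map to a predictable path in the repository's default layout. The minimal Java sketch below computes that path for a plain JAR. The class name RepositoryLayout is hypothetical, the Guice version is only an example, and the sketch ignores classifiers and timestamped SNAPSHOT file names.

```java
// Hypothetical helper: maps Maven coordinates to a path in the default repository layout.
// Simplified: ignores classifiers and timestamped SNAPSHOT file names.
final class RepositoryLayout {

    private RepositoryLayout() {}

    static String pathOf(String groupId, String artifactId, String version, String extension) {
        // Each groupId segment becomes a directory: com.google.inject -> com/google/inject
        String groupPath = groupId.replace('.', '/');
        // Then artifactId/version/, then the file name artifactId-version.extension
        return groupPath + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Illustrative Guice coordinates; the version number is only an example.
        System.out.println(pathOf("com.google.inject", "guice", "2.0", "jar"));
        // prints com/google/inject/guice/2.0/guice-2.0.jar
    }
}
```

Because the layout is predictable, any repository that follows it can serve the artifact: Maven Central, your local ~/.m2/repository, or a repository manager in between.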

Despite the increasing prevalence of repository managers, I still stumble upon workgroups and organizations that haven’t heard of repository management. When you ask if they are using a repository manager, they might think you are referring to Subversion or source control. This series of posts is a high-level overview of the main benefits of repository management. If you are trying to convince someone to start using a repository manager, the next few blog posts are for you.

Repository Management: The Big Picture

Compare the diagram shown above with the diagram shown below. In the next few posts, I’m going to focus on the specific benefits of using a repository manager, including:

  • How a repository manager changes the development cycle
  • How a continuous integration server can publish internal build artifacts with every build
  • How a repository manager simplifies the process of building and deploying systems to production
  • How a repository manager can act as a gateway to vendors and external partners

When You Don’t Use a Repository Manager

Before I get to the benefits of repository management, I want to describe the realities you face without a repository manager. Here are some common anti-patterns:

  • All of your developers download artifacts directly from public repositories. A new developer starts on a Monday. That developer will spend an hour downloading a massive library of dependencies from Maven Central. Worse, if Maven Central happens to be down that day, they will be out of luck entirely.
  • Proprietary or vendor libraries are passed around from developer to developer. If you don’t use a repository manager, how do you distribute the Oracle JDBC driver? Maybe you place it on a shared file system and tell people to download it and install it in ~/.m2/repository. More likely, developers just pass the JAR around as an email attachment with some ad-hoc instructions.
  • JARs are checked into source control. If you don’t use a tool like Maven, which knows how to download artifacts from a remote repository, you might be following the very common pattern of checking binary dependencies and libraries into source control. I’ve seen many instances of companies creating ad-hoc JAR repositories and checking these repositories into source control, only to version and branch these static binary files with every release.
  • The source control repository is used to store everything from source code to binary builds. Because there is no repository designed to store binaries, developers start to use tools like Subversion to keep track of binaries. As time passes, the Subversion repository becomes an ad-hoc file system for files that have no business in an SCM.
  • The continuous integration server depends on public repositories. When you change your build or add a new dependency, your CI system downloads dependencies from the public repo. It depends on the availability of this public resource to run builds.
  • Production deployments have to run the entire build, from start to finish, to generate binaries for deployment. When a build is tested and ultimately pushed to production, the build and deployment scripts check out the source code, run the build, and deploy the resulting binaries to production systems.
  • Sharing source code with external partners means granting them access to your SCM. Since there is no established mechanism for publishing source or binary artifacts, the only way to share code with partners is either to send them an archive of your source or to give them direct access to your SCM.
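
Several of these anti-patterns come down to every machine talking to a public repository directly. A proxying repository manager sits in between. It serves an artifact from its own storage when it already has it, and it contacts the public repository only on a miss. The minimal Java sketch below illustrates that cache-first idea and nothing more. CachingProxyRepository is a hypothetical class, the upstream is a stand-in function rather than a real HTTP client, and the sketch makes no claim about how Nexus is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical proxy repository: cache-first lookup with an upstream fallback.
final class CachingProxyRepository {
    private final Map<String, byte[]> cache = new HashMap<>();
    // Stand-in for a public repository such as Maven Central; returns null when unavailable.
    private final Function<String, byte[]> upstream;
    private int upstreamRequests = 0;

    CachingProxyRepository(Function<String, byte[]> upstream) {
        this.upstream = upstream;
    }

    // Serves the artifact at the given repository path, contacting upstream only on a cache miss.
    Optional<byte[]> get(String path) {
        byte[] cached = cache.get(path);
        if (cached != null) {
            return Optional.of(cached); // cache hit: no request leaves the building
        }
        upstreamRequests++;
        byte[] fetched = upstream.apply(path);
        if (fetched == null) {
            return Optional.empty(); // upstream down or artifact missing
        }
        cache.put(path, fetched);
        return Optional.of(fetched);
    }

    int upstreamRequests() {
        return upstreamRequests;
    }

    public static void main(String[] args) {
        boolean[] centralUp = {true};
        CachingProxyRepository proxy = new CachingProxyRepository(
                path -> centralUp[0] ? ("contents of " + path).getBytes() : null);

        String guice = "com/google/inject/guice/2.0/guice-2.0.jar";
        proxy.get(guice);                                  // first developer: fetched upstream
        centralUp[0] = false;                              // the public repository goes down
        System.out.println(proxy.get(guice).isPresent());  // true: served from the cache
        System.out.println(proxy.upstreamRequests());      // 1
    }
}
```

In the example run, the new developer on Monday still gets the artifact after the simulated outage, and the public repository is contacted only once. A real repository manager also handles metadata, checksums, and expiry of cached snapshots, all of which this sketch omits.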

The common theme in all of these anti-patterns is that your systems depend either on public resources or on the SCM as the central collaboration point. In the next few posts, I’m going to detail how using a repository manager solves each of these issues. I’ll explain why each of these anti-patterns is a bad idea, and how you can use Maven, Nexus, and Hudson together to solve these problems and make software development more efficient.

Stay tuned for the next post: Caching and Collaborating.

  • Jon

    We’ve certainly found a repo manager essential for caching third-party dependencies. But after some bad experiences, we’ve stopped using it for internally generated artifacts in our multimodule Maven project.

    The problems centred around the need for an individual developer always to be building/running/testing a coherent, self-consistent codebase comprising all modules in the project. It’s perfectly reasonable for a developer to make changes to more than one module at a time, before check-in, but having a shared repo to which Hudson built – and to which ‘mvn deploy’ could be made deliberately or accidentally by developers – made inconsistencies all too frequent.

    For example:

    - developer checks out HEAD
    - developer makes changes across two modules
    - someone else checks in change to dependent module
    - Hudson builds it and publishes snapshot to repo
    - first developer carries out local build in which the dependent module isn’t itself built
    - Maven checks the shared repo and determines that Hudson’s version of the dependent module’s artefact is more recent than that in the developer’s private local repo, so uses the Hudson version
    - bang! developer isn’t using the code they expect

    Am I missing something? I’d appreciate any insights into ways round this.


    • andy

      Hi Jon,

      Have you thought about configuring a separate Hudson job for each submodule? (If you have a three-level multi-module project, that means one job for each second-level module.) Each submodule job should run mvn clean deploy and, if it succeeds, trigger a Hudson full-build job that runs mvn clean install on the root POM. (Before that, your own artifacts should be removed from the Hudson workspace’s local Maven repository.) The full build only runs a clean install across all modules, to make sure they all compile together. Sorry, my English is not excellent; I hope you understand me.

    • Brian Fox

      By default, Maven checks remote repositories for updated snapshots only once a day, but you can change that behavior with the update policy in your settings.
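
Brian Fox’s point about the update policy can be modeled concretely. Maven documents four updatePolicy values for a repository: always, daily (the default), interval:X (X in minutes), and never. The Java sketch below is a simplified, hypothetical model of how such a policy decides whether to re-check a remote repository for a newer snapshot. It interprets daily as "last checked before the start of the current day" in UTC. That interpretation is an assumption, and the timing rules inside Maven’s own resolver may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Simplified, hypothetical model of a Maven-style snapshot update policy.
final class SnapshotUpdatePolicy {

    private SnapshotUpdatePolicy() {}

    // Returns true if the remote repository should be asked for a newer snapshot.
    // lastChecked is null when the artifact has never been resolved from that repository.
    static boolean shouldCheckRemote(String policy, Instant lastChecked, Instant now) {
        if (lastChecked == null) {
            return true; // nothing available locally, so the remote must be consulted
        }
        if (policy.equals("always")) {
            return true;
        }
        if (policy.equals("never")) {
            return false;
        }
        if (policy.equals("daily")) {
            // Assumption: "daily" means the last check happened before midnight UTC today.
            Instant startOfToday = now.atZone(ZoneOffset.UTC).toLocalDate()
                    .atStartOfDay(ZoneOffset.UTC).toInstant();
            return lastChecked.isBefore(startOfToday);
        }
        if (policy.startsWith("interval:")) {
            // interval:N re-checks once more than N minutes have passed since the last check.
            long minutes = Long.parseLong(policy.substring("interval:".length()));
            return lastChecked.isBefore(now.minus(Duration.ofMinutes(minutes)));
        }
        throw new IllegalArgumentException("Unknown update policy: " + policy);
    }

    public static void main(String[] args) {
        Instant lastChecked = Instant.parse("2010-08-04T09:00:00Z");
        Instant now = Instant.parse("2010-08-04T15:00:00Z");
        System.out.println(shouldCheckRemote("daily", lastChecked, now));       // false
        System.out.println(shouldCheckRemote("interval:60", lastChecked, now)); // true
        System.out.println(shouldCheckRemote("never", lastChecked, now));       // false
    }
}
```

In Jon’s scenario, a policy of never on the repository that holds internal snapshots would, at least in this model, stop a developer’s locally installed snapshot from being silently replaced by a newer CI build. The cost is that developers must force an update (for example with mvn -U) when they do want the CI build.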

  • Ken Fox

    You haven’t made a good case against using a distributed version control system such as Mercurial or Git. Take my team’s projects as an example: we generally have stable and dev repositories for each project on a CI server. Developers check out copies to their laptops for integration purposes, then clone those local repos as necessary to organize their work. Hard links are used in the clones, so files that don’t change take up no extra storage. Files that don’t change are never part of a change set and are never sent over the network. Compared to source code change tracking, managing these files is practically free.

    Projects may use other projects at run-time, but that has to be configured with dependency injection; otherwise you can’t test efficiently. If someone truly needs to build a fully functional system on their laptop, it’s an advantage to have simple builds containing all the necessary components. Checking out 40K LOC to understand 3K LOC makes perfect sense if you are trying to understand the relationships between the projects. To run the unit tests, though, you should only ever have to check out 3K LOC, regardless of which package management solution you use.