Benefits of a Repository Manager: Part II – Caching and Collaborating


August 5, 2010 By Tim O'Brien

In the first part of this series, I gave you a glimpse at the big picture of repository management, and I listed some common anti-patterns present in systems that haven’t yet installed a repository manager. In this post, I’m going to focused on the benefits a repository manager has on the development cycle. How does using a repository manager change the way your developers will approach development? What problems does it solve? And what possibilities does it introduce?


Caching artifacts locally

The first, and most obvious, benefit is that a repository manager will proxy and cache artifacts from a remote repository. If you use Maven or another tool that can download artifacts from Maven Central, you have probably had the experience of checking out a large build from source control and running it, only to have to wait for minutes while the build downloads dependencies from Maven Central.  An even bigger frustration than that, if your connection to the internet is tenuous, or if Central happens to be unavailable for whatever reason, you cannot build your code.

With a repository manager, downloading your dependencies takes much less time and your builds don’t rely on internet access. Unless this is the first time building a particular project, all of the artifacts your build uses are going to be cached at the repository manager. Instead of waiting for artifacts to be downloaded over the public internet, your build will download artifacts from a local server. Since you avoid the public internet, this process is sped up significantly. The build that once took 15 minutes to download dependencies will now take less than a minute to download everything it needs.

Get those JARs out of Subversion

The Oracle JDBC driver, a proprietary JAR from a vendor, is the kind of one-off, third party library that is not going to made available on a public repository. Without a repository, the most obvious solution is to just check these JARs into source control and store them right next to your code. When you run your build, branch your project, and release your code you are just passing around these binaries as a part of your project. While this approach does work, you are overloading the source control system to perform a task it wasn’t designed for.

While Subversion, and many other modern SCM tools, will gladly version binaries, you are storing a static, unchanging artifact in source control. Every time you checkout out source control, you are checking out binaries, and if you are unlucky enough to work somewhere that stores all libraries in source control, you will often be asked to check out a large repository of JARs. Your SCM is going to carefully keep track of changes and version files that will never change.

This is both inefficient and unnecessary. If you use a repository manager, you can upload third party libraries to a third party repository that stores artifacts on the repository manager. Then you can mix these third party libraries into a public repository group that will combine the contents of public repositories with your own repositories. In other words, you can declare a dependency on a custom, manually uploaded third party JAR as easily as you would declare a dependency on a library stored in Maven Central. Don’t store binaries in source control, use a repository manager instead.

Collaboration

Collaboration is best illustrated with a hypothetical organization. Imagine you work at an organization with three large development groups of about 30 developers. There is an eCommerce group, which is responsible for writing systems that interact with banks and service providers like PayPal. There is a CRM group, which has the unenviable job of merging two disparate CRM solutions from Peoplesoft and Oracle and wrapping those systems in an easy to use Java API. Then there is a third group, the web applications group, which has the primary responsibility for wrapping these two back office systems in a slick web interface. These relationships are captured in the following diagram:

Imagine that the eCommerce group and the CRM group have to assemble the entire external API into a single project: ecommerce-api and crm-api. These APIs contain all of the logic required to connect to a set of internal services hosted by each group. As the eCommerce group and CRM groups innovate and offer new services, they will cut releases of these new APIs and publish those releases to the repository manager.

When the web applications group is ready to start using a new version of the eCommerce or CRM API they can simply define a dependency on a version of this artifact. The build for the web applications group will then download the appropriate version of the eCommerce of CRM API from the corporate repository manager.

In this way, the repository manager is a central collaboration point for difference groups within the same company. Without the use a repository manager, you will often see disparate groups forced to check each others code out of SCM and build it from scratch just to generate client libraries. In the organization that has embraced repository management, these libraries are published as binaries for clients (in this case other workgroups) to consume from the repository manager.

When you collaborate using the SCM, it is very difficult to scale. Every workgroup is forced to use the same set of tools for build management and you will very often see the entire IT operation having to synchronize releases. When you share build artifacts using the repository manager, you decouple workgroups from one another and allow your internal teams to collaborate using a more adaptive, open-source model.

If the eCommerce group needs to innovate, they can do so without affecting the source code of the web applications group. They can publish new releases to the repository manager, and the web applications group can decide when and how they are going to consume these releases by changing the version of a dependency in a build.

When you collaborate using the SCM, all of your workgroups are joined at the hip by a common codebase. When you use a repository manager you allow different workgroups to innovate and create at their own pace.

In next week’s posts: Continuous Deployment and Integration