News and Notes from the Makers of Nexus | Sonatype Blog

Nexus Indexer 2.0: Incremental Downloading

Written by Damian Bradicich | May 13, 2009

The Nexus indexer has become fairly popular and is the de facto standard for indexing Maven repositories (including the big boy, Central). As repositories grow, the index of artifacts grows right along with them. What started as a file of a few hundred kilobytes will grow to 20-30 megabytes or more over time. Since the index is the gateway into the contents of a repository (not for Maven, mind you, but for users), it is the most downloaded file, and with a 20 MB file being downloaded by thousands of people every day, the bandwidth costs can get pretty high. To combat this, we have introduced incremental index handling into the Nexus indexer.
There are two parts to this: building the incremental indexes for consumers to download, and retrieving the incremental indexes from a provider.

Building Incremental Indexes

When the daily task runs on Central to create indexes, the most recent content (in its entirety) is stored in the nexus-maven-repository-index.gz file. This file is always available as a fallback, in case a consumer doesn't properly handle incremental indexes, or has fallen so far behind that the provider no longer has all of the incremental portions it needs. Alongside this, an incremental index is generated that contains all changes (adds/updates/deletes) since the last time the index was generated. This incremental file is very small compared to the full index, in most cases around 10 KB per day. The incremental files are listed in a nexus-maven-repository-index.properties file, along with a chain id. The chain id is used to 'reset' the incremental chain, should a full index download be required for some reason.

Retrieving Incremental Indexes

If the consumer application is integrated with the nexus indexer (version 2.0 or later), there is nothing to worry about: the nexus-indexer downloads whichever incremental pieces it is missing on top of its current base. Should anything not line up (it needs incremental pieces that the provider no longer carries, or the chain id is different), the indexer downloads the full index file instead, and goes back to checking for incremental changes the next time it updates.

All of these decisions are driven by the nexus-maven-repository-index.properties file, which contains the following properties:

  • nexus.index.chain-id: The chain id of the current incremental items. If this value ever differs from what the consumer has in its local properties file, the consumer should trigger a full .gz index download (and, of course, download the properties file too, to stay up to date).
  • nexus.index.last-incremental: The last incremental item available, simply an integer that gets inserted into the download file name. If the consumer has the same value in its local properties file, there is no need to download anything.
  • nexus.index.incremental-X: The properties that list each incremental item available. The first item (where X = 0) is the oldest incremental piece that the provider still maintains. If the consumer's local properties file contains a last-incremental value less than this, the consumer needs to download the full .gz index (and the properties file) and continue from there. Otherwise, it simply needs to grab every nexus-maven-repository-index.X.gz file available from the provider, where X is greater than the local last-incremental and less than or equal to the remote last-incremental.
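
The rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic described in this post, not the nexus-indexer's actual code: the function names (parse_properties, plan_update) are made up for the example, and treating a local value that is ahead of the remote one as a reason for a full download is an assumption.

```python
FULL_INDEX = "nexus-maven-repository-index.gz"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def plan_update(local, remote):
    """Return (action, files), where action is 'none', 'full' or 'incremental'.

    `local` and `remote` are dicts built from the consumer's and the
    provider's nexus-maven-repository-index.properties files.
    """
    # A different (or missing) chain id resets the chain: full download.
    if local.get("nexus.index.chain-id") != remote["nexus.index.chain-id"]:
        return ("full", [FULL_INDEX])

    local_last = int(local["nexus.index.last-incremental"])
    remote_last = int(remote["nexus.index.last-incremental"])

    # Same last-incremental value: nothing to download.
    if local_last == remote_last:
        return ("none", [])

    # incremental-0 is the oldest piece the provider still keeps.
    oldest = int(remote["nexus.index.incremental-0"])

    # Too far behind (the rule as stated in the post), or ahead of the
    # provider (an assumption of this sketch): fall back to the full index.
    if local_last < oldest or local_last > remote_last:
        return ("full", [FULL_INDEX])

    # Otherwise fetch every piece X with local_last < X <= remote_last.
    files = [
        "nexus-maven-repository-index.%d.gz" % x
        for x in range(local_last + 1, remote_last + 1)
    ]
    return ("incremental", files)
```

For example, with a remote file listing pieces 3 through 5 and a local last-incremental of 3, plan_update returns the file names for pieces 4 and 5. In every case the consumer would also save the new properties file afterwards, so the next check starts from the right place.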

Support for legacy index applications

Of course, we don't want to leave the legacy guys out in the cold, so the old timestamp-based properties are available as well:

  • nexus.index.time: The timestamp at which the legacy .zip index was last created. If this timestamp differs from the one in your local properties file, you will want to download the full .zip index.
  • nexus.index.timestamp: The timestamp at which the .gz index was last created. If this timestamp differs from the one in your local properties file, you will want to download the full .gz index.
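
For a legacy consumer the check is just a comparison of one property. A minimal sketch in Python, where the function name and the "zip"/"gz" format labels are made up for the example, and the timestamps are compared as opaque strings (no particular timestamp format is assumed):

```python
# Which timestamp property goes with which full index format.
LEGACY_KEYS = {
    "zip": "nexus.index.time",       # legacy .zip index
    "gz": "nexus.index.timestamp",   # full .gz index
}


def legacy_needs_full_download(local, remote, fmt):
    """True if the local timestamp for `fmt` differs from the remote one.

    `local` and `remote` are dicts of properties; a consumer with no local
    value yet is treated as out of date whenever the remote value exists.
    """
    key = LEGACY_KEYS[fmt]
    return local.get(key) != remote.get(key)
```

A consumer would call this with the format it uses, download the matching full index when the result is true, and then store the remote properties file locally.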

So to wrap everything up, plain and simple: if your application is integrated with the nexus-indexer, you should definitely upgrade to 2.0.0 to get this enormous bandwidth saving. The latest m2eclipse 0.9.8 release has already made the move, and the same is coming in Nexus 1.4.