How to Back Up Nexus Configuration and Repository Artifacts


January 25, 2010 By Juven Xu

If you recently installed Nexus and have started using it to support internal development and collaboration, you will likely want to know how to configure backups to capture your configuration files and repository data.   Any system as central to your development effort as a repository manager needs to be backed up on a daily basis.   Hard drives and power supplies fail, and critical repository artifacts in a hosted repository may be inadvertently deleted.

In this post, I go through the recommended procedures for backing up a Nexus installation.   I discuss which files and directories need to be backed up, and I make some specific recommendations about backup configuration.   Luckily, Nexus was designed to use the filesystem to store both configuration and repository data.   This means that backing up your Nexus installation is as easy as configuring an automated backup tool such as amanda or a simple backup script that uses rsync.   There is no database to export or server to suspend for the duration of the backup.  Backing up or restoring a Nexus installation is as easy as copying a set of files.

What To Back Up

When you installed Nexus, two directories were created: an application directory (nexus-professional-1.4.1/) and a Sonatype Work directory (sonatype-work/). The application directory contains all of the executable files and libraries that make up the Nexus application, and the Sonatype Work directory contains all of the installation-specific configuration and repository data for your own installation of Nexus. If you have modified the configuration of Nexus, that configuration is stored in sonatype-work/. If you’ve uploaded artifacts to hosted repositories, customized repository metadata, or proxied artifacts from remote repositories, all of that data is stored in sonatype-work/. In this blog post, we are going to focus on the sonatype-work/ directory, as it contains all of the data we need to back up.

Note: If you have deployed Nexus as a WAR in your own servlet container, the sonatype-work/ folder should be under the user’s home directory by default. If you have deployed Nexus using the standard distribution (embedded Jetty), the sonatype-work/ directory will be a sibling of the Nexus application directory.

If you have the disk space to spare, you can simply back up the whole sonatype-work/ directory. If you want more detail about what is and is not critical to back up, this section describes the contents of the sonatype-work/ directory from a backup perspective. Some of the files in this directory are critical, while others can be easily regenerated if they are lost. Here is a full list of subfolders, from most to least important:

  1. nexus/storage (hosted repositories) — Each repository has a corresponding folder in this directory. For example, a hosted repository with the id ‘release’ has the folder ‘nexus/storage/release’. If you deploy internal artifacts to hosted repositories, you are going to want to back up the storage directories for all of your hosted repositories. Artifacts stored in hosted repositories are unlikely to be available from any public repository, so they must be backed up.
  2. nexus/conf — All the Nexus configuration files are stored here, including repository configuration, security configuration, and log configuration. Although you can recreate them in theory, doing so takes time, so back up this folder.
  3. nexus/indexer-pro — If you are using the Custom Metadata plugin in Nexus Professional, you will need to add this directory to your backup scripts.   Nexus uses this directory to store custom repository metadata.
  4. nexus/logs —  All the nexus logs are stored here.  You might want to back up these logs to make sure you know the history. If you are looking to conserve space, this directory can be omitted from the list of files to back up.
  5. nexus/timeline — Most of the important events like authentication failure, scheduled task starting, and recently deployed artifacts, are stored in the timeline and displayed via RSS feeds. These events are stored in this folder.  Back up this directory if it is critical to maintain an audit trail of all operations on Nexus repositories and configuration.
  6. nexus/storage (proxy repositories) — Once you’ve been using Nexus for a long time, a large number of artifacts will have been cached from public repositories like central. Since these artifacts can be retrieved from public repositories at any time, there’s no strict need to back them up. If you have the disk space, you can still consider including cached artifacts in a backup. Backing up these files can save time and bandwidth should you ever need to restore a Nexus installation from a backup.
  7. nexus/trash — Back up trash? There is usually no need. When artifacts are deleted in Nexus, they are actually moved to the trash directory. While you can include this directory in your backup, it only makes sense if you have a history of accidentally deleting artifacts and then emptying the trash.
  8. nexus/indexer — Nexus stores a repository index for every repository in this folder.   Since Nexus can rebuild an index using the reindex task, there is no compelling need to back up this folder.
  9. nexus/proxy — Artifact attributes for remote repositories can be regenerated.   There is no need to back up this directory.
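
To decide which of these directories are worth the disk space, it can help to see how large each one is. The following is a minimal sketch, not part of Nexus itself: report_sizes is a hypothetical helper name, and the default WORK_DIR is an assumption based on the sonatype-work path used in this post, so adjust it for your own installation.

```shell
#!/bin/sh
# Assumption: default location of sonatype-work; override by setting WORK_DIR.
WORK_DIR="${WORK_DIR:-$HOME/bin/sonatype-work}"

# report_sizes WORK_DIR
# Prints the size of each nexus/ subfolder from the list above, in kilobytes.
report_sizes() {
    for sub in storage conf indexer-pro logs timeline trash indexer proxy; do
        dir="$1/nexus/$sub"
        if [ -d "$dir" ]; then
            # du -sk prints the total size in KB followed by the path
            du -sk "$dir"
        else
            # Not every installation has every folder (e.g. indexer-pro)
            echo "missing $dir"
        fi
    done
}

report_sizes "$WORK_DIR"
```

If nexus/storage dominates the total, running du on the individual repository folders inside it shows how much of that is hosted data and how much is proxy cache.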

How To Back Up

Now that you know what to back up, I will describe the process I use to back up my own instance of Nexus. I don’t use a fancy automated backup tool like amanda that can make incremental backups on remote storage. I simply use rsync to copy the data to a removable disk. My Nexus installation is running on Ubuntu Linux, and the path of the sonatype-work directory is /home/juven/bin/sonatype-work/. I have a removable disk which is mounted at /media/disk/, and I am using rsync to make backups of sonatype-work/ each night:

$ rsync -a --delete -v ~/bin/sonatype-work /media/disk/

The options I pass to rsync are:

  • The -a option which tells rsync to run in archive mode.   It runs recursively, keeps symlinks, preserves permissions, preserves times, preserves groups, preserves owners, preserves devices, and preserves special files.
  • The --delete option which tells rsync to delete files in the target directory if they were deleted in the source directory.
  • The -v option which tells rsync to print verbose output.
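
Because --delete removes files from the target, it may be worth previewing a run before letting it touch the backup disk. The sketch below adds rsync's -n (--dry-run) option to the same set of options; preview_backup is a hypothetical helper name, and the example paths are the ones used in this post.

```shell
#!/bin/sh
# preview_backup SRC DEST
# Runs rsync with the backup options plus -n (dry run), so it only lists
# what would be copied or deleted and changes nothing in DEST.
preview_backup() {
    rsync -a --delete -v -n "$1" "$2"
}

# Example, using the paths from this post:
#   preview_backup ~/bin/sonatype-work /media/disk/
```

Dropping -n turns the preview back into the real backup command.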

The backup command shown earlier copies the whole sonatype-work/ folder. What if I only want to back up some of the important sub-folders? To do this, I create an rsync includes file that lists the directories and file patterns I want rsync to include in my backup:

+ /sonatype-work/nexus/storage/
+ /sonatype-work/nexus/storage/releases/
+ /sonatype-work/nexus/storage/thirdparty/
+ /sonatype-work/nexus/conf/
+ /sonatype-work/nexus/logs/
+ /sonatype-work/nexus/timeline/
- /sonatype-work/nexus/storage/*
- /sonatype-work/nexus/*

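This tells rsync to include only what we want and exclude everything else. The order matters: rsync applies the first rule that matches a path, so the + lines must come before the - lines that exclude the rest. Now run rsync again with the includes list: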

$ rsync -a --delete -v --include-from includes.list \
         ~/bin/sonatype-work /media/disk/
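
After a backup run, it can be reassuring to confirm that the critical directories on the backup disk match the originals. The following is a minimal sketch using diff -rq; verify_backup is a hypothetical helper name, not part of Nexus or rsync, and the directories in the example are assumed to be the ones named in the includes list.

```shell
#!/bin/sh
# verify_backup SRC DEST DIR...
# Compares each DIR under SRC with the same DIR under DEST and reports
# whether the two trees are identical. Returns non-zero if any differ.
verify_backup() {
    src="$1"
    dest="$2"
    shift 2
    status=0
    for sub in "$@"; do
        # diff -r recurses; -q reports only whether files differ
        if diff -rq "$src/$sub" "$dest/$sub" >/dev/null 2>&1; then
            echo "OK       $sub"
        else
            echo "DIFFERS  $sub"
            status=1
        fi
    done
    return "$status"
}

# Example, using the paths from this post (rsync places the copy in
# /media/disk/sonatype-work because the source has no trailing slash):
#   verify_backup ~/bin/sonatype-work /media/disk/sonatype-work \
#       nexus/conf nexus/storage/releases nexus/storage/thirdparty
```

A DIFFERS result on a live system does not necessarily mean the backup failed, since Nexus may have written new files after rsync finished.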

Automate The Backup

If your backup disk is always connected to your machine and you don’t want to repeat the backup command manually, schedule a task to run the backup periodically. On Linux, cron is a perfect tool for automating the backup task. To configure the backup job, edit /etc/crontab and add a line like this (change juven to an appropriate username on your own system):

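0 0 * * 0 juven rsync -a --delete --include-from ~/bin/includes.list ~/bin/sonatype-work /media/disk/

This entry tells cron to run the rsync command as user juven once a week, at midnight on Sunday. The entry must stay on a single line, because cron does not treat a trailing backslash as a line continuation. To run the backup every night instead, change the last schedule field (the day of the week) from 0 to *. That’s all there is to it.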
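
One thing to watch with an unattended job: if the removable disk happens not to be mounted when cron fires, rsync would write into the empty /media/disk directory on the local filesystem instead. The following is a minimal sketch of a wrapper that checks for this first. The script name, the --run argument, and the helper names dest_ready and run_backup are all hypothetical; the default paths are assumptions taken from this post, and mountpoint is assumed to be available, as it is on a typical Ubuntu system.

```shell
#!/bin/sh
# Assumptions: these defaults mirror the paths used in this post.
SRC="${SRC:-$HOME/bin/sonatype-work}"
DEST="${DEST:-/media/disk}"
INCLUDES="${INCLUDES:-$HOME/bin/includes.list}"

# dest_ready DIR
# Succeeds only if DIR is a mounted filesystem and is writable.
dest_ready() {
    mountpoint -q "$1" && [ -w "$1" ]
}

# Runs the backup, or refuses if the backup disk is not ready.
run_backup() {
    if ! dest_ready "$DEST"; then
        echo "backup skipped: $DEST is not a mounted, writable disk" >&2
        return 1
    fi
    rsync -a --delete --include-from "$INCLUDES" "$SRC" "$DEST/"
}

# Only run when asked to, so the file can also be sourced for its functions.
if [ "${1:-}" = "--run" ]; then
    run_backup
fi
```

Saved as, say, /home/juven/bin/backup-nexus.sh and made executable, the script would replace the rsync command in the crontab entry, called with the --run argument.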

  • JU

    This is exactly how I back up my Nexus, but the theoretical question of a race condition during a live backup still bugs me. Nexus stores a great deal of metadata about the repository state. You can rebuild much of this with the reindex and rebuild metadata tasks, but how would you guarantee, for example, that your backup feeds recorded a deployment that happened during the backup?