How Do Application-Level Package Managers Work?

This is an excerpt from Out of the Wild: A Beginner's Guide to Package and Dependency Management, a Sonatype Guide. This is the second of three installments. Read the first one here.

We’ve established that managing dependencies is a complex task. But as Sam Boyer explains in his Medium article, “It’s not the algorithmic side that makes [application-level package managers] hard.”

“Their final outputs are phase zero of a compiler or interpreter, and while the specifics of that vary from language to language, each still presents a well-defined target. As with many problems in computing, the challenge is figuring out how to present those machine requirements in a way that fits well with human mental models.”

Take the Apache Maven application-level package manager as an example. Its “primary goal is to allow a developer to comprehend the complete state of a development effort in the shortest period of time” by focusing on “making the build process easy” and “providing quality project information.” In fact, the term Maven comes from a Yiddish word meaning “accumulator of knowledge,” and the tool itself is built around a Project Object Model, or POM file. (More on this later.)

That’s just one application-level package manager’s take on its high-level role in modern software development, but let’s continue down this path for a minute and talk about what application-level package managers (in general) have in common.

We’re simplifying a bit, but below are the key components used by most application dependency managers. It’s the interplay—and forward “movement”—of the elements listed below that makes for an effective dependency management system.

Project code

This one’s easy. First you have your source code that’s being actively developed; that is, the project you want the application-level package manager to manage dependencies for. This is usually stored in a source code management (SCM) system such as Git, often hosted on a service like GitHub.

Manifest file

A manifest is a file, specific to your particular application-level package manager, that you create to list the direct dependencies necessary for your project. It nails down your intent, such as using version 1.4 and above for Package X.

Lock file

A lock file, then, is machine-generated from the manifest, and it contains the actual dependencies and versions that the application-level package manager resolved from the manifest file as the project was being built. It basically contains all of the information necessary to reproduce the project’s dependencies.
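To make the manifest/lock file distinction concrete, here is a minimal Python sketch. It assumes a manifest that records only minimum versions and a registry that lists every published version; the package names, the version lists, and the resolve_lock helper are all hypothetical, and real package managers use richer constraint syntax and resolution rules.

```python
import json

def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

# Hypothetical manifest: the developer's intent, stated as minimum versions.
manifest = {"package-x": "1.4", "package-y": "2.0"}

# Hypothetical registry contents: every published version of each package.
available = {
    "package-x": ["1.3.9", "1.4.0", "1.10.1"],
    "package-y": ["1.9.0", "2.0.0", "2.1.3"],
}

def resolve_lock(manifest, available):
    """Pick the highest published version that meets each minimum."""
    lock = {}
    for name, minimum in manifest.items():
        candidates = [
            version for version in available[name]
            if parse_version(version) >= parse_version(minimum)
        ]
        if not candidates:
            raise LookupError(f"no version of {name} satisfies >= {minimum}")
        lock[name] = max(candidates, key=parse_version)
    return lock

# The lock records exact versions, so a later build can reuse them as-is.
lock = resolve_lock(manifest, available)
print(json.dumps(lock, indent=2, sort_keys=True))
```

The manifest says "1.4 and above," while the lock pins one exact version per package. Rebuilding from the lock gives the same dependencies even after newer versions are published.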

Dependency code

Finally, the dependency code is produced: all of the source code and/or binary files listed in the lock file, “arranged on disk such that the compiler/interpreter will find and use it as intended, but isolated so that nothing else would have a reason to mutate it.” (Boyer)

Components of an Application-level Dependency Manager

Image Credit: Boyer, 2016: So You Want to Write a Package Manager.

Devopedia provides a good explanation of how the process works using these application-level package manager components:

“Dependency managers start by reading the manifest file, in which direct dependencies are noted. They then read the metadata of these dependencies from their repositories to figure out the next level of dependencies. In other cases, they may download the dependencies right away and then process their dependencies. Either way, all dependencies must be downloaded and installed.”
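The walk described in that quote can be sketched in a few lines of Python. This is an illustration only: REGISTRY stands in for the metadata a real repository would serve, the package names and the collect_dependencies helper are hypothetical, and versions and conflict resolution are left out entirely.

```python
from collections import deque

# Hypothetical registry metadata: each package mapped to its direct dependencies.
REGISTRY = {
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-lib"],
    "template-engine": ["string-utils"],
    "tls-lib": [],
    "string-utils": [],
    "test-runner": ["string-utils"],
}

def collect_dependencies(direct, registry):
    """Return every direct and transitive dependency in discovery order."""
    ordered = []
    seen = set()
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already reached through another dependency
        if name not in registry:
            raise LookupError(f"{name} not found in registry")
        seen.add(name)
        ordered.append(name)
        # Reading the package's metadata reveals the next level of dependencies.
        queue.extend(registry[name])
    return ordered

# The manifest lists only direct dependencies; the walk finds the rest.
manifest = ["web-framework", "test-runner"]
all_deps = collect_dependencies(manifest, REGISTRY)
print(all_deps)
```

Note that string-utils is reachable through two different packages but appears only once in the result, which is the kind of bookkeeping a dependency manager does on your behalf.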

Application-level Package Manager Examples

Maven Example

As we briefly mentioned earlier, the Maven application-level package manager is based on a Project Object Model (POM). The pom.xml is Maven’s take on a manifest file, including all of the necessary information to build a Java application.

According to Maven’s docs, “the cornerstone of the POM is its dependency list.” When your project is compiled, Maven downloads and links your OSS dependencies, including “the dependencies of those dependencies (transitive dependencies), allowing your list to focus solely on the dependencies your project requires.”

Here is an example snippet from the Dependencies subsection of the docs:

 <dependencies>
   <dependency>
     <groupId>junit</groupId>
     <artifactId>junit</artifactId>
     <version>4.12</version>
     <type>jar</type>
     <scope>test</scope>
     <optional>true</optional>
   </dependency>
   ...
 </dependencies>

For more information on the groupId, artifactId, version, type, and other parameters that make up the Dependencies section of the pom.xml, see Maven’s docs.

You may have noticed that we haven’t mentioned a lock file for Maven yet. And that’s for a reason. In the Maven package manager, according to this Stack Overflow thread, “There is no need to have a feature such as ‘lock file’, or anything like this, if your pom.xml strictly defines the versions of your dependencies.”

So, if you declare a specific version in your pom.xml, Maven will only resolve that version; therefore, your pom.xml becomes both a manifest and a lock file.

There are differing schools of thought around how to specify your dependency versions in any application-level package manager’s manifest, but if you lean toward the school of thought that favors version specificity (i.e., avoiding version ranges) as part of enabling reproducible builds, the lock file becomes unnecessary. Reproducible, or deterministic, builds are increasingly seen as a best practice within software development.
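If you favor pinned versions, a small check can flag dependencies that use ranges. The Python sketch below is a rough illustration, not a Maven feature: the find_version_ranges helper and the com.example dependency are hypothetical, the snippet omits the XML namespace that a full pom.xml normally declares, and it assumes that any version containing a comma (such as [1.0,2.0)) is a range.

```python
import xml.etree.ElementTree as ET

# Hypothetical dependency list; a real pom.xml usually declares a namespace,
# which this simplified sketch does not handle.
POM_SNIPPET = """
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
  </dependency>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>example-lib</artifactId>
    <version>[1.0,2.0)</version>
  </dependency>
</dependencies>
"""

def find_version_ranges(xml_text):
    """Return (artifactId, version) pairs whose version looks like a range."""
    root = ET.fromstring(xml_text)
    flagged = []
    for dependency in root.iter("dependency"):
        version = (dependency.findtext("version") or "").strip()
        # Assumption: a comma inside the version marks a range such as [1.0,2.0).
        if "," in version:
            flagged.append((dependency.findtext("artifactId"), version))
    return flagged

ranges = find_version_ranges(POM_SNIPPET)
print(ranges)
```

Here junit passes because it names a single version, while the hypothetical example-lib is flagged because its version spans a range.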

npm Example

In the case of the Node package manager (npm), the package.json file serves as the project manifest. JavaScript developers using npm commonly specify their dependencies with version ranges, likely because the practice is mentioned explicitly in the docs (but also because it’s less maintenance and enables faster updating of dependencies). Lock files in the form of a package-lock.json or npm-shrinkwrap.json are therefore used to document the exact versions that were ultimately used in the build process. (Note that even with the “manifestation of the manifest” documenting the package versions used in the lock file, there are certain risks with specifying “latest” or version ranges for your dependencies, and we’ll discuss that a bit more later on.)

Here is a package.json snippet from the npm docs showing an example:

{
  "name": "my_package",
  "version": "1.0.0",
  "dependencies": {
    "my_dep": "^1.0.0",
    "another_dep": "~2.2.0"
  }
}
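The caret (^) and tilde (~) prefixes in that snippet are range operators. As a rough illustration of what they allow, here is a Python sketch of the matching rules. It is not npm's implementation: the parse, upper_bound, and satisfies helpers are our own names, and the sketch assumes full three-part versions with no pre-release tags.

```python
def parse(version):
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def upper_bound(spec):
    """Exclusive upper bound for a caret (^) or tilde (~) range."""
    major, minor, patch = parse(spec[1:])
    if spec[0] == "~":
        # Tilde: patch-level changes only.
        return (major, minor + 1, 0)
    if spec[0] == "^":
        # Caret: changes that leave the left-most non-zero part untouched.
        if major > 0:
            return (major + 1, 0, 0)
        if minor > 0:
            return (0, minor + 1, 0)
        return (0, 0, patch + 1)
    raise ValueError(f"unsupported range: {spec}")

def satisfies(version, spec):
    """True if version falls inside the range described by spec."""
    lower = parse(spec[1:])
    return lower <= parse(version) < upper_bound(spec)

# "^1.0.0" accepts any 1.x.y release but not 2.0.0.
print(satisfies("1.4.2", "^1.0.0"), satisfies("2.0.0", "^1.0.0"))
# "~2.2.0" accepts 2.2.x releases but not 2.3.0.
print(satisfies("2.2.9", "~2.2.0"), satisfies("2.3.0", "~2.2.0"))
```

Because both ranges can match versions published after the manifest was written, two installs at different times may resolve differently, which is the gap the lock file closes.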

.NET Example

In the case of the NuGet application-level package manager used by .NET developers, a .nuspec file is used as the manifest.

Here is an example .nuspec snippet with dependencies specified:

  <dependencies>
      <dependency id="another-package" version="3.0.0" />
      <dependency id="yet-another-package" version="1.0.0" />
  </dependencies>

In addition, lock file functionality for NuGet was somewhat recently introduced, for NuGet.exe versions 4.9 and above.

And Then There’s This Other Type of Package Manager…

So far we’ve learned what application-level package managers are, as well as a very simplified, high-level view of how they work to manage the OSS dependencies we use in our software. We’ve also noted that they work closely with their programming language-specific repositories/registries such as npmjs.org or pypi.org, downloading the applicable OSS libraries as needed and resolving dependency conflicts. And we’ve looked at examples of the manifests and lock files used for three different application-level package managers.

So…now what? Once the components that make up your application are downloaded, and your machine understands how to arrange them using your application-level package manager, where do the components and “built” artifacts go?

Olivia Glenn-Han discusses this missing link in her article, The Universal Package Manager - The Most Critical Link in Your DevOps Toolchain:

“This shift from a monolithic application code base, to applications built on 100s of smaller parts, has directly led to a dramatic decrease in release times, as well as the advent of philosophies like Continuous Delivery, and DevOps. One of the biggest things that is still neglected, is how to properly store, and access these pieces.”

Enter the universal package manager (a.k.a. binary repository manager):

“Also known as binary repository manager, it is a software tool designed to optimize the download and storage of binary files, artifacts and packages used and produced in the software development process. These package managers aim to standardize the way enterprises treat all package types. They give users the ability to apply security and compliance metrics across all artifact types. Universal package managers have been referred to as being at the center of a DevOps toolchain.” (Wikipedia)

Find more in the Sonatype Community, a place where you can ask questions to other Nexus users and the Sonatype team. Choose from an assortment of learning paths, developed by a team of experts, that helps make using Nexus even easier. I definitely recommend it.