I’ve had a number of non-programmers ask me “What is Nexus?” What is it? What does it do? You’d think I’d have a quick answer for the question, and I have a few that I always resort to that revolve around efficient collaboration and ease of deployment. But the challenge in this particular conversation was how to convey “What Nexus is” to someone who might not have direct programming experience. It is easy for a developer to talk to another developer, as we share the same set of experiences. I can say “compiler” and be reasonably certain that you not only know what a compiler does, but that you use one every single day. So, when I say something like:
“Nexus is a repository manager. It allows you to proxy, collect, and manage your dependencies so that you are not constantly juggling a collection of JARs. It makes it easy to distribute your software. Internally, you configure your build to publish artifacts to Nexus and they then become available to other developers. You get the benefits of having your own ‘central’, and there is no easier way to collaborate.”
Someone who doesn’t program every day nods and feigns approval (“That sounds very interesting”), but I am still skeptical that I got the message across. I’ll usually resort to an assembly line analogy: “Think of Nexus as an assembly line; it provides a context for collaboration.” But, still, there is a lack of shared context. If you are a programmer, my previous explanation was likely clear. If you are not a programmer, you’ll have questions. What is ‘central’? What is a ‘JAR’? What am I proxying? Instead of attempting to explain Nexus in a paragraph, let’s try to set up some background.
What is Nexus?
Nexus is a repository manager; it stores “artifacts”. But before jumping into abstractions, let’s begin with a simple description of what software development involves. For the purposes of this article, we’re going to discuss Enterprise Java Development. Before I can tell you what Nexus does, we have to answer the following question…
How is Software Developed?
Use a web site like the New York Times or even something as deceptively simple as Google, and it might be easy to think that building it takes a handful of developers, some graphic designers, and a few weeks of effort. Think again. For a system of any significance, we’re usually talking about tens to hundreds of developers (sometimes thousands of developers) split into different groups, each with a distinct focus. For example, think about a large, international bank. Such an organization will likely employ thousands of developers spread throughout the world. There might be a hundred developers in San Francisco focused on web development, with tens of developers in New York focused on market feed and data integration. Maybe there is a group that focuses on back office operations in Toronto, a group dedicated to international clients in Dubai, and another development group in Bangalore.
The point here is that modern software development is often distributed, and the largest systems involve hundreds of developers all trying to collaborate on a single enterprise system. “Big software” can involve thousands of developers and today’s CTOs and CIOs realize that the way to remain agile, the way to innovate, is to adopt best practices from the open source community. Open source projects like the Linux Kernel or open source communities like the Apache Software Foundation are proof that thousands of developers can collaborate on complex systems if they are provided with efficient infrastructure and a shared set of assumptions about how software is developed. While it is often difficult to import the culture of open source into a corporation, it is fairly easy to standardize on the same tools, the same “efficient” infrastructure which allows a large, ad hoc group of developers to collaborate.
Efficient Development Infrastructure
Source control systems, issue trackers, mailing lists, continuous integration servers, wikis, integrated development environments, and repository managers: when you start a new open source project, one of the first things you decide upon is what you are going to use for each of these components. Let’s go through each of these components and discuss the various choices that are available:
- Source Control Systems: This is the service that is going to track your code, the code that is eventually transformed into a working system. So, if you are developing web sites, the code is likely a primary artifact right next to content and design. Source control systems are an established piece of infrastructure and programmers have been using them for decades.
- Mailing Lists: While this might seem too simple to be considered a part of development infrastructure, it is often the most essential and most basic piece of infrastructure. Efficient development teams tend to share just about everything technical on a shared mailing list. Each focused development team has a mailing list, and the discussions are archived for future reference. As the team’s composition evolves over time, this shared conversation can be an important record for bug fixes and institutional knowledge about an application. Mailing lists, like source control, have been around for decades.
- Issue Trackers: Your team will need a place to store tasks and bug reports, and they will also need to have a way to plan what goes into the next software release. A good issue tracker provides a programmer with a way to customize and filter the list of issues by project or by person. A great issue tracker can double as both a collaboration tool and as a simple productivity dashboard for programmers. Issue trackers have been used by open source projects for more than a decade, but modern issue trackers (like JIRA) are still developing features which redefine the category.
- Continuous Integration Servers: Think of an automated “robot”, sitting in a room next to your developers, constantly waiting for code to change. When code changes, this automated “robot” brings those changes onto its system and performs a software build. If the build fails or if tests fail, all of the developers are immediately notified of the failure, and everyone drops everything to make sure that problems are quickly addressed. The principle behind continuous integration is that the sooner a bug is found, the easier it is to fix. If your cubemate checks in some bad code, there is less of a chance of it going unnoticed if there is a system constantly running and testing the code as it is changing. Before the age of continuous integration servers, programmers and developers might only perform whole system builds and integrations once every few weeks or months during a software release. While this might seem like an obvious part of software development, continuous integration servers have only become standard in the last five years.
- Wikis: If you have used Wikipedia, you know what a Wiki is. It is a shared website that is easy to edit and open to any participant. Most internal development teams have a Wiki for collaboration that doesn’t make sense to conduct on a mailing list. The net effect of having a Wiki for a development team is that the team tends to send fewer Word or Excel attachments around on the mailing list. If there is some specification that needs to be developed, it is normal for people to do so on a Wiki. The Apache Software Foundation only started allowing for an open Wiki six years ago, and it has only recently become something standard in most internal development environments.
- Integrated Development Environments (IDEs): This is the primary tool that developers use every day. If you see a programmer “coding”, she is very likely staring at a GUI tool, like Eclipse or IntelliJ, which contains tools that make it very easy to write code.
This brings us to the final component of efficient development infrastructure – repository managers.
Repository Managers: You can think of a repository like you would think of a library. It is a server that stores and retrieves files, which we refer to as artifacts. When you write a piece of software, you are often depending on external libraries. If you are developing a system to send a rocket into space, you might depend on a library that provides functions to calculate the effects of gravity. If you are building a web site, you’ll likely use a framework designed to serve web sites.
In Java, these libraries are stored in binary files called JARs, and if you are working on a complex system, you might require hundreds of external libraries in an application. The primary use of a repository manager is to proxy and cache artifacts from “external” repositories. Your organization uses open source libraries, and when your build needs them it will automatically query a local repository manager. If that local repository manager does not have that particular artifact, it will retrieve it from an external repository server and cache it for later use. (Think of how an academic interlibrary loan system works. Ask a librarian for a rare book. If he doesn’t have it, he will likely call up another library and have the book sent over.)
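For the programmers reading along, the proxy-and-cache behavior described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how Nexus is actually implemented: the class name `CachingProxyRepository`, the `fetch` method, and the idea of modeling the external repository as a simple lookup function are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch of a proxying repository (not Nexus's real code).
 * The "remote" function stands in for an external repository such as "central".
 */
class CachingProxyRepository {
    // Artifacts we have already retrieved, keyed by their path.
    private final Map<String, byte[]> cache = new HashMap<>();
    // Assumption: the external repository is modeled as a plain lookup function.
    private final Function<String, Optional<byte[]>> remote;
    private int remoteFetches = 0;

    CachingProxyRepository(Function<String, Optional<byte[]>> remote) {
        this.remote = remote;
    }

    /** Serve from the local cache if possible; otherwise ask the remote and remember the answer. */
    Optional<byte[]> fetch(String path) {
        byte[] cached = cache.get(path);
        if (cached != null) {
            return Optional.of(cached);
        }
        remoteFetches++;
        Optional<byte[]> fetched = remote.apply(path);
        fetched.ifPresent(bytes -> cache.put(path, bytes));
        return fetched;
    }

    /** How many times we had to go to the external repository. */
    int remoteFetches() {
        return remoteFetches;
    }
}
```

In this sketch, asking for the same artifact twice only contacts the external repository once; the second request is answered from the local cache, which is the librarian already having the book on the shelf.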
“Central” refers to the “Central Maven Repository”. You can think of “central” as the global repository manager that stores all open source components. “Central” has millions of users throughout the world, and it is fed by thousands of open source projects. It is the modern-day “Library of Alexandria” for open source components, and it greatly reduces the work required to distribute software to millions of developers. If you have something to share with the world, put it on “central”, distribute the coordinates, and in minutes everyone should have a copy.
Before the advent of “central”, users would have to manually update dependencies, and open source projects would have to try to get the word out so people knew about a software release. In 2010, releasing a new open source component to the Central Maven repository is a non-event. All a developer needs to do is publish to “central”; the rest is automatic. If a developer is using a repository manager, they can even configure the tool to notify them of all new releases. The “central” Maven repository manager is the central fabric of collaboration, and over the past eight years, it has changed the way that software is distributed and developed.
While “central” brings efficiency to an entire world of software developers, running an internal repository manager brings efficient collaboration between your developers and teams. If one team develops a library used by another team, they can use an internal repository manager to distribute software releases internally. If your development teams are delivering applications to an operations group for deployment, they can use a repository manager as a way to share final products.
While the open source world has standardized on repository managers in the past three years, most organizations are still at the beginning of the adoption curve. The organizations that have adopted a repository manager understand the benefits, and understand that development without one would be a process fraught with inefficiency. I’ll predict that, within three years’ time, most organizations will be running a local repository manager called Nexus. It is as essential a technology as source control.
What is Nexus for the Non-programmer?
So, what is Nexus for the non-programmer? If it still doesn’t make any sense after reading this post, think of it as a library. You can ask it for “artifacts”; it will store and retrieve them and assign a standard coordinate system to the artifacts it stores. If you are developing software, having this facility available allows you to catalog and store your own artifacts using the same “numbering” system that the library uses. When a group develops a new system or a library, they submit it to the repository manager. Other groups then have a standard way to access these libraries. This standard for cataloging and addressing files brings efficiency.
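If you do read code, here is a minimal sketch of what that “numbering” system looks like in practice. It assumes the standard Maven repository layout, where a coordinate is a group, an artifact name, and a version, and the group's dots become folders. The `Coordinate` class and the `com.example.bank:market-feed:1.0` coordinate are made up for illustration; they are not part of Nexus or Maven.

```java
/**
 * Hypothetical helper showing how a coordinate maps to a file location
 * in the standard Maven repository layout.
 */
final class Coordinate {
    final String groupId;
    final String artifactId;
    final String version;

    Coordinate(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** Parse the short "group:artifact:version" form. */
    static Coordinate parse(String text) {
        String[] parts = text.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected group:artifact:version but got: " + text);
        }
        return new Coordinate(parts[0], parts[1], parts[2]);
    }

    /** Where the JAR for this coordinate lives, relative to the repository root. */
    String toRepositoryPath() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }
}
```

With this sketch, `Coordinate.parse("com.example.bank:market-feed:1.0").toRepositoryPath()` gives `com/example/bank/market-feed/1.0/market-feed-1.0.jar`. Any team that knows the coordinate knows exactly where to find the file, much like a call number on a library shelf.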
Think about how difficult it would be if books didn’t have ISBNs, or if libraries didn’t have a filing system like the Dewey Decimal system. You wouldn’t be able to search for books, and you wouldn’t be able to quickly locate a book on Amazon. The repository manager is that “library”.