Refactoring a Multimodule Project: Nexus Book Build


May 31, 2009 By Tim O'Brien

First, I’m happy to announce that the Nexus Book is now as open source as our Maven book. We made a decision about a month ago to free this content and make it available for anyone to view the source or modify the book as they see fit. All of our books are covered by a Creative Commons license, and the source is available from GitHub.

This blog post is an attempt to make more visible some of the decisions that go into refactoring a simple project into a multimodule project. While the specifics of this project relate to DocBook compilation and site publishing, the basic principles of refactoring a multimodule Maven project apply to almost every project that you will encounter in Maven.

book-nexus-before

Background

The Nexus book is a multimodule project which consists of a parent project and two submodules: content and examples. The content project depends on the examples project, the examples project is configured to spit out a ZIP assembly that contains all of the examples, and the content project is configured to generate an HTML version of the book, a PDF version of the book, and the site that you see here.
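
As a rough sketch of how the examples project could produce that ZIP, here is a fragment that assumes the Maven Assembly plugin with its single goal bound to the package phase. The execution id and the descriptor path are hypothetical, and the actual configuration in the book's repository may differ.

```xml
<!-- Sketch only: fragment of the examples project's pom.xml -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <executions>
        <execution>
          <!-- hypothetical execution id -->
          <id>examples-zip</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
          <configuration>
            <descriptors>
              <!-- hypothetical path; the descriptor would declare the zip format -->
              <descriptor>src/main/assembly/examples.xml</descriptor>
            </descriptors>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```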

The lifecycle of the content project is very busy, and the project is responsible for generating multiple artifacts. For example, if you take a look at the pom.xml in the content project, you’ll see that:

  1. The generate-pdf goal from the Docbkx plugin is bound to the compile phase,
  2. The generate-html goal from the Docbkx plugin is bound to the compile phase,
  3. A simple templating goal is bound to the site phase of the site lifecycle.

This content project therefore produces three outputs: an HTML book, a PDF book, and a fully rendered site that is uploaded to the Sonatype web server. That is a lot of work for one project. If I notice a typo in a chapter and want to test the entire process, I have to sit through a lengthy rendering process as the build walks through both the PDF and HTML builds for the book. Instead of being able to focus on a particular part of the overall publishing workflow, I’m forced to run the entire process every time I make a small change. The build isn’t modular.
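
To make the crowding concrete, the two Docbkx bindings described above might look roughly like this in the content project's pom.xml. This is a sketch, not a copy of the real POM: the execution ids are hypothetical, the plugin coordinates are assumed to be the usual com.agilejava.docbkx ones, and stylesheet and output configuration is omitted.

```xml
<!-- Sketch only: two rendering goals sharing one phase of one lifecycle -->
<build>
  <plugins>
    <plugin>
      <groupId>com.agilejava.docbkx</groupId>
      <artifactId>docbkx-maven-plugin</artifactId>
      <executions>
        <execution>
          <!-- hypothetical execution id -->
          <id>book-pdf</id>
          <phase>compile</phase>
          <goals>
            <goal>generate-pdf</goal>
          </goals>
        </execution>
        <execution>
          <!-- hypothetical execution id -->
          <id>book-html</id>
          <phase>compile</phase>
          <goals>
            <goal>generate-html</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Because both executions are bound to the same phase, any build that reaches compile renders both formats, even when only one of them is of interest.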

A Set of Audacious Goals

I’m taking a look at this particular problem because I’m ready to start moving this book toward a build that automates some of the tasks involved in writing it. I’d like to add the following features to the build for this book:

  • Rendering an Eclipse book.
  • Validating Book IDs and Section Structure in the DocBook XML.
  • Injecting Examples from the examples project directly into code examples in the book.
  • Injecting the Output of Example Projects directly into screen listings in the book.
  • Automatically checking for example listing overflows (lines greater than a certain number of characters)
  • Adding a watermark to a specific build of a book
  • Adding a well-designed cover to the book (prepending and appending PDFs onto the generated PDF)
  • Running the PDF through automatic spellchecking as a part of the build process
  • Creating Plugin documentation tables using information already embedded into a Mojo.

This list could go on indefinitely; the point is that I’m trying to find a way to add more automation to the book: validation tests that make sure we never produce a PDF with a code example that trails off the edge of the page, and that all of our sections have the appropriate identifiers. While I know that it is possible for a lot of this to happen in XML-related technologies, I’m much more interested in using this as a chance to demonstrate the power of Maven as a foundation for complex, custom builds. I want this book and other, related books to be a test platform for not just writing a book, but writing a book that uses Maven and the structure of the repository to facilitate the development of complex, example-driven content.

An Overburdened Lifecycle

Given the audacious list of improvements in the previous section, what is the best way to add something like pre-render DocBook validation tasks or pre-compile example injection to the current process, given the current project layout? I could rearrange all of the work that is happening in the current content project and just squeeze more goals into its lifecycle. This would have the unwanted side effect of making the content project take even longer to compile and render a book. It would also mean that the content project would be responsible for generating even more artifacts. Any solution that is going to form the foundation for a larger, more extensible approach to building this content is going to require more than one lifecycle, because the content project’s lifecycle is already too crowded for any additional goals.

While a cardinal rule of Maven is that one project produces a single artifact, this is a rule that is often stretched a bit. For example, you can have a single project produce, install, and deploy multiple attached artifacts generated using the assembly plugin. I also consider a site deploy to be a separate artifact, especially when, as in the book project, the site is the primary artifact to be generated. In other words, the site that is generated from this build isn’t just a supporting, ancillary site that describes some code, it is the primary artifact of the book. Whatever solution emerges from this refactoring should try to isolate the project that creates the site into a separate project.

If you have one lifecycle that is becoming very “busy” and your project is starting to be responsible for creating more than one artifact or output, the solution is to start refactoring your modules so that each module is responsible for producing one artifact and uses the Maven repository as a medium to exchange dependencies.

The Refactored Solution

book-nexus-after

The refactored solution turns two submodules into six interdependent submodules.

  • nxbook-examples – Responsible for generating the examples ZIP
  • nxbook-content – Contains the DocBook XML
  • nxbook-html, nxbook-pdf, nxbook-eclipse – These modules contain the stylesheets and format-specific media that are required to render different book formats.
  • nxbook-site – Finally a site project depends upon other modules that create concrete artifacts like PDFs and ZIPs and adds a simple site.
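
In the parent POM, a layout like this could be declared along the following lines. This is a minimal sketch that assumes each module lives in a directory named after the module; the real directory names may differ.

```xml
<!-- Sketch only: fragment of the parent pom.xml -->
<packaging>pom</packaging>

<modules>
  <!-- assumed directory names, one per module listed above -->
  <module>nxbook-examples</module>
  <module>nxbook-content</module>
  <module>nxbook-html</module>
  <module>nxbook-pdf</module>
  <module>nxbook-eclipse</module>
  <module>nxbook-site</module>
</modules>
```

Maven orders the modules in the reactor according to their declared dependencies, so the order of the module entries here is mostly a matter of readability.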

We’ve created a number of new modules, but really what we’ve done is create six new lifecycles to use for customizing this build. Things like example injection and specialized XML validation can happen in the nxbook-content module, in a lifecycle that captures the creation, testing, and packaging of nothing but content. The tasks that prepend a nice cover to the PDF book and subsequently apply a watermark to the entire PDF document can be hooked into the lifecycle that relates to just the nxbook-pdf project. If you need to debug a problem that has something to do with the PDF rendering, all you need to do is hack away at the nxbook-pdf project.

Each project declares dependencies on the output generated by other nxbook projects. nxbook-examples installs a ZIP file in the repository, which is subsequently used by nxbook-content to populate code samples. nxbook-pdf isn’t concerned with example injection or XML validation; it assumes that the JAR artifact it gets from the repository contains valid DocBook XML. The end result of this refactor is that we have room to expand the process to encompass the goals set forth earlier. We also have a cleaner, more easily understood build that consists of smaller components, each with a limited focus and a digestible POM.
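
As an illustration of using the repository as the exchange medium, nxbook-content could declare the examples ZIP as a dependency and unpack it before injecting code samples. The fragment below is a sketch under several assumptions: the groupId, the execution id, and the output directory are hypothetical, the unpacking is assumed to be done with the Maven Dependency plugin, and a classifier may also be needed depending on how the ZIP is attached by the examples project.

```xml
<!-- Sketch only: fragment of the nxbook-content pom.xml -->
<dependencies>
  <dependency>
    <!-- hypothetical groupId -->
    <groupId>com.example.nxbook</groupId>
    <artifactId>nxbook-examples</artifactId>
    <version>${project.version}</version>
    <type>zip</type>
  </dependency>
</dependencies>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <!-- hypothetical execution id -->
          <id>unpack-examples</id>
          <phase>generate-resources</phase>
          <goals>
            <goal>unpack-dependencies</goal>
          </goals>
          <configuration>
            <includeArtifactIds>nxbook-examples</includeArtifactIds>
            <!-- hypothetical location for the unpacked examples -->
            <outputDirectory>${project.build.directory}/examples</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a declaration like this, nxbook-content only needs the ZIP to be present in the local or remote repository; it does not need to know how nxbook-examples built it.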