Writing a Book with Maven: Part I


April 14, 2008 By Tim O'Brien

DISCLAIMER: In this post, I express my own, somewhat controversial, views about Doxia and APT. These are solely my own views and you should not assume that they represent an official statement from Sonatype.

This Maven Book is created using Maven. Everything you see is produced using Springer’s docbkx-tools plugin: http://code.google.com/p/docbkx-tools/. I don’t pass in all of the configuration variables via the pom.xml. We have two stylesheets, html_chunk.xsl and xslfo.xsl, which help us produce a nice-looking PDF and web site from the book. In the next part of this series, next week, I’m going to start blogging about the Maven project we use to manage the book. By next week, I’m going to try to have a Maven archetype ready for people who want to produce a book with Maven. I might even put a chapter in the book about using Maven to create a book (recursion). Ultimately, I’d like to help start a few projects that will make it easier for people to write books using the same technologies we use to publish this book. We need to get more developers writing good content; there are too many technical books being written by people who haven’t had a day of real coding experience.
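For readers who have not seen docbkx-tools before, here is a minimal sketch of how the plugin might be wired into a pom.xml. Treat it as an illustration and not as the book's actual build: the src/docbkx location, the book.xml file name, and the pre-site phase binding are assumptions, and the plugin version is deliberately left out, so pin a release of your choosing.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>com.agilejava.docbkx</groupId>
      <artifactId>docbkx-maven-plugin</artifactId>
      <!-- No <version> here on purpose; pin one in a real build. -->
      <executions>
        <execution>
          <!-- Assumption: build both outputs before the site is generated. -->
          <phase>pre-site</phase>
          <goals>
            <goal>generate-html</goal>
            <goal>generate-pdf</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <!-- Assumption: DocBook sources live here, rooted at book.xml. -->
        <sourceDirectory>src/docbkx</sourceDirectory>
        <includes>book.xml</includes>
        <!-- One HTML page per chapter/section instead of a single file. -->
        <chunkedOutput>true</chunkedOutput>
        <!-- The two stylesheets named in the text; their paths are assumed. -->
        <htmlCustomization>src/docbkx/html_chunk.xsl</htmlCustomization>
        <foCustomization>src/docbkx/xslfo.xsl</foCustomization>
      </configuration>
    </plugin>
  </plugins>
</build>
```

Keeping the look and feel in the two XSL files, not in the pom.xml, means the build only has to say where the sources and stylesheets are.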

DocBook, WTF?

We use DocBook. The original idea was to use APT, but once I started working on the book, I insisted on DocBook to the surprise of many people involved with the effort. In this post, I explain why I think DocBook is the best choice for writing a book.

I’m going to spoil the party for APT lovers. APT is impossible.
I’m convinced that APT is the reason why most Maven documentation and
many Maven sites are terrible things to try to read. I’d encourage
anyone trying to use the Maven Site Plugin to dump it and start using
the XSite plugin. Don’t be afraid of HTML and markup; Maven sites
would, in general, look more natural and simpler if you didn’t have to
suffer through all the canned copy and left navigation menus. It is
impossible to use APT to write a book with styled cross-references, a
good index, and appropriate inline styles. The same goes for the
ability to differentiate between a listing of source code and a
numbered example, for variable lists, and for the differences between
a chapter, a part, and a preface. All of these things come with
DocBook.

That being said, DocBook tools are a terrible curse. The editor I
use is XMLMind. Not only is XMLMind not free, it is about as usable as
Emacs on a keyboard that has a broken Meta key. But you get used to
it, and you learn to be productive. It takes a year: you initially
swear it off, but then you come back to it and admit defeat by
purchasing it and learning how to convince it to cooperate. In two
years’ time, you’ll start to respect XMLMind and you might even start
to customize some of the key bindings. In other words, it ain’t easy.
But this brings me to my next point…

…writing a book is not easy

When a bunch of developers (I still consider myself a developer,
not a writer) get together and decide to write a book, there’s this
underlying tension. A developer’s job is tough enough; they don’t
want the writing process to start siphoning already scarce time off of
the development cycle. The initial reaction is to choose some
technology like APT because it is easier to write simple things with
simple markup. This works for a while: you’ll write a few chapters
and you might even start to develop innovative little plugins to
include source code, etc. But as the content grows in size, and
you start getting ready for print production, you’ll start to think
about things like:

Cross-references
A large book without cross-references is about as useless as it
gets. If I’m in Chapter 5 Section 3 and I want to reference Chapter
1 Section 1.2, and I don’t have a way to exactly specify an element
in a document, what happens when I move a chapter around or when I
want to insert a section before Section 3? Sure, I can develop a
facility within some Doxia engine to allow me to reference a section
of a document, but then I’ll want to do things like customize the
text of the reference. Maybe half the time, I’ll want to say “See
Section 15.1 for more info”, but just as often I might want to say
“See Section 1.5 Aggregating Stuff for more info”. The point here
is that cross-references are increasingly important for PDF, HTML,
and print output, and the only way to equal what comes out of the
box with DocBook is to add more hacks to APT and customize the
engine that reads it.
Inline Styles
This is probably the one thing that throws most
developers-turned-writers into a tailspin: the idea that every
command, classname, code reference, and variable reference has to
have a different inline style. This takes most people a few weeks to
get the hang of, but once you start doing this, you’ll start to
realize that it is essential to making readable technical content.
Pick up any O’Reilly book, and you’ll notice that it contains a heavy
amount of inline styling: classnames are in a fixed font, and they
are differentiated from commands on the command-line. We don’t just
do this because we like to be fancy; we do this because it is a
subtle hint to the reader that eases comprehension. It is also
something that requires different markup elements in the book’s
source. There are classname, methodname, varname, and code elements
in DocBook to handle this. Not so in APT. Because APT is solely
focused on presentation, you can’t embed semantic meaning within it.
You can’t say, “this is a classname”; instead, in APT you say, “make
this italic” or “make this bold”.
Print Production
The publisher I’ve worked with formats the book in DocBook before
they send it off to the presses. There’s a lengthy production process
during which the book is converted to DocBook (if it isn’t already in
DocBook) and someone is going to go through and make sure that the
book has all the right inline styles. Then someone is going to go
through and mark up all the index terms (indexing is an arduous and
mind-melting experience, BTW). I prefer to produce a product that
doesn’t require too much manual futzing after I deliver it. I
understand that the production dudes need to tweak the content a
bit, but I prefer the idea that my stuff doesn’t have to go through
some sort of filter before it gets to the real content. More on this
in later parts of this series…
Formatting for Print/Web/PDF
Sure, I understand that I can get some APT stuff to spit out a PDF
and a web page. But can I tell it what section level I want it to
descend to when computing the contents of a table of contents? Can
I put a watermark on the output and put a disclaimer in the header
of the preface to signify that the output is an alpha release
(something I need to do)? Can I generate endnotes? How about
footnotes? I could go on and on about things DocBook can do that
APT can’t. It all really boils down to tools and the fact that,
with DocBook, I’m capturing more than just syntax: DocBook is
semantic, and there is a whole host of tools out there that let me
convert that source into good-looking output. I’m sure someone is
going to comment that all of this is possible with some sort of
customized Doxia plugin (see above; I think Doxia should be thrown
overboard).
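To make the list above concrete, here is a small, hypothetical DocBook 4.5 fragment that touches cross-references, inline styles, index terms, and footnotes in one place. The ids, the com.example.Aggregator class, and the section titles are invented for illustration; only the element and attribute names come from DocBook itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter id="ch-example">
  <title>An Example Chapter</title>

  <!-- The id is the stable anchor that cross-references point at;
       it survives renumbering when chapters or sections move. -->
  <section id="sect-aggregating-stuff">
    <title>Aggregating Stuff</title>
    <!-- Index terms are marked up in the source, next to the content. -->
    <indexterm><primary>aggregation</primary></indexterm>
    <para>
      Run <command>mvn install</command> from the directory that holds
      <filename>pom.xml</filename>. The hypothetical
      <classname>com.example.Aggregator</classname> class exposes an
      <methodname>aggregate()</methodname> method that reads the
      <varname>modules</varname> variable.<footnote>
        <para>Footnote text lives inline with the sentence it annotates.</para>
      </footnote>
    </para>
  </section>

  <section id="sect-elsewhere">
    <title>Somewhere Else in the Book</title>
    <!-- Same target, two reference styles: number only, or number plus title. -->
    <para>
      See <xref linkend="sect-aggregating-stuff" xrefstyle="select: label"/>
      for more info.
    </para>
    <para>
      See <xref linkend="sect-aggregating-stuff" xrefstyle="select: label title"/>
      for more info.
    </para>
  </section>
</chapter>
```

The markup says what each thing is, not how it should look. How the two xref forms actually render (and whether sections are numbered at all) is decided by the stylesheets, so the exact generated text depends on how they are configured.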

You could hack up APT so much that it closely approximates DocBook. You could muck around with the various Maven plugins involved in the process to make it easy to include code samples and snippets… or you could use the tools and technologies which already exist. I’m no big fan of reinvention, so for me, the solution was to use DocBook. Furthermore (ugh), hacking APT to the point where it supported a feature set similar to DocBook would’ve meant making APT more like DocBook. By definition, I don’t think you can write a book which requires this much semantic stuff in a wiki-like format without making the wiki-like format more trouble than it is worth.
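As an example of leaning on tools that already exist, the table-of-contents depth and draft watermark questions raised earlier map onto parameters in the stock DocBook XSL stylesheets. The following is a sketch of an FO customization layer, not the contents of the xslfo.xsl used for this book; the parameter values and the watermark image path are assumptions. It imports the public DocBook XSL distribution URL, whereas docbkx-tools has its own convention for the import, so check its documentation before copying this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Start from the stock FO stylesheet and override only what differs. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- How many section levels the table of contents descends into. -->
  <xsl:param name="toc.section.depth">3</xsl:param>

  <!-- Number sections, prefixed with the chapter number ("1.5"). -->
  <xsl:param name="section.autolabel" select="1"/>
  <xsl:param name="section.label.includes.component.label" select="1"/>

  <!-- Mark every page as a draft; the image path is a placeholder. -->
  <xsl:param name="draft.mode">yes</xsl:param>
  <xsl:param name="draft.watermark.image">images/draft.png</xsl:param>

</xsl:stylesheet>
```

Each of these is a one-line setting because someone already wrote and maintains the machinery behind it, which is the whole argument against rebuilding it on top of APT.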

…stop trying to make it easy…

Even when I wrote Jakarta Commons Cookbook in Word, it was far from easy. There was an ultra-nifty (but very complex and unstable) set of VB macros which were used to manage cross-references and inline styles. There was a whole host of keyboard shortcuts, etc. For a 400-page book, I had to split the document into chapter DOC files and have every document open in Word in order for cross-references to properly render. It wasn’t an uncommon experience for Word to just blow up and refuse to respond. That was about four years ago; in the intervening years there have been various efforts to simplify the process and move to different tool platforms.

Books have been written in OpenOffice. (And, yes, there are books that have been written in APT.) A few people have tried to write books using collaborative web applications. There has been this persistent idea that people could collaborate on a Wiki and produce a great book, etc….

For me, the most difficult part about writing a book isn’t the technology used to write it. From a wiki-like markup to etching every word onto a stone tablet, for me the most difficult part of writing is the process itself. I write with a difficult tool, XMLMind, but I spend most of my time writing, and rewriting, and rewriting, and rewriting, and proofreading, and rewriting, and rewriting…

And writing about technology for a tech audience isn’t easy. I guess what I’m trying to say is: stop using tool selection as an excuse to procrastinate and get down to the business of writing. Writing isn’t easy; in fact, it is just as difficult as (maybe a little more difficult than) writing code. Don’t shy away from using professional writing tools even if they are not easy. Writing a book isn’t easy; it’ll drive you crazy. I promise.