Interview Transcript: New Nexus Features + Future of Maven


March 4, 2009 By Tim O'Brien

For those of you who prefer the printed word over the audio
from Brian's interview, this is the full transcript of Brian's
interview about Nexus, Nexus Pro, and Maven from last week. In this
interview, Brian mentions Nexus Pro; to download a free evaluation
of Nexus Professional, go to the Nexus Product Page
(http://www.sonatype.com/products/nexus) and click on Evaluate
Nexus Professional.

TRANSCRIPT


Tim O'Brien: It's been a few months since Nexus Pro was
released, and it's been about a month since Nexus Pro has been out
on the market?

Brian Fox: It's been exactly a month since Nexus Pro went out. I
think the Pro release has been pretty well received; the features,
the staging, promotion and LDAP etc., all seem to be hitting the
way we want them to, solving the problems people were having. Our
main focus for the Pro release and the features was to build the
infrastructure and, first of all, to stop people from having to
manually move artifacts around. With the staging and promotion that
was happening, if people wanted to stage artifacts, they had to
find a place to put them, test them, then manually move the
artifacts. Usually the metadata was not updated, and there were
hash mismatches and things like that.

The same was true with promotions. When organizations had strict
policies about who could download things into their internal
repositories, usually they had a hosted instance of Central and
somebody would have to go move things manually, and again the
hashes were often mismatched and things were corrupted.

The first version of Pro was aimed at simply putting the features
together to stop that from happening, and then we can build upon
those down the road with extended workflows and other automated
features. The feedback we've received so far is that it's exactly
what people need now, but everybody wants more.

TO: Tell me about some of the features that are planned for the next
version of Nexus Pro.

BF: The next version of Nexus . . . The core functionality has
quite a few changes in it. That would be the 1.3 version. We've
done a lot of architectural upgrades in every version since 1.0,
mostly focused on extending our plugin API to allow us to provide
more and more functionality.

In the 1.2 Pro release, those features were all built as plugins to
the Nexus core. We've extended that even further in 1.3 to allow
more types of plugins, and the security model, specifically for
external realms, has been significantly enhanced. This applies not
only to the Pro LDAP plugin, but also to any other open source
security realms that people may create.

For example, there is a Crowd realm that Justin Edelson donated,
and a couple of others that have been developed: one we developed
for Apache and a couple that people in the community are working
on. In addition to that, the core has a couple of pretty cool new
features. One of them is the ability to support mirrors of
repositories in an intelligent way. This has been a bit of a
problem with Maven, because everybody goes against the Central
repository, even if there may be a mirror closer to them, because
they want to make sure they have the latest and greatest data all
the time.

Only a very small subset of the repository changes over time. Apart
from the few newest artifacts, the chances are the mirror that is
closest to you has the right versions of the things you need. What
we've done in Nexus is allow you to define a canonical repository
URL, just like you would do currently. The repositories are able to
expose metadata, now published on the Central repository, which
describes all of the known mirrors of it.

Nexus is able to leverage this and automatically populate a screen
that you can use to choose mirrors that you want to use. The way it
works is you can choose a mirror and we will use that mirror for
retrieving the artifacts. If the artifacts are not found on that
mirror then it will automatically fall back on the canonical URL. In
addition to that, we use the canonical URL to retrieve all the
hashes, so you can be sure that if you get something from a mirror
and the hash matches, it's the same file that was on the canonical
repository. This will allow people to more efficiently use local
mirrors without giving up the ability to get the latest and
greatest updates from Central, and the security of knowing you
don't have any artifacts that were corrupted in the mirroring
process.
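The mirror-with-fallback logic Brian describes can be sketched
roughly like this. This is a simplified illustration, not Nexus's
actual code; the `Repository` interface and method names are made up
for the example:

```java
import java.security.MessageDigest;
import java.util.Optional;

// Sketch of "try the mirror first, fall back to the canonical repository,
// and verify the mirror's copy against the canonical hash".
public class MirrorFallback {

    // Stand-in for fetching bytes from a repository; real code would do an HTTP GET.
    interface Repository {
        Optional<byte[]> fetch(String path);
    }

    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Try the mirror; if the artifact is missing there, or its content does not
    // match the hash retrieved from the canonical repository, fall back to canonical.
    static byte[] retrieve(String path, Repository mirror, Repository canonical,
                           String canonicalSha1) {
        Optional<byte[]> fromMirror = mirror.fetch(path);
        if (fromMirror.isPresent() && sha1Hex(fromMirror.get()).equals(canonicalSha1)) {
            return fromMirror.get();   // mirror copy verified against canonical hash
        }
        return canonical.fetch(path)
                .orElseThrow(() -> new IllegalStateException("not found: " + path));
    }
}
```

The key point is that the hash always comes from the canonical
repository, so a stale or corrupted mirror copy is detected and
silently replaced by the canonical download.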

That’s a pretty big feature in 1.3. Some other cool things that we’ve
done, the logs are now configurable via the UI; before you had to edit
the log4j settings. So now you can enable debug mode temporarily
and see things right through the online log viewer. That’s a favorite
of mine when I’m doing a lot of debugging.

The repository screens have all been combined into a single view
with tabs. Before, we had a separate screen for browsing
repositories, another for managing repositories, and in the Pro
version yet another where you could see staging repositories and
things like that. All three screens have been merged into one
tabbed view, based upon the privileges you have, and we think that
will make it a lot easier for people to figure out what they need.
The screens in the past looked very similar but the functionality
was different, and it was confusing. So we fixed that.

In addition to that, underneath the hood we've made a lot of
changes to the way that groups are actually represented. In the
past, repository groups, which is the feature that lets you
aggregate multiple repositories together, were implemented as sort
of a level above repositories. There were some problems with that,
because a group was only a logical router that spun through each of
the repositories to find what it needed. It didn't have the ability
to store data directly. That became a problem when we needed to
host things like the repository metadata or the indexes.

In 1.3, underneath the covers, groups are actually implemented as a
first class repository now. That means down the road we will be able
to change the UI and expose the ability to have groups of groups and
do a lot of other cool things that we have planned for 1.4. Right
now, that’s architecturally changed in the core, but the UI for the
most part looks the same today.
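Making a group a first-class repository is essentially the classic
Composite pattern. A rough sketch of the idea, with hypothetical
names rather than the actual Nexus API:

```java
import java.util.*;

// Sketch of "a group is a first-class repository": because GroupRepository
// implements the same interface as a plain repository, groups can nest inside
// other groups and can host their own merged content (metadata, indexes).
public class GroupSketch {

    interface Repository {
        Optional<String> retrieve(String path);
    }

    static class HostedRepository implements Repository {
        private final Map<String, String> content = new HashMap<>();
        void store(String path, String data) { content.put(path, data); }
        public Optional<String> retrieve(String path) {
            return Optional.ofNullable(content.get(path));
        }
    }

    // The composite: ordered members, first hit wins; a member may itself be a group.
    static class GroupRepository implements Repository {
        private final List<Repository> members;
        GroupRepository(Repository... members) { this.members = Arrays.asList(members); }
        public Optional<String> retrieve(String path) {
            for (Repository member : members) {
                Optional<String> hit = member.retrieve(path);
                if (hit.isPresent()) return hit;
            }
            return Optional.empty();
        }
    }
}
```

Because the group satisfies the same contract as a plain repository,
"groups of groups" fall out of the design for free, which is exactly
the 1.4 capability Brian mentions.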

We have also made quite a bit of progress on the indexer itself.
There were a lot of bugs related to searching; usually the search
was giving you more results than you actually wanted, and that was
due to the way the indexer was tokenizing a lot of the artifact
fields underneath: IDs, groups and classes were all tokenized based
on dashes and things like that. We completely changed that so all
that information is there, but even more importantly we've actually
changed the download format. Now it's a little bit smaller, we use
a better compression technique, and it also allows the download to
be Lucene version neutral. Previously the download was actually the
Lucene 2.3 index database zipped up, and the Lucene tools don't
provide any way to go back and forth between versions. If we
upgraded to 2.4, we would never be able to publish the correct
index for downstream users. So we created our own binary format,
and the Nexus API jars are able to transparently convert from the
binary format to the correct Lucene format internally. This gives
us a great deal of flexibility to upgrade to newer versions of
Lucene down the road to take advantage of their features and things
like that.
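The idea of a transfer format decoupled from the index engine's
version can be sketched with a toy record format. This is purely an
illustration of the concept, not the real Nexus index format:

```java
import java.io.*;
import java.util.*;

// Toy illustration of a version-neutral transfer format: index records are
// written as plain length-prefixed strings, so the consumer can rebuild a
// search index in whatever engine version it happens to run.
public class TransferFormat {

    static byte[] write(List<String[]> records) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(records.size());
            for (String[] rec : records) {          // e.g. groupId, artifactId, version
                out.writeInt(rec.length);
                for (String field : rec) out.writeUTF(field);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String[]> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int n = in.readInt();
            List<String[]> records = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                String[] rec = new String[in.readInt()];
                for (int j = 0; j < rec.length; j++) rec[j] = in.readUTF();
                records.add(rec);
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Shipping neutral records instead of the engine's on-disk database is
what lets the reader side upgrade Lucene independently of the
publisher.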

Also coming very soon is the ability to have incremental index
downloads. The index on Central, for example, will still only be
updated once a day, but each download will include only the things
that have changed since the last index, so you won't have to
download a 30MB file every week. You'll just get a file of a few KB
every day. That will help everybody. Nexus 1.3 will support that;
it just has to be enabled on the Central repository to produce the
incremental indexes.
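The client side of incremental updates is essentially "remember the
last chunk you applied and fetch only the newer ones". A minimal
sketch, with hypothetical file names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of incremental index updates: the client remembers the counter of
// the last chunk it applied and fetches only the newer daily chunks, falling
// back to the full index if it has never synced. File names are made up.
public class IncrementalUpdate {

    static List<String> chunksToFetch(int lastApplied, int latestPublished) {
        List<String> chunks = new ArrayList<>();
        if (lastApplied < 0) {                       // never synced: full index
            chunks.add("repository-index-full.zip");
            return chunks;
        }
        for (int i = lastApplied + 1; i <= latestPublished; i++) {
            chunks.add("index-chunk." + i + ".gz");  // small daily delta
        }
        return chunks;
    }
}
```

A fresh install pays the full download once; after that, each day
costs only the delta, which is what turns the weekly 30MB hit into a
few KB.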

TO: Speaking of the index, could you talk about some of the
problems that were happening and give a little update on the load
being seen by Central at this point?

BF: That's a good question. I actually don't know anymore, because
it's been so low I haven't had to pay much attention to it. Around
Thanksgiving we had some pretty serious problems with the Central
repository, which is hosted off of a 100Mb connection. Initially,
on a regular basis, the Apache HTTPD was running out of worker
threads. As we kept increasing that, the load on the machine was
going through the roof, to the point where it was about 200 on a
5-10 minute average. We switched over to nginx, which is a highly
optimized HTTP server, and that brought the load down to less than
0.5, which was amazing. I thought it was broken when we first did
that, because it was so low and didn't seem to be doing
anything. Then we started saturating the 100Mb connection.

It took us a while to figure out that some of the tools out there
were misbehaving and repeatedly downloading the index file, which
is by and large the largest file on Central and the most frequently
accessed one. We found some locations were downloading 30MB every 2
minutes because of a broken tool. We worked with those people and
got that fixed up, but we were still having regular spikes every
Monday when everyone came in and downloaded the new index.

We now host the index off of Amazon S3. That leaves the rest of
Central running somewhere around 10Mb per second on average, as
opposed to a 99Mb per second average. And the S3 stuff is up there
in the cloud, which makes downloads even faster for everybody
else. It's a win-win situation. We were able to turn our attention
back to Maven itself instead of trying to focus on what was wrong
with Central.

TO: Speaking of Maven, 2.0.10 was released. I’ve looked
at the release — it took a number of months to get from 2.0.9 to 2.0.10.
What exactly was in this latest release?

BF: Unfortunately it was 10 months, believe it or not. I didn't
notice that until I went to release and upgrade the website; that's
when I noticed it still said 2.0.9. There was a lot of work done
over the summer, and we went through a release candidate process
with somewhere around 10-15 release candidates. Some of the changes
we made to fix bugs were important, and they made things like the
build order and forked lifecycles more deterministic, but we felt
they might be a little risky to introduce into the 2.0 stream
because we were really focused on stabilizing that in the 2.0.8,
2.0.9 and 2.0.10 releases.

We decided to make a new branch, and that became the 2.1.0
milestone 1 release. It took a little bit of time to sort that out;
we ported the bug fixes back into the 2.0 branch and pulled the
features out of there. That took a little bit of time to get the
process going again with 2.0.10. Basically, 2.0.10 just has a pile
of bug fixes in it and not really any interesting features. But the
point is that we can make this thing more stable as we go
forward. Hopefully the 2.0.10 release will be the last release of
the 2.0.x line. There is always the possibility that we may fix a
few bugs if regressions turn up, but we don't really want to put
any energy there anymore.

The 2.1.0 milestone 1 release turned out to be very stable, because
it went through around 20-plus release candidates before it finally
was released. The problem is, in talking to customers, we found
that many people were either afraid to or simply not allowed to use
that milestone release just because it had an M1 at the end of it.
All of us, Maven developers and other people that have used it,
know that it is very stable, maybe even more so than the 2.0.x
line. The plan now is to get all the bugs fixed in there that we
feel are really important. We've pushed all the features we
originally planned for further milestones out to 2.2, and hopefully
we'll start staging release candidates of 2.1.0 within the next
couple of days, a week tops. If we can get a 2.1.0 release out, and
that lets us stabilize on 2.0.10, then hopefully we won't have a
need to do a 2.0.11.

TO: Talk to somebody who might not be following you very closely.
What’s the difference between 2.0.10 and 2.1.0?

BF: 2.1.0 has a new feature in it that Oleg had actually coded, and
it took us a while to remember it and get it integrated: the
ability to encrypt the passwords in your settings files. That's a
pretty commonly requested feature. People don't like putting their
password, particularly if they are accessing a repository with
their corporate password, in a text file for obvious reasons. So
one significant feature in 2.1.0, in my eyes, is the ability to
encrypt that password.
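The general idea behind settings-file password encryption is a
two-level scheme: a key derived from a master password encrypts the
individual server passwords, so only ciphertext lands in the text
file. The sketch below illustrates that idea using standard JCA
primitives (PBKDF2 plus AES-GCM); it is emphatically not Maven's
actual cipher or file format:

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Conceptual sketch only: derive a key from a master password, then use it
// to encrypt a server password so the settings file never holds plaintext.
public class SettingsCrypto {

    private static SecretKeySpec deriveKey(char[] master, byte[] salt) throws Exception {
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] key = f.generateSecret(new PBEKeySpec(master, salt, 65_536, 128)).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    static String encrypt(String plaintext, char[] master) {
        try {
            byte[] salt = new byte[16], iv = new byte[12];
            SecureRandom rnd = new SecureRandom();
            rnd.nextBytes(salt);
            rnd.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, deriveKey(master, salt), new GCMParameterSpec(128, iv));
            byte[] ct = c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] blob = new byte[28 + ct.length];      // salt | iv | ciphertext
            System.arraycopy(salt, 0, blob, 0, 16);
            System.arraycopy(iv, 0, blob, 16, 12);
            System.arraycopy(ct, 0, blob, 28, ct.length);
            return Base64.getEncoder().encodeToString(blob); // what gets stored on disk
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(String encoded, char[] master) {
        try {
            byte[] blob = Base64.getDecoder().decode(encoded);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE,
                   deriveKey(master, Arrays.copyOfRange(blob, 0, 16)),
                   new GCMParameterSpec(128, Arrays.copyOfRange(blob, 16, 28)));
            return new String(c.doFinal(Arrays.copyOfRange(blob, 28, blob.length)),
                              StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the actual 2.1.0 feature, this surfaces as the
`mvn --encrypt-master-password` and `mvn --encrypt-password`
commands, with the master key kept in `settings-security.xml`.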

The other significant feature, I guess I wouldn't call it a
feature, is basically the bug fixes I mentioned that we tried to
put in 2.0.10, which had to do with forked lifecycles and making
sure that properties were correctly interpolated. The major use
case that came forward was Clover users who needed to instrument
the jars and needed to update certain paths. The way that behaved
in 2.0.x, it didn't really work right. That will be fixed in 2.1.0.

TO: The main feature that will be most visible is the password
encryption. As for forked lifecycles, having had to write about
them, I think they're pretty confusing. It's almost like time
travel there.

BF: Yeah, I think so. And that's really a bug fix that sort of
oscillated back and forth in various versions of 2.0: we fixed it
one way and it broke other cases, then we put it back and that
broke other cases. That's why we decided to bump it to 2.1 and fix
it right. I think the security and encryption of the passwords will
probably be the headlining feature.

We're also working to get in the patch contributed by Don Brown for
parallel downloads of artifacts. That can actually improve download
performance pretty significantly; in my own testing it's faster
even if you're using a repository manager. That's also going to be
in the 2.1.0 release.

TO: That was from Don Brown's patch. It's about a year old, right?
Didn't he fork Maven and try to do some things on his own?

BF: Yeah, he did. He forked it, applied this patch and sent it back
to us. The reason it didn't make 2.0.10 at the time was that we
were waiting on integration tests and other unit tests that we
never really got. John is working on it now to try to get some
specific tests in place to make sure it works. The main thing we're
adding is the ability to basically turn it off. The original patch
was simply on, and if we released it and bugs turned up, there
would be nothing you could do about it. So John is working to make
the number of parallel downloads configurable, so you can put it
back down to 1.
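The "configurable, and 1 turns it off" design can be sketched with a
plain thread pool. This is an illustration of the approach, not Don
Brown's actual patch:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Sketch of configurable parallel artifact downloads: N worker threads fetch
// concurrently, and a thread count of 1 restores the old one-at-a-time
// behavior, which is the "off switch" described in the interview.
public class ParallelFetch {

    static Map<String, byte[]> fetchAll(List<String> artifacts,
                                        Function<String, byte[]> downloader,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            Map<String, Future<byte[]>> pending = new LinkedHashMap<>();
            for (String a : artifacts) {
                pending.put(a, pool.submit(() -> downloader.apply(a)));
            }
            Map<String, byte[]> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<byte[]>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get()); // propagate any failure
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Whatever the thread count, the results are identical; only the wall
clock time changes, which is why a single configuration knob is
enough to make the feature safe to ship.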

TO: It sounds like you made a transition from a project that
cavalierly released maintenance versions to one that is thinking
about an installed base of a million users, or even more than that;
I'm not sure of the numbers.

BF: Yeah, that sort of happened after I started doing the 2.0.x
releases. It became apparent to me that half the bugs we were
introducing in any given version were regressions from previous
versions. It was a little bit ridiculous and embarrassing, and that
is when we really started focusing on getting release candidates
out to the user base and not just keeping them inside the Maven
development community. There simply were not enough use cases
internally for us to ensure we didn't break something.

When we first started doing that, we found all kinds of stuff. We
had gone through many iterations internally and everybody thought
it was fine; as soon as we went to the user community, we found out
quickly it wasn't so fine. I think the stability of 2.0.9 has shown
us that the process is actually worth maintaining, because it's
been almost 10 months since the 2.0.9 release and it just basically
works. Hopefully the 2.0.10 release will build upon that and
continue to be stable going forward. We'll do the same with 2.1.0
and with the 3.0.x line that Jason is releasing.

The goal is a release every 2 weeks; it's been more like every
month. We're almost ready for the alpha 3 release. Once we get
through some number of alphas, we should hopefully have a stable
product before we call it final.

TO: Just a brief list, give 4-5 things we can expect in the 3.0 trunk.

BF: Is it possible to sum up 3.0 in this short list?

TO: I mean, it seems like it's probably at least a few months off,
not a year. What is the time frame? What is the plan?

BF: I think it's a couple of months before we start having public
betas. I think maybe 6 months before we can realistically call it
final. But all the pieces are there, and we're just sort of working
through them as we go forward.

There are several major changes in the 3.0 line. One of them is
that it's basically set up for embedding now. In the 2.0.x line,
there was an embedder that basically stopped at 2.0.4; this is what
things like Hudson used to embed their Maven functionality. The
problem was that it was sort of added to Maven after the fact. In
Maven 3.0, the command line client is itself a client of the
embedder: the core of Maven is an embeddable component with a
command line wrapper around it. This is what NetBeans and Eclipse
use to get their Maven functionality. That's a pretty significant
change, to focus on making it embeddable.

The dependency resolution has been completely redone using the new
Mercury stuff, which allows parallel downloads and uses the SAT4J
solver, which I think came from OSGi land and Eclipse. The goal is
to have more deterministic dependency resolution, including good
range support. Ranges don't really work so well on the Maven 2.0
branches.

TO: I know. I think a blog post was written some time ago about
that. It had a big picture, and all it did was remind me of some
very difficult math classes in college. What does the SAT4J thing
actually do?

BF: It's trying to solve satisfiability problems, I guess. There
are many different possible combinations that can result in a
solution when you're talking about ranges and orders of
dependencies and conflicts and things like that. SAT4J attempts to
resolve that down to a single answer all the time. The same answer
every time is really the goal.
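The "single deterministic answer" idea can be shown with a toy
brute-force resolver. A real SAT solver such as SAT4J encodes the
same problem as boolean clauses and solves it far more efficiently;
this sketch only demonstrates the determinism property, and all the
names are illustrative:

```java
import java.util.*;

// Toy solver-style dependency resolution: choose exactly one version per
// artifact so that no forbidden combination is selected, searching in a fixed
// order so the same input always yields the same answer.
public class TinyResolver {

    static Optional<Map<String, String>> resolve(
            Map<String, List<String>> candidates,      // artifact -> versions, preferred first
            List<Map<String, String>> conflicts) {     // each map is a forbidden combination
        return search(new ArrayList<>(candidates.keySet()), 0,
                      new LinkedHashMap<>(), candidates, conflicts);
    }

    private static Optional<Map<String, String>> search(
            List<String> artifacts, int i, Map<String, String> chosen,
            Map<String, List<String>> candidates, List<Map<String, String>> conflicts) {
        if (i == artifacts.size()) return Optional.of(new LinkedHashMap<>(chosen));
        String artifact = artifacts.get(i);
        for (String version : candidates.get(artifact)) {
            chosen.put(artifact, version);
            boolean ok = conflicts.stream()
                    .noneMatch(c -> chosen.entrySet().containsAll(c.entrySet()));
            if (ok) {
                Optional<Map<String, String>> sol =
                        search(artifacts, i + 1, chosen, candidates, conflicts);
                if (sol.isPresent()) return sol;       // first valid assignment wins
            }
            chosen.remove(artifact);
        }
        return Optional.empty();                       // over-constrained: no solution
    }
}
```

Because candidates are tried in a fixed preference order, the same
dependency graph always resolves to the same set of versions, which
is exactly the property Brian describes.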

TO: It's just trying to make the process deterministic, so that
nobody runs into weird problems where the order of the
dependencies, rather than the versions, was the actual cause of the
problem?

BF: That's right. It's a complicated problem space to deal with. It
also has the ability to download things in parallel, just like the
patch we have in Maven 2.1, but basically the way Mercury works, it
is able to go out and figure out all the things it needs to
download. It makes the decisions about the artifacts it needs
before it starts downloading them. It's then able to hand off to a
Mercury client, which has a lot of Jetty code in it, that goes out
and downloads all the different things from the different
repositories in one shot. The old code would download things one at
a time and make decisions on the fly. Sometimes it downloaded POMs
and jars that it ultimately did not use. It was inefficient.

The POM inheritance and interpolation module, the project builder
as we call it internally, has been completely rewritten by Shane.
The focus there was first of all to fix a lot of the problems we
had and to figure out the rules, because it was never well
documented how the inheritance and the interpolation were supposed
to work. That's now a more rule-based approach, and it's also set
up to allow the injection of the model from any type of data
format, not just the POM XML format. That allows things like
NMaven, for example, to leverage the core Maven functionality even
though they may not have a POM. It would also be used by Tycho,
which allows Eclipse applications to be built within Maven from the
Eclipse metadata rather than a POM. So that's been completely
rewritten.
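The "model injection from any data format" idea boils down to a
pluggable reader interface in front of a single internal model. A
minimal sketch with hypothetical interfaces, not the real Maven 3
project builder API:

```java
import java.util.Map;

// Sketch of a pluggable project builder: the core consumes one abstract
// model, and different readers (pom.xml, Eclipse bundle metadata, .NET
// project files, ...) each know how to produce that model.
public class ModelSources {

    static class ProjectModel {
        final String groupId, artifactId, version;
        ProjectModel(String g, String a, String v) { groupId = g; artifactId = a; version = v; }
        String coordinates() { return groupId + ":" + artifactId + ":" + version; }
    }

    interface ModelReader {
        ProjectModel read(Map<String, String> rawData);
    }

    // One reader per source format; the core never needs to know which was used.
    static final ModelReader POM_READER =
            raw -> new ProjectModel(raw.get("groupId"), raw.get("artifactId"),
                                    raw.get("version"));
    static final ModelReader ECLIPSE_MANIFEST_READER =
            raw -> new ProjectModel(raw.get("Bundle-Vendor"), raw.get("Bundle-SymbolicName"),
                                    raw.get("Bundle-Version"));
}
```

Everything downstream of the reader (inheritance, interpolation,
dependency resolution) operates on the one model type, which is what
lets tools like Tycho or NMaven reuse the core without a pom.xml.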

Off the top of my head those are the three major changes to Maven 3.

TO: The changes are big . . . sort of talking about the future of
Maven. There was an old Star Trek show once where there was a
problem with the warp drive and they ended up in the far reaches of
the universe with totally new forms of matter. That makes me think
about what's down the road for the Maven project without a project
object model.

BF: Yeah, in theory you should be able to feed the project metadata
into Maven and it will just work.

TO: Also, just last night I saw that Charles Nutter, the creator of
JRuby, was twittering about some sort of Ruby-Maven bridge he made,
and Jason chimed in and said we should make that a part of
Nexus. Lots of good, interesting stuff happening.

Thanks for taking the time to talk to me and we’ll check in in another
few months.