How not to download the Internet


April 19, 2011 By Tim O'Brien

A criticism I often hear about Maven is that “every time you run Maven, it downloads the internet.” I understand the criticism: the first time you run Maven, it has to populate your local repository with the plugins and artifacts your project depends on. Maven does download artifacts from remote repositories, but it downloads each artifact only once and keeps it in a local cache.
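Here is a minimal Java sketch of a cache-first lookup that shows the “download once, keep a local cache” behaviour. It assumes the default local repository layout: groupId segments become directories, followed by artifactId, version, and artifactId-version.jar. The `download` function is a hypothetical stand-in for a remote fetch. This is a simplified model, not Maven’s actual resolver code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Simplified model of "download once, then reuse the local cache" (not Maven's real code). */
class LocalCacheSketch {

    /** Maps groupId:artifactId:version onto the default repository layout. */
    static Path localPath(Path repoRoot, String groupId, String artifactId, String version) {
        Path dir = repoRoot;
        for (String segment : groupId.split("\\.")) {
            dir = dir.resolve(segment); // "org.springframework" -> org/springframework
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + ".jar");
    }

    /**
     * Returns the cached JAR if it exists. Otherwise it calls the hypothetical
     * download function once, which is expected to store the file at the target path.
     */
    static Path resolve(Path repoRoot, String groupId, String artifactId, String version,
                        Function<Path, Path> download) {
        Path target = localPath(repoRoot, groupId, artifactId, version);
        if (Files.exists(target)) {
            return target;             // cache hit: no network access
        }
        return download.apply(target); // cache miss: fetch once and keep it
    }
}
```

The second call for the same coordinates finds the file on disk and never reaches `download`. Across many builds, this is why the network cost is paid once per artifact and not once per build.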

Most of these downloads happen only because you’ve added those dependencies to your project. If you are unhappy that Maven is “downloading the internet,” then stop developing software that depends on external libraries. Easy, right? Stop using Spring and Hibernate, stop referencing the Commons libraries, and do everything yourself. That would be one way to avoid downloading any artifacts from a remote repository. Stop using Maven to build your software, and write your own build tool that has all of Maven’s capabilities and every imaginable Maven plugin baked into it.

Not a workable solution, right? The fact of the matter is that your software has dependencies on external libraries. If you find yourself constantly “downloading the internet,” there’s a reason: either you are depending on projects that themselves depend on “the internet,” or your projects have a very wide set of dependencies that may need to be trimmed.

How can we avoid creating projects and POMs that “download the internet”? The simple answer is that everyone needs to start focusing on dependencies. Library developers need to be smarter about creating leaner, meaner dependency lists, and you need to start evaluating your own dependencies with an eye on efficiency.

Library Developers Need to Modularize

Take a project like Spring as an example. Spring’s libraries provide interoperability across a number of core enterprise APIs: JMS, JDBC, JTA. Spring also allows people to plug in different implementations for various features: Hibernate, ehcache, MyBatis, log4j, slf4j, etc. To pick on Spring in particular, its artifacts tend to depend on the world. If you use some of the core Spring libraries, you’ll soon realize that one simple dependency XML snippet actually translates to 30 or 40 dependencies.

If you are creating a library (Spring, Guice, or Hibernate), you need to start thinking about the dependencies you select. Instead of blindly adding dependencies on ten artifacts, split up your projects so you don’t create one gigantic library that depends on the world. To bring the conversation back to Spring: Spring is moving in the right direction. The most recent spring-core, version 3, has five dependencies, while spring-core 2.5.6 has 13. If you’ve watched the progression of the Spring libraries over time, you’ll notice a trend toward modularization. Take Spring AWS as an example: there isn’t just one spring-aws library; there are spring-aws-ant, spring-aws-ivy, and spring-aws-maven libraries. As more people within SpringSource use Maven as a build tool, more of them are starting to realize the value of having more projects with lighter POMs.
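A single dependency snippet becomes dozens of JARs because Maven walks the transitive graph. The sketch below counts everything reachable from one declaration using a breadth-first walk over a hypothetical graph of artifact names. It ignores versions, scopes, and Maven’s “nearest wins” conflict mediation, so it only illustrates the fan-out. It does not show how Maven actually resolves dependencies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Counts the transitive fan-out of one declared dependency (versions and scopes ignored). */
class TransitiveClosureSketch {

    /** Returns every artifact reachable from root, excluding root itself, in breadth-first order. */
    static Set<String> transitiveDependencies(String root, Map<String, List<String>> directDeps) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(directDeps.getOrDefault(root, List.of()));
        while (!queue.isEmpty()) {
            String next = queue.poll();
            if (next.equals(root) || !seen.add(next)) {
                continue; // skip cycles back to the root and artifacts already counted
            }
            queue.addAll(directDeps.getOrDefault(next, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical artifact names; a real graph comes from each artifact's POM.
        Map<String, List<String>> graph = Map.of(
                "my-app", List.of("framework-core"),
                "framework-core", List.of("logging-api", "aop-support"),
                "aop-support", List.of("logging-api", "bytecode-lib"));
        Set<String> all = transitiveDependencies("my-app", graph);
        // prints: 4 artifacts: [framework-core, logging-api, aop-support, bytecode-lib]
        System.out.println(all.size() + " artifacts: " + all);
    }
}
```

Here one direct dependency brings four artifacts onto the classpath. Each extra edge a library author declares is paid for by every downstream project, and that cost is what modularization reduces.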

This matters because low-level, almost universal libraries like Spring, Hibernate, log4j, Guice, and the Commons libraries end up putting their dependencies into everyone’s classpath. If developers of popular libraries get the message and move toward more modular project scopes, you shouldn’t see so much bloat in your own project’s classpath.

Don’t just let any dependency into your project

What can you do? You need to have some standards. Don’t just let anybody put some new dependency into your POM. Have some process to evaluate and assess exactly what a new dependency is going to do to your classpath. Is that new-fangled database library going to drop a dependency bomb and pull in 20 other libraries, some of which have incompatible licenses?

One tool you can use to make this process easier is Nexus Professional, which has a new Maven Dependency report. It is very easy to use: find an artifact in Nexus Professional, then select the Maven Dependencies tab. This report shows you just how many dependencies a particular artifact is going to introduce into your project. It also lists artifacts that may be missing from the public repositories, giving you a chance to assess the quality of an artifact’s dependencies.

  • Nicolas

    Well, the difference is that if you have 100 users using your project, there will be 100 users downloading Maven plugins and libraries. In a company, a repository server can help, but you need a machine for it and you have to set it up. If you are on an open source project where everybody is in a different place, a repository server doesn’t help.

    On the other hand, if you just ship a “lib” directory with your distribution and include it in your Ant build with one line of code, you have solved all your dependency concerns and you no longer download the internet. This means you can compile on a computer that has no internet connection, and your build is greatly simplified. Maven dependency management is good for big, complex projects, but really difficult to master. I’m faster and have fewer problems using Ant than Maven for small projects.

    • http://www.discursive.com Tim O’Brien

      It isn’t easier to maintain a “lib/” directory. It is orders of magnitude more difficult. But, don’t trust me on that, continue managing your own dependencies in a lib directory because it is better for “simple” projects. When your requirements start to evolve and your project starts to become more complex, have fun with your simpler build.

      This is exactly the sort of approach which leads to problems and a call to a consultant. I can’t tell you how many times I’ve heard clients say things like: “Our last build engineer was a very frustrating individual, he was convinced he was doing things the simpler way. Eventually it got to the point where we had to make a choice. We should have just started with Maven from the beginning, because now we don’t have time to fix problems.”

  • http://twitter.com/jmurphyuk James Murphy

    Tim,

    Great post. I’ve often heard “It downloads the Internet” as a typical (and primary) argument against using Maven. Personally, I agree with you that it’s unfounded. The problem isn’t with Maven per se; it’s with developers being lazy when they work out which dependencies they need.

    As a Java developer myself, I fully understand the frustration many developers feel when trying to work out dependencies. Typically they just want to get past “ClassNotFoundException” as quickly as they can and move on to something of real value to their business.

    Hopefully if we all just show a bit more due diligence in our own projects we can work together to bring these dependencies down.

  • bartvdo

    Ehm, how about changing Maven to check the local repository under .m2 first, and only download when forced or when dependencies are missing locally? There is probably a setting for this, but I can’t find it. It even tries to download stuff that is only available locally, since I’m referencing other local projects… The defaults are wrong; that’s the problem with Maven.

    • http://www.discursive.com Tim O’Brien

      That’s exactly what Maven already does, and if you really want to make sure it doesn’t go over the network, try Googling for “Maven offline.”

      It isn’t that Maven has wrong defaults, it is that you have incorrect information.

      • bartvdo

        Maybe I should look harder, but it irks me when I’ve just installed a new version of a project, then run an install of a project that depends on it, and Maven goes looking for it in all the repositories I have defined, even though it’s available locally (I just put it there!). It shouldn’t do this at all.

        I want it to go online only when it doesn’t exist locally. And never for any other reason unless forced/requested. This might be the default behaviour, but from my experience it doesn’t seem to be that way.

        My point is also that, in my mind, the “downloading the internet” complaint comes from this behaviour: seeing a whole lot of “downloading this” messages again even though you just ran a build with the same dependencies.

        I’m not the most experienced Maven user, so I might be doing something wrong. But an experienced coworker told me that he solved it by setting up a proxied Maven repository… That seems wrong to me.

        • http://www.discursive.com Tim O’Brien

          If you are using SNAPSHOT dependencies, Maven is going to periodically check for updates. You want it to do this; if it didn’t, SNAPSHOT dependencies would be useless. If it is still checking the remote repository, make sure you are running “mvn install”. If you want to be certain that it isn’t going remote, run “mvn -o”.
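Tim’s answer reduces to a small decision rule, sketched below in Java as a simplified model, not Maven’s implementation:

- An artifact missing from the local repository has to be fetched.
- A release already in the local repository is reused.
- A SNAPSHOT is rechecked only when its last check is older than the update interval. Maven’s default repository update policy is daily; here it is modelled as a fixed interval.
- Offline mode (`mvn -o`) never goes remote.

The method and enum names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified model of when a local artifact is reused versus rechecked remotely. */
class UpdateCheckSketch {

    enum Decision { USE_LOCAL, CHECK_REMOTE, FAIL_OFFLINE }

    /**
     * @param lastChecked when the remote repository was last consulted (null if never)
     * @param interval    how often SNAPSHOTs are rechecked (a fixed-interval stand-in for "daily")
     * @param offline     true when running "mvn -o"
     */
    static Decision decide(String version, boolean presentLocally, Instant lastChecked,
                           Instant now, Duration interval, boolean offline) {
        if (!presentLocally) {
            // Nothing cached yet: it must be fetched, unless going remote is not allowed.
            return offline ? Decision.FAIL_OFFLINE : Decision.CHECK_REMOTE;
        }
        if (offline || !version.endsWith("-SNAPSHOT")) {
            // Released versions never change, and offline mode never goes remote.
            return Decision.USE_LOCAL;
        }
        boolean stale = lastChecked == null
                || Duration.between(lastChecked, now).compareTo(interval) >= 0;
        return stale ? Decision.CHECK_REMOTE : Decision.USE_LOCAL;
    }
}
```

Under this model, the repeated checks bartvdo describes come from SNAPSHOT versions. A just-installed release would simply be reused.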

  • Jan Kotek

    The solution is simple: just don’t use Maven. My project depends on 30+ JAR files, all stored in Subversion. I have a document, also in Subversion, to keep track of the JAR files. I update the JARs manually once a year.

    It is amazing what people consider as ‘simple solution’, yet it needs 100+ pages of documentation :-)

    • http://www.discursive.com Tim O’Brien

      I love it when people stop by the site to tell me that Maven sucks, given that we’ve written a few very popular books about it. Thanks for the acknowledgement. Maven doesn’t need 100+ pages of documentation; it probably needs 20 to 30, but we’ve gone out of our way to support the community. Thanks, Jan, thanks for the acknowledgement; I’m glad we went out of our way to help you out.

      Your “I update the JARs once a year” approach is a real winner BTW. Especially once you start to scale the effort up in terms of people and complexity. This very quickly leads to an explosion of responsibility. Need to upgrade to a new version of the Spring Framework? What do you end up doing? You have to walk the transitive dependencies yourself, and 9 times out of 10 you end up missing something essential.

      This is exactly the sort of thing that doesn’t work once you scale up to a larger workgroup. Your co-workers usually end up calling me a few months after you’ve moved on, because the project has become more complex and, all of a sudden, there’s no way to keep track of the dependencies and inter-dependencies throughout your project.

      • jankotek

        Tim, I believe Maven 2 works well for some set of problems, but it just does not fit my needs. I tried to use Maven, and the number of pages I read was way over 100. I had some specific demands, and I ended up maintaining my own repository. But that is probably well beyond typical use.

        Updating Spring? Download the zip, verify the hash, replace the JAR files, make a note about the new version, and run the unit tests. I would say 30 minutes. There may be a linking problem with some outdated third-party library, but this kind of problem is easy to find. Also, I usually update all the JARs in one go.

        And I need my project to be buildable even 10 years from now.

        • http://www.discursive.com Tim O’Brien

          Yeah, Jan, I am very familiar with that method. We see a lot of it in our consulting group, because when it goes off the rails, no one knows how to sort it out. You don’t have that problem in Maven; a Maven POM always just works. You don’t need some build guru keeping the whole build propped up with wiki pages, popsicle sticks, and duct tape. I have seen some very impressive build processes. I, and everyone who works on Maven, just don’t think build processes need to be, or should be, impressive. They should just work. My boss is much happier when I work on software for our users than when I document some outlier approach to the build for other members of my team.

  • Pablo

    I think what annoys most people is that core plugins have to be downloaded from the internet. For instance, run a simple “mvn clean” right after installing Maven, and a lot of stuff already gets downloaded.

    Other, more “monolithic” build systems like Gradle don’t do that. Only the declared dependencies are downloaded.

    • http://www.discursive.com Tim O’Brien

      Give Gradle a few years. Once it starts solving more complex problems, you are going to see the same approach. All of these build systems are deployed as “kernels”; if it were any other way, you’d bemoan the configuration burden.

      If you’d like to see the core plugins have fewer dependencies, I’d suggest signing up for the Maven developers list and getting into the details. I’d certainly like to see people do one of two things:

      1. Help pare down dependencies within Maven, OR
      2. Propose that Maven ships with the Core plugins “baked-in”.

      • Pablo

        I think the importance of offline support has always been underestimated.
        There are some basic use cases that don’t work. Here is an example:
        * Install Maven 3.0.3 (with an empty repository).
        * Download a Maven project (e.g. Apache Commons Lang 2.6).
        * Make sure that the dependencies will be available offline: mvn dependency:go-offline
        * Run “mvn clean package” to make sure that those commands will be available offline.
        * Disconnect from the internet.
        * Run “mvn clean package”; it works.
        * Now you want to install instead of package, so you run “mvn clean install” and get the following error:
        [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException

        What this means is that while I’m connected, I have to think about all the possible commands I’ll want to run when I’m offline and execute them. Otherwise they will not work when I’m offline.

        If you could solve this problem, people would have no legitimate reason to complain about “downloading the whole internet”. What annoys people is not downloading too much stuff; it’s running a basic command you’ve never run before on a project and seeing new downloads of core plugins happen.

        • http://www.discursive.com Tim O’Brien

          This is a red herring. When you run the Dependency plugin for the first time, Maven has to retrieve the Dependency plugin. When you run the Install plugin for the first time, Maven has to retrieve the Install plugin.

          This is how all of these tools work: plugins, tasks, whatever you want to call them, are downloaded once, as needed.
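Pablo’s scenario happens because a plugin is fetched the first time a phase bound to it runs. The Java sketch below models a simplified subset of the default lifecycle, using the standard plugin bindings for jar packaging with versions omitted; the real lifecycle has more phases. It then computes which plugins a new command needs that earlier runs never fetched. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Which plugins does a command need that earlier runs never fetched? (simplified model) */
class LifecyclePluginsSketch {

    // A subset of the default lifecycle phases, in execution order.
    static final List<String> PHASES = List.of(
            "process-resources", "compile", "process-test-resources",
            "test-compile", "test", "package", "install", "deploy");

    // Standard bindings for jar packaging (plugin versions omitted).
    static final Map<String, String> JAR_BINDINGS = Map.of(
            "process-resources", "maven-resources-plugin",
            "compile", "maven-compiler-plugin",
            "process-test-resources", "maven-resources-plugin",
            "test-compile", "maven-compiler-plugin",
            "test", "maven-surefire-plugin",
            "package", "maven-jar-plugin",
            "install", "maven-install-plugin",
            "deploy", "maven-deploy-plugin");

    /** Plugins needed for an optional "clean" plus the default lifecycle up to targetPhase. */
    static Set<String> pluginsFor(boolean clean, String targetPhase) {
        int end = PHASES.indexOf(targetPhase);
        if (end < 0) {
            throw new IllegalArgumentException("unknown phase: " + targetPhase);
        }
        Set<String> plugins = new LinkedHashSet<>();
        if (clean) {
            plugins.add("maven-clean-plugin");
        }
        for (String phase : PHASES.subList(0, end + 1)) {
            plugins.add(JAR_BINDINGS.get(phase)); // repeated plugins (resources, compiler) collapse
        }
        return plugins;
    }

    /** Plugins the new command needs that are not yet in the local repository. */
    static Set<String> missing(Set<String> alreadyFetched, Set<String> needed) {
        Set<String> result = new LinkedHashSet<>(needed);
        result.removeAll(alreadyFetched);
        return result;
    }

    public static void main(String[] args) {
        Set<String> fetchedOnline = pluginsFor(true, "package"); // "mvn clean package" while connected
        Set<String> neededOffline = pluginsFor(true, "install"); // "mvn clean install" after disconnecting
        System.out.println(missing(fetchedOnline, neededOffline)); // prints [maven-install-plugin]
    }
}
```

In this model, comparing “clean package” with “clean install” leaves only maven-install-plugin unfetched. That is consistent with the PluginResolutionException in Pablo’s steps, and with Tim’s point that plugins are downloaded once, as needed.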

  • Ricky Clarkson

    Most of the time that Maven appears to be downloading the internet, it’s doing so to get its own plugins, not my dependencies.

    • http://www.discursive.com Tim O’Brien

      I’d be happy to hear if you have an alternative solution to the problem. Please let us know.

  • Danny

    I don’t see any mention of exclusions in this post or the comments. When you include a dependency in Maven, you can specify some of *its* dependencies to exclude. For example, large frameworks like Spring have dependencies on libraries you *might* use, depending on which features of Spring you’re using.

    • http://www.discursive.com Tim O’Brien

      Good point; exclusions are certainly part of the solution. In general, though, I think the larger solution is to get more frameworks to declare dependencies as optional, and to evaluate every dependency for whether it is truly worth depending on.
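Exclusions and optional dependencies both prune the graph a consumer inherits. Here is a minimal Java sketch over a hypothetical graph with two rules. A dependency marked optional is not passed on transitively. An exclusion declared on a dependency removes the named artifact from everything pulled in through that dependency. It ignores versions and scopes and only models the idea; it is not Maven’s resolver.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Models how optional flags and exclusions prune an inherited dependency graph. */
class ExclusionSketch {

    /** One dependency edge as it might be declared in a POM (hypothetical model). */
    static final class Dep {
        final String artifact;
        final boolean optional;        // optional deps are not passed on to consumers
        final Set<String> exclusions;  // artifacts dropped from this dependency's subtree

        Dep(String artifact, boolean optional, Set<String> exclusions) {
            this.artifact = artifact;
            this.optional = optional;
            this.exclusions = exclusions;
        }
    }

    /** Resolves the artifacts root ends up with (versions and scopes ignored). */
    static Set<String> resolve(String root, Map<String, List<Dep>> graph) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> path = new HashSet<>(Set.of(root));
        for (Dep direct : graph.getOrDefault(root, List.of())) {
            // Your own direct dependencies are always included, optional or not.
            walk(direct.artifact, direct.exclusions, path, graph, result);
        }
        return result;
    }

    private static void walk(String artifact, Set<String> excluded, Set<String> path,
                             Map<String, List<Dep>> graph, Set<String> result) {
        if (excluded.contains(artifact) || path.contains(artifact)) {
            return; // excluded along this path, or a cycle
        }
        result.add(artifact);
        path.add(artifact);
        for (Dep child : graph.getOrDefault(artifact, List.of())) {
            if (child.optional) {
                continue; // optional: not inherited transitively
            }
            Set<String> childExcluded = new HashSet<>(excluded);
            childExcluded.addAll(child.exclusions); // exclusions accumulate down the path
            walk(child.artifact, childExcluded, path, graph, result);
        }
        path.remove(artifact);
    }
}
```

Consider a hypothetical “web-framework” that depends on logging-api, logging-impl, and an optional orm-support. Excluding logging-impl on that dependency leaves only web-framework and logging-api on the consumer’s classpath. Declaring orm-support optional is the library-side fix Tim describes, and the exclusion is the consumer-side fix Danny describes.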

  • andrew schneider

    Maven is very frustrating sometimes, and you have to read a lot of documentation to get it right.
    But I can no longer imagine a Java project not using Maven.

    • http://www.discursive.com Tim O’Brien

      You may be surprised, Andrew, but I share your frustration. I believe much of the problem is with plugin configuration. Several Maven plugins appear to have been written by someone with a love for complexity and confusion. I hope to get enough free cycles to help solve the problem. My first nomination for the worst plugin award goes to the Release plugin.

  • Ronald van Loon

    I have just started using Maven. However, I am in an environment where development machines have no internet access and every download needs to be vetted. What I would like to see for people like me is the set of core plugins (maybe everything under org/apache/maven/plugins?) available as a separate .zip or other archive, so that everything you would need to download for ‘core Maven’ is installed at once. Ubuntu already does this: when you install Maven, it downloads the packages considered ‘core Maven’ and their associated programs (when not already installed), such as Ant. That leaves you with what could be considered a core repository, a stable base for those using it. It is the trade-off between maximum flexibility with a small bootstrapping kit and a minimal core platform that is known to work out of the box. Currently, I can do most of the compile-phase work, but none of the install work, because the Install plugin hasn’t been downloaded.

    What would also be a big help is a way to take a legacy JAR file installed in a lib directory, track it back to a Maven dependency for a certain version, and then press a button to get the corresponding part of the central repository, including its dependencies if desired, to unzip into my local repository.

    If there is such a service already available (Maven already does most of this automatically, so it would be a matter of turning this into a Web-frontend), I would be grateful to hear about it.

    • http://www.discursive.com Tim O’Brien

      I wouldn’t call Ubuntu’s packaging vetted so much as filtered through a pretty difficult process of repackaging dependencies. If you look at how the Debian Java packaging team works, you’ll see that you don’t really end up with Maven, but with an odd interpretation of Maven that changes the way the repository works.

      A better solution in your environment would be to use something like Nexus’ Procurement feature. Using that you can vet every dependency and make sure that everything that comes into the organization has gone through an approval process.

      If you would like, we can look into distributing a version of Maven that bundles core plugins. I’ve always thought that this made sense, and I’ve been waiting for someone to take the initiative to do this for years.

      You can track a JAR back to Central by searching for the MD5 or SHA-1 hash of the file. Look at mavencentral.sonatype.com; you should be able to locate JARs by hash through that interface.
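Computing the hash is the easy half of that lookup. This minimal Java sketch uses the standard `java.security.MessageDigest` API to stream a JAR and return its hex-encoded SHA-1 or MD5 digest. You could then paste that value into a search. The class name is hypothetical, and the search step itself is not modelled. The same digest also covers the “verify hash” step in Jan’s manual update process.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes the hex digest you would search for when tracing a JAR back to a repository. */
class JarChecksum {

    /** Streams the file through the given algorithm ("SHA-1" or "MD5") and returns lowercase hex. */
    static String digestHex(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read); // large JARs are never held in memory at once
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // each byte becomes two hex digits
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JarChecksum path/to/legacy-lib.jar
        Path jar = Paths.get(args[0]);
        System.out.println("SHA-1: " + digestHex(jar, "SHA-1"));
        System.out.println("MD5:   " + digestHex(jar, "MD5"));
    }
}
```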