Unless you develop code on an isolated island disconnected from the internet, it's a safe bet that you use open source in your development cycle somewhere. Maybe you use an open source IDE, or maybe you deploy to Linux; in 2012, it is likely that you have some bits in your process covered by an OSS license. Despite the prevalence of OSS, most organizations tend to follow the "Make it Up as You Go" process for identifying the licenses they are incorporating and distributing at the tail end of the software development process.
In this post, I outline the "Make it Up as You Go" process and talk about some of the pitfalls you'll encounter if this is a process you find yourself following. This is the case of a real company and how they went about assessing license exposure for a project that was already finished. Names have been omitted to protect the innocent. (Law and Order noise.)
Note: Many of the issues in this post could have been avoided with Nexus Professional 2.0's Repository Health Check. This post talks about what it is like to take on the responsibility for auditing licenses manually. Nexus 2.0 can automate this process and perform this check continuously as dependencies are consumed from remote repositories. We also go out of our way to get licensing information from more than just the POM, which is the unique advantage of using Nexus.
Are you available to do a license audit?
That was the initial request. "But, I'm not a lawyer? Don't you want a lawyer to do that?", I asked.
The client, "Right, our 'lawyer' just told us to run an expensive IP scanning product, and charged us thousands of dollars to send us a recommendation email. I think our engineers know more about OSS than he does, go figure. Maybe there are lawyers out there that know what the LGPLv3 does, but we've been unable to find them."
Me, "Ok, I can do a quick audit. I can make a list, but you need to know I'm not a lawyer. I'm much cheaper for starters. I 'know' OSS licenses, but if you ever go to court, I'm not your expert witness, agreed? I can't stand giving depositions. The last time I did this, I had to drive to New Jersey to give a four-hour-long deposition about some IP dispute."
Me, "If you want to get started, add my GitHub account to the repository. Do you use Maven?"
The client, "Of course we use Maven. And, we have a POM. This should be simple, we only have five dependencies."
"We only have five dependencies..."
While the POM had a collection of groupId, artifactId, and version coordinates, the developer who did the initial audit failed to appreciate exactly what happens when dependencies are "managed". It wasn't "five dependencies"; those five dependencies pulled in another 20 dependencies from Central.
Lesson #1: Your build has more than just declared dependencies. Your dependencies also have dependencies. If you use Maven, run "mvn dependency:list". If you are performing a license audit, don't just look at the dependencies declared in your build; evaluate the full set of dependencies for potential licensing issues. Just because something like Spring is covered under the Apache License doesn't mean that it isn't depending on something covered by the GPL.
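If you want that full list in a form you can paste into a spreadsheet, a small script helps. The following is a minimal sketch, assuming the usual "[INFO]    groupId:artifactId:type:version:scope" line format that "mvn dependency:list" prints (with an optional classifier before the version). The function name parse_dependency_list and the sample output are made up for illustration; they are not from the client's build.

```python
# Scopes that Maven prints as the last field of a resolved dependency.
SCOPES = {"compile", "runtime", "provided", "test", "system"}


def parse_dependency_list(output):
    """Return sorted, unique (groupId, artifactId, version, scope) tuples
    found in the console output of `mvn dependency:list`."""
    found = set()
    for line in output.splitlines():
        if not line.startswith("[INFO]"):
            continue
        body = line[len("[INFO]"):].strip()
        if not body:
            continue
        # Only the first whitespace-separated token holds the coordinates.
        parts = body.split()[0].split(":")
        if len(parts) == 5:
            group, artifact, _type, version, scope = parts
        elif len(parts) == 6:
            # Six fields means a classifier sits between type and version.
            group, artifact, _type, _classifier, version, scope = parts
        else:
            continue
        if scope in SCOPES:
            found.add((group, artifact, version, scope))
    return sorted(found)


# Illustrative output only; a real build prints many more lines.
SAMPLE = """\
[INFO] --- maven-dependency-plugin:2.8:list (default-cli) @ example-app ---
[INFO]
[INFO] The following files have been resolved:
[INFO]    org.springframework:spring-core:jar:3.1.0.RELEASE:compile
[INFO]    commons-logging:commons-logging:jar:1.1.1:compile
[INFO]    net.sf.json-lib:json-lib:jar:jdk15:2.4:compile
[INFO]    junit:junit:jar:4.10:test
[INFO]
[INFO] BUILD SUCCESS
"""

if __name__ == "__main__":
    for dep in parse_dependency_list(SAMPLE):
        print(":".join(dep))
```

Every row this produces is something you have to find a license for, not just the handful of entries declared in the POM.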
I delivered a preliminary report to the client.
I'm going to have to do some investigation...
The client, "What's this all about?" Me, "You've incorporated 25 dependencies into the build. When a license was in the POM, I listed it. There are a few dependencies that lack license information. I'm going to have to do some investigation on a few of these."
Lesson #2: Many artifacts in Central lack license information. Sometimes this isn't a problem: Apache Tomcat 5.5 has a pom.xml without license information, but since it is a part of the ASF, you can assume it is covered by the Apache License (or can you? see below.) On the other hand, if someone hands you a dependency on some esoteric bytecode manipulation library that is four years old, there's a good chance that the POM in Central won't provide any help.
Note: New artifacts in Central are required to list specific license information. Over time, this will become less of an issue as these requirements are enforced.
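A first pass at sorting dependencies into "has a declared license" and "needs investigation" can be scripted. This is a rough sketch, assuming you already have each dependency's POM as a string; the function name declared_licenses and both sample POMs are invented for illustration. It only looks at the POM's own licenses element, so a license inherited from a parent POM will show up here as missing.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def declared_licenses(pom_xml):
    """Return the license names declared directly in a POM, or [] if none.

    Works whether or not the POM declares the Maven XML namespace.
    Does not resolve parent POMs.
    """
    root = ET.fromstring(pom_xml)
    names = []
    for child in root:
        if _local(child.tag) != "licenses":
            continue
        for lic in child:
            if _local(lic.tag) != "license":
                continue
            for field in lic:
                if _local(field.tag) == "name" and field.text:
                    names.append(field.text.strip())
    return names


# Made-up POMs for illustration.
POM_WITH_LICENSE = """\
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>well-behaved</artifactId>
  <version>1.0</version>
  <licenses>
    <license>
      <name>The Apache Software License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
    </license>
  </licenses>
</project>
"""

POM_WITHOUT_LICENSE = """\
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>esoteric-bytecode-lib</artifactId>
  <version>0.3</version>
</project>
"""

if __name__ == "__main__":
    for label, pom in [("with", POM_WITH_LICENSE), ("without", POM_WITHOUT_LICENSE)]:
        print(label, declared_licenses(pom) or "NEEDS INVESTIGATION")
```

An empty result doesn't tell you the artifact is unlicensed; it tells you the POM won't help and the manual digging starts.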
What does investigation involve? For many of the projects this involves tracking down the project's web site and trying to put your eyes on a definitive license for a particular release. The project is using version 1.3.1 of a project: find a file in SCM or a web page that associates it with a license, print that page out, and make a record. More established projects have this visible, front and center on a web site with a big link that says "License". Other projects, particularly the smaller hobby projects, force you to look in SCM at a specific tag. Don't underestimate how much time it takes to track down answers here; you are scouring the web for information.
The best is when an "open source" project isn't covered by an open source license at all (and this happens). More common is that an open source project uses a custom license, or a standard license that isn't identified as one.
Lesson #3: You have to read custom licenses (and remember, you are not a lawyer, so be careful). The client was asking for a list of licenses, and most of the licenses were Apache, CDDL, etc., but every once in a while you'd see some small project with an uncategorized license. Take, for example, this license from JLine. While it appears to be a BSD-style license, you still have to read the thing. This is why open source projects should just reference an existing license.
Me, "Here's another version of the report, also here's an invoice....." The client, "Yikes, that's a lot of hours. (grumble)." Me, "You asked me to do this, it takes time because you are using all these crazy little projects. If you had stuck to Eclipse or Apache, this wouldn't be happening."
Lesson #4: You should stick with major forges when possible. Have a general policy to prefer artifacts from Eclipse, Apache, or companies with established OSS efforts like JBoss, Sonatype, or Google. When you don't have standards (or a review process), some developer can just add a new dependency with questionable IP and a custom license. Have some standards.
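"Have some standards" can start as something very small. Here is a minimal sketch of a policy check, assuming you have a list of groupIds from your build. The prefix list is my own guess at how the forges named above map to groupIds, and it is deliberately incomplete: plenty of Apache artifacts in Central use short groupIds such as commons-logging, so treat a hit as "someone should look at this", not as a verdict.

```python
# Hypothetical allow-list; adjust to whatever your organization decides.
PREFERRED_PREFIXES = (
    "org.apache.",
    "org.eclipse.",
    "org.jboss.",
    "org.sonatype.",
    "com.google.",
)


def needs_review(group_ids):
    """Return the sorted, unique groupIds that fall outside the preferred forges."""
    flagged = set()
    for group_id in group_ids:
        # Append a dot so "org.apache" itself matches the "org.apache." prefix
        # while "org.apachefoo" does not.
        if not (group_id + ".").startswith(PREFERRED_PREFIXES):
            flagged.add(group_id)
    return sorted(flagged)


if __name__ == "__main__":
    build = ["org.apache.commons", "jline", "com.google.guava", "org.example.bytecode"]
    for group_id in needs_review(build):
        print("review before adding:", group_id)
```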
Trust but Verify
At the time of the review, the JBoss Netty project had just switched from the LGPL to the Apache License, but the project in question was referencing a SNAPSHOT version of 3.1.3 which was incorrectly labeled with the LGPL. It was confusing (and critical enough) to trigger a wholesale verification of all explicit license declarations in POMs. This effort uncovered a few inconsistencies.
Lesson #5: The license a project lists may not always be the effective license. This is usually the most eye-opening revelation for companies that have adopted open source. When you consume an open source project and you see a license that says "Apache Software License, Version 2.0" you are taking it on faith that this project has some established legal process to make sure it is entitled to make such a statement.
The worst is when a project makes two contradictory statements: "Ok, the source code has LGPL headers but the POM says Apache License." What do you do when this happens? (Who knows? I'm not a lawyer.) Easy answer: choose the license you'd like it to be licensed under. Real answer: Avoid that component like the plague.
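You can at least automate spotting the contradiction, even if you can't automate resolving it. The sketch below is a crude keyword heuristic, not a license detector: it recognizes only three license families by well-known phrases, and the function names and sample header are made up for illustration. Anything it flags still needs a human (and probably a lawyer) to read the actual text.

```python
import re

# Phrases that commonly appear in license names and source file headers.
MARKERS = [
    ("LGPL", ("gnu lesser general public license",
              "gnu library general public license")),
    ("GPL", ("gnu general public license",)),
    ("Apache", ("apache license", "apache software license")),
]


def license_families(text):
    """Return the set of license families whose marker phrases appear in text."""
    # Strip comment decoration and collapse whitespace so phrases that wrap
    # across comment lines still match.
    cleaned = " ".join(re.sub(r"[*/#]", " ", text).lower().split())
    return {
        family
        for family, phrases in MARKERS
        if any(phrase in cleaned for phrase in phrases)
    }


def is_contradictory(pom_license_name, source_header):
    """True when both texts name a recognized license family and none overlap."""
    declared = license_families(pom_license_name)
    in_source = license_families(source_header)
    return bool(declared) and bool(in_source) and declared.isdisjoint(in_source)


# Made-up header in the style of an LGPL notice.
LGPL_HEADER = """\
/*
 * This is free software; you can redistribute it and/or modify it
 * under the terms of the GNU Lesser General
 * Public License as published by the Free Software Foundation.
 */
"""

if __name__ == "__main__":
    print(is_contradictory("Apache License, Version 2.0", LGPL_HEADER))
```

When neither side matches a known phrase the function returns False, which means "no opinion", not "no problem". Unrecognized text is exactly the custom-license case you have to read by hand.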
Here it is, did you know you are shipping GPL?
Me, "Ok, the analysis is done. Your product is shipped to customers right? That's your business model."
The client, "Yeah, we're still considering all of the options."
Me, "Well if that's the case, you need to find a replacement for these three libraries, they are GPL. It is my understanding of the GPL that you will need to make source available for all of the proprietary bits in that product unless you find a replacement. I've listed some alternatives, they are covered by the Apache License which is very 'business-friendly'. The other alternative is to contact the company that ships that GPL library and get an OEM license."
The client, "Well that's not my interpretation of the GPL, and even if you are right, who is going to sue us? Everyone does this. It's too late to make a change like that."
Me, "Really? Are you a lawyer? I'm not, and, please remember what I said about me not being your expert witness."
Lesson #6: There are answers to "Who is going to sue us?" And, you don't want to find out what they are. Ignorance of the law is not a valid defense, and companies are getting smarter about open source. One of the issues with waiting to analyze OSS license exposure until the last minute is that it is very often unrealistic to throw it back to the engineers.
Catch these problems earlier with Nexus Professional 2.0
Figure this stuff out up front. Don't make it up as you go, and use a tool like Nexus Professional 2.0 to scan your repositories as your software is being developed. You can download a free trial and run the Repository Health Check to see summary statistics about the licenses you are using today.