The Cloud is Running toward BSD-style Licenses. Are You?


September 24, 2012 By Tim O'Brien

The New York Times had a great article this weekend that explored some of the disconnect in the industry. In “Power, Pollution and the Internet”, James Glanz writes: “[the] foundation of the information industry is sharply at odds with its image of sleek efficiency and environmental friendliness.” This article is interesting in that it calls out the industry for creating an unsustainable power drain that is based on some awful environmental choices. From the article: “Of all the things the Internet was expected to become, it is safe to say that a seed for the proliferation of backup diesel generators was not one of them.”

This piece made me stop and think about trends over the last decade. While the New York Times is focused on the environmental cost, I’m more interested in how this shift to Infrastructure-as-a-Service and deployment on cloud-based infrastructure is affecting open source licenses. The trend might not be readily apparent if you don’t know what to pay attention to. Here’s an attempt at making sense of licensing trends…

Note: This article explores a trend toward BSD-style licenses. If you are interested in tracking your own application’s exposure to various OSS licenses, please take a look at Sonatype Insight. Using Insight, you can keep track of your application’s exposure to GPL, AGPL, and other licenses which may present problems when you have to worry about external or internal distribution. Make licensing a part of your Application Lifecycle with Insight.

Taking Databases as an Example

Done any serious web development over the past decade? You’ve likely encountered MySQL. MySQL’s popularity exploded as the industry was looking for a capable, general purpose database that could provide an alternative to Oracle. Oracle is prohibitively expensive for a large portion of the market, and if you are running a cash-strapped startup you likely won’t be eager to fork over the minimum six-figure price of entry you’ll need to run Oracle.

For a decade MySQL competed on both cost and capability. You can certainly scale it if you either know what you are doing or are comfortable spending money on Percona’s professional services. It has some scalability issues, but, for the most part, you can either shard or start offloading some of your data to NoSQL once you reach limits. MySQL was the capable database for the 00s, and MySQL rose to popularity over the last decade before people started moving to hosted infrastructure (now what people tend to call Cloud infrastructure).

Enter Postgresql (and the Cloud)

Well, something happened one or two years ago: a number of large, high-profile web sites moved to Postgresql. Now Postgresql has always had a reputation for being a database with a strong opinion. Database administrators, performance nuts, and people focused on scalability have always gravitated toward Postgresql. The Postgresql community is somewhat “conservative” and there’s a small group of core committers that tend to favor stability over creativity. MySQL, on the other hand, has always had a reputation for being something of a mess. Reliable colleagues tell me that the MySQL codebase is full of shipwrecks and broken dreams, and if you’ve ever had to deal with some of the more finicky parts of MySQL tuning you’ll understand that while there may be a science to MySQL tuning, it is well hidden underneath a deep layer of poor documentation and guesswork.

The commonly accepted reason for the shift to Postgresql was performance and scalability. While I don’t disagree that Postgresql is easier to tune and scale than MySQL, I question this justification as being political rather than practical. This is simply the justification you’d expect to resonate with a technical audience, but I don’t think it is the real reason for the shift. Here’s why.

Cloud-based Infrastructures Seek BSD-style Licenses

I was at a Postgresql event last week in Chicago, and it was really interesting. Postgresql is experiencing a Renaissance of interest. More and more people are coming to the database, and I was interested in why. It isn’t like I’ve seen several compelling pieces outlining reasons to stop, drop, and move to Postgresql immediately. Instead, it seems like a slow shift that has happened over multiple years. While MySQL was something of a default for startup developers in 2007 and 2008, Postgresql is that default now. I asked around and got the following guesses:

  • People have realized MySQL’s Limitations – I don’t buy this one. I do think that MySQL poses some tricky scalability issues, but I don’t think the majority of users create systems large enough to experience them. Other than one or two individuals, I don’t know anyone who has had a MySQL scalability issue they haven’t been able to either fix or work around given the resources.
  • Oracle – I heard a lot of conspiracy theories about Oracle and MySQL. Lots of people put this out as a reason why there is a huge shift to Postgresql. I don’t buy it. Oracle is out there chasing after huge contracts. I don’t think the Oracle people lose a bit of sleep over MySQL, and (beyond some structural changes to the OSS project) I don’t think they are taking it away.
  • Avoiding NoSQL – This was an RDBMS conference, so I took this with a grain of salt. A lot of people mentioned that Postgresql reduced the need to bring in technologies like MongoDB or Hadoop. I don’t buy that. I think that was just wishful thinking from a DBA that doesn’t want to integrate with NoSQL. I’ve also never spoken to anyone who said, “We’re on Postgresql so we don’t need to use Hadoop.” It just has never happened, and I just don’t see them as being in the same class.
  • A Cloud-friendly License – Now this I buy. This explains the trend. I think it would be over-simplistic to say that Heroku is behind a shift to Postgresql (but I do think it is a contributing factor). Companies that offer on-demand, PaaS-style services have an incentive to standardize on BSD-style licenses (like the one that covers Postgresql) because they are distributing software.

It’s the Licensing, Stupid.

If you look at the language of the GPL, and especially some of the purposeful FUD that pre-acquisition MySQL AB was throwing around, “distribution” of any kind was enough to cover your entire codebase under the GPL. I remember looking at the MySQL AB website in 2004 and wondering if it was even possible to make the explanation of the GPL license for MySQL any more confusing. At the time, the common wisdom was that MySQL was crafting the licensing explanation in such a way as to give companies with any doubt the incentive to purchase (even if it stretched the definition of the GPL).

And here’s the issue. I don’t want to single out Oracle; I think they are a fine company, so don’t get me wrong. But I do think that people are leery of distributing GPL projects with a single, strong copyright holder within the cloud on behalf of paying customers. Even though the license isn’t as toxic as the AGPL, it is still unclear what constitutes distribution. And here’s the central trend that I think we can call out. As more and more of us rely on third parties (like Heroku) to download, distribute, and install software, these platforms are increasingly running toward licenses that don’t entangle them in a web of obligations.

Or, to summarize, no one likes distributing the GPL, even in the cloud, especially when the copyright is owned by a big corporation with an interest in license compliance.

So the next time someone tells you that they moved to Postgresql because it was faster and more scalable, ask yourself whether this is the real underlying reason for the switch or whether that person is just caught up in a larger movement away from copyleft licenses for cloud-based, PaaS systems. Was it an original idea, or were they affected by early adopters of PaaS moving to Postgresql because that’s the only option that was provided?

Clarification: I can already see people bombarding me with this question: what about Linux, that’s GPL? My answer is nuanced: “I do think that people are leery of distributing GPL projects with a single, strong copyright holder within the cloud on behalf of paying customers.” The Debian project or the CentOS project is not going to go after you for internal distribution.

  • Dean Schulze

    I like Postgresql a lot but its JDBC driver has some weaknesses. The Postgresql JDBC driver doesn’t support its own UUID data type even though the java.util.UUID class exists in JavaSE. Maybe that shouldn’t be surprising since UUIDs are not supported in JDBC, but it’s disappointing since UUIDs are an important feature of Postgresql.

    I finally found the answer to supporting Postgresql UUIDs in Hibernate here:

    https://zorq.net/b/2012/04/21/switching-hibernates-uuid-type-mapping-per-database/