Wicked Good Development Episode 3: A look at the past, present, and future of Maven Central

March 24, 2022 By Kadi Grigg

30 minute read time

Wicked Good Development is dedicated to the future of open source. This space is to learn about the latest in the developer community and talk shop with open source software innovators and experts in the industry.

If you use Java or any other JVM language, there's a good chance you know the Maven Central repository. Today's episode brings long-time maintainers and contributors of Maven Central, Brian Fox, Jason Swank, and Joel Orlina, to the mic to rehash the early days of Maven Central, share lessons learned from managing open source ecosystems, and offer insight into the platform's practical software supply chain management capabilities of the past, present, and future.

Listen to the episode


 

Wicked Good Development is available wherever you find your podcasts. Visit our page on Spotify's anchor.fm.

Show notes

Guests

  • Brian Fox, CTO and Co-Founder, Sonatype 
  • Joel Orlina, Engineering Manager, Maven Central
  • Jason Swank, Director, Engineering, Maven Central 

Hosts

  • Kadi Grigg
  • Omar Torres 

Topics discussed

Maven Central, open source repositories, Apache Maven, Java.

References

  1. Central grows up, see their history- https://blog.sonatype.com/2011/07/central-grows-up-see-the-history/
  2. Why namespacing matters in public open source repositories- https://blog.sonatype.com/why-namespacing-matters-in-public-open-source-repositories

Transcript

 

Kadi Grigg

Hi, my name is Kadi Grigg and welcome to today's episode of Wicked Good Development. This is a space to learn about the latest in the developer community and talk shop with OSS innovators and experts in the industry. Today we have an amazing team from Sonatype, including Brian Fox, co-founder and CTO, Jason Swank, Director of Engineering, and Joel Orlina, Engineering Manager for Maven Central. Thank you guys for being on today. Yeah, thanks for having me.

So today, we're all here to talk about Central, right? And I wanted to just get a little bit of background as to what lens you're bringing to today's conversation on Central.

Brian Fox

First, hopefully, the audience knows me, Brian Fox, co-founder, CTO, you know, way back in the beginning of the founding of the company, you know, we were running the central repository.

I have lots of late night horror stories that are old. Joel, I'm sure, has more recent ones; maybe we'll touch on those. So my context is the long view of Central and all those things going on. And I thought it would be interesting to give the audience sort of a look under the covers of what's going on with the operations team of Central, some of the challenges we've had, and some of the plans that are coming up. Jason, you want to give an intro?

Jason Swank

Yeah, I mean, I'm not supposed to assume that people know what Maven Central is. But it is essentially the primary sort of software registry and repository for JVM languages, right? So Java, most notably, but Scala and others as well. So it's basically where everyone distributes open source components for those ecosystems, right.

Yeah, so that's a bit of background on what Maven Central is. You know, there are equivalents in other language ecosystems as well, like npm or PyPI or whatever. Maven Central is that for Java and other JVM languages. Alright, Joel?

Joel Orlina

Yeah. And, you know, Brian, you alluded to this. Like you, I've been here what sounds like a long time as well, and kind of inherited a lot of the care and feeding of Sonatype from you. So I'm still very much involved in the day-to-day. And if we get into some horror stories, I do have recent ones, but, you know, I think I have ones from the early days as well. And I think there's a lot to share about how Central in the present day doesn't just serve, you know, one ecosystem anymore. I think we are a giant repository of open source software for multiple languages that target the JVM, and part of our future direction builds on this really giant legacy we've had serving the Java community for a long time. And so I hope to provide some interesting context.

Kadi Grigg

Today, we're gonna switch things up a little bit. Given Brian's long-standing history dealing with Central, I thought it would be best that he take over as emcee today for this discussion.

Brian Fox

All right. Thanks, Kadi. Oh, yeah, I get to ask all the questions, even though maybe I know the answers, but we'll see.

Yeah, so Jason, I think it'd be good to give a bit more context around Central, maybe some of the history, so I could jump in with that. You know, in the early days, Apache Maven was the build system for Java. And it was really one of the first systems that promoted binary reusability, and also sort of convention over configuration and dependency management, with the introduction of what is called the POM, or the Project Object Model, which captures a lot of the dependencies. And this is done to allow the build system to figure out, when you say I need a component, and that component says I need these other 10 components, that that relationship exists, and it is able to fetch those sort of recursively.
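
The recursive fetch Brian describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `POMS` table and every component name in it are hypothetical, and real Maven resolution also handles versions, scopes, and exclusions, none of which is modeled here.

```python
from collections import deque

# Hypothetical metadata: each component lists its direct dependencies,
# standing in for what a POM declares. All names are invented.
POMS = {
    "app": ["web-framework", "json-lib"],
    "web-framework": ["http-core", "logging-api"],
    "json-lib": ["logging-api"],
    "http-core": [],
    "logging-api": [],
}


def resolve(root, poms):
    """Return every transitive dependency of root, in breadth-first order."""
    seen = []
    queue = deque(poms[root])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already fetched via another path
        seen.append(name)
        queue.extend(poms[name])  # follow that component's own dependencies
    return seen


print(resolve("app", POMS))
# ['web-framework', 'json-lib', 'http-core', 'logging-api']
```

The point of the sketch is that the consumer only names `app`'s direct dependencies; everything else is discovered from the metadata, which is why missing POMs were such a problem in the early days.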

In the early days, the components that everybody was depending upon weren't built with Maven. And so none of that metadata existed. And so the community had to create a place to basically share the metadata about components that already existed. And that kind of was the genesis of what was called Maven Central. And so if you wanted to use Maven, and you needed to fetch the dependencies, you hoped a lot that somebody before you came along and figured out what all the dependencies were and produced the POM. You know, over time, as more and more projects, first open source and then commercial projects, started building with Maven and using Maven to publish their stuff, Maven Central became sort of the primary place where a lot of those open source binaries and the associated metadata started to become distributed. Right. And this was back in the 2003, 2004 timeframe.

Over time, other build systems in Java came up. You know, there was Gradle, and there was Ivy, and then other JVM build tools like Scala's sbt, and I'm sure there's others that I'm forgetting. There's a lot of them. Kind of what happened was that metadata and the Maven distribution, the format of it, became kind of the standard that all the tools interoperate with. Right. And I think part of that was because of the easy availability through Central. Like, if you wanted easy access to basically all the open source Java, and you wanted to publish your stuff for easy consumption by everybody else, you were doing so through that conversion layer, which was the Maven format and Maven Central. And that was kind of the genesis of it, and it's grown. You know, the consumption statistics are kind of mind-boggling; every year it looks like an upward, you know, hockey stick curve on the growth. Joel, I think I asked you recently for the most recent number. I don't know if you have that handy, what the download stats were last year.

Joel Orlina

I can look it up while you continue.

Brian Fox

Put you on the spot? Sure. Yeah. It's a lot, right? I remember when it was 500 million downloads, and we thought it was a lot. And you could put a copy of everything on Maven Central on a 20 gig hard drive. Now it's much larger than that.

Jason, we were talking earlier, you know, about how even the consumption has gone so far beyond Maven, and I think you had some of the statistics on that.

Jason Swank

Yeah. A couple of things. Building on what you just said, though, you know, Maven Central has a long history, right? It predates a lot of sort of modern software registries. And I think, you know, if you look at other ecosystems, newer languages, like Rust has crates.io, right, some of their approaches are really novel and solve kind of current problems. I think in a lot of cases the Maven POM is almost like a subset of what other build tools are bringing to the table, like what the Gradle guys do with their modules, where it's a more extensive set of metadata or whatever. So it's kind of like, that's the foundational building block that everyone's had, but I think other build systems are building on that as well. And that's some of the stuff we're working on as well, from an infrastructure perspective, right. In terms of consumption?

Yeah, I mean, so, I use these terms a little loosely. There's a paper that Sam Boyer did around the Go ecosystem where he sort of defined what a software registry and repository is and what functions it fulfills, and then what the functions are of a PDM, a project dependency manager, of which Maven is one, right? He was doing a lot of research when the Go ecosystem was trying to address these challenges, and wrote a great paper around that, which I encourage everyone to read. So Maven is one of many PDMs, right? Gradle is another one you mentioned. So Maven downloads are somewhere around half of that activity. You know, it depends month to month; sometimes it's 48%, sometimes it's 50.1%, whatever. Gradle is an increasingly large number. I know in terms of publications on a monthly basis, something like 20% of anything posted to Maven Central is coming from Scala, right, which is sbt, I want to say. So that's a different ecosystem. Gradle is also way up there in terms of its usage. I don't have an exact number, but it's somewhere around 40% of what we're seeing on a monthly basis, is my understanding.

So yeah, if you think about PDMs versus a software registry or repository, and I mentioned this to Kadi and Omar earlier, right: hey, there's npm. npm has one PDM, basically. I mean, there's a couple of smaller ones, but npm is the client, right? In Python, you're using, you know, standard tooling; it's a single tool. RubyGems: you use Ruby, RubyGems is the tool. With Java and the JVM, there are multiple tools people are using, and a lot of times they're custom-making tools for specific use cases as well. So Maven is by far the largest, you know, around 50%, say, and there's a long tail of like 300 other clients hitting Maven Central in a month, including fun stories of direct usage.

Brian Fox

And one that comes to mind is Minecraft.

Jason Swank

And that's when my son realized my job might be important: when we saved Minecraft. Ish. You know, the module system, Minecraft, and yeah, right.

Joel Orlina

We broke Minecraft, then saved it. They essentially doubled our bill. We noticed during the day that one of the more popular Forge launchers was essentially re-downloading the same set of files, which they could just cache, every time somebody started it up. And, you know, this was early on in my tenure at Maven Central, I feel it was just a couple of years in, and it took a while just for us to figure out where that activity was coming from. But, you know, when you have this free public repository of code, people find novel ways to use it, often not the most intelligent ways. But eventually we got down to the root of it and restored service to Minecraft, right. For a few hours people couldn't play.

Brian Fox

I think there was a similar case, right? Wasn't it Amazon images or something? Every time they stood up, they were provisioning some stuff directly from Central.

Joel Orlina

Yeah. And I think we were able to, you know, absorb that cost and that load in a more seamless fashion. But yeah, I think that we are barely able to stay ahead of the novel ways people find to use this. And I think that with the advent of cloud providers like AWS, Azure, or GCP, you have people able to stand up build farms, you know, hundreds of thousands of compute instances at a time, and Central as a resource is, you know, a way for me to get all of them bootstrapped and running the same code the exact same way. And I'm not surprised at all that this is the way we're going, and I think it's been quite a relief that we've been able to stand up to that load. But yeah, this is the way things are now.

And I have some, you know, brief numbers here on volume, I think,

Jason Swank

Yeah, I do, too. I wrote these down. Yeah.

Joel Orlina

Yeah. You know, I will, let's see, is it, am I reading this right, 496 billion hits from all of 2021.

Jason Swank

I broke this down on a weekly basis. So, you know, we have like 100 new publishers a week, about 50,000 new components a week, we serve up about 1.3 petabytes of data a week, and we're serving about 11 billion requests, right? So, yeah, a week. Yeah. And so

Brian Fox

Yeah, go ahead.

Jason Swank

Yeah. I mean, someone mentioned to me that, you know, Central is probably the most important, quote unquote, product. I remember back in the days when we were looking at the DR kind of capabilities and HA type things, right, and we were saying, hey, well, here's this critical thing for our customers, you know, that HDS, or this thing or whatever, like, well, that has to be available, you know, we can't have an outage of more than an hour or something like that. And Central was always, like, seconds. It's kind of like the plumbing in your house: you don't really notice it's there until it's not there. And then, you know, Twitter lights up, and it's a mess. So it's always been, like, the most important thing we're running, I think, from an operational perspective.

Brian Fox

Maybe in the show notes we could dig up the old blog post I wrote, but for a long time, Joel remembers this, it was literally running on a single 2U blade server activity objects. And eventually we had a cold standby failover. And then we started moving, you know, we had a machine in the UK. I think it was before the advent of CDNs (content delivery networks), or certainly CDNs that we could afford anyway.

And, you know, surprisingly, that one system basically never went down. It was kind of shocking how reliable that system was, but certainly that one system couldn't deal with the load today, let alone what the world requires.

So on the bandwidth and some of the novel use cases, last year, I think, was sort of an interesting watershed moment for us, right, early in the year.

Jason Swank

That's a great term. Do you know where that term came from, Brian? You know, watershed. You may cut this out if you'd like, but a watershed is like, there's a before and after, right? You know, a watershed moment. I know you're gonna say Log4j was a watershed moment, right? It's like there's a before Log4j and an after Log4j. Now there's White House meetings about the software supply chain, etc. In the US, a watershed is like, you know, where do my streams drain to, which creek, which river, etc. It's an area of land, right? So where does this before and after come from? It comes from Britain, right. So the watershed in Britain is like the ridgeline, you know, the hill. It's where the water falls, and it goes one direction or the other. There's a before and after, right? And so I think Log4j is like a watershed moment for maybe the industry as a whole, right? But for us, it feels like, well, this has been flooding for a while, and you're just seeing it go downstream. We've been talking about the software supply chain stuff since Equifax and, you know, for me, I've been here for 10 years, and we talk about software supply chain issues or whatever. I feel like we've been seeing this pressure, and now it's kind of busting the dams. And now it's a watershed moment, but it's a watershed moment maybe in the American sense, you know, for us, and maybe in Texas in that..

Brian Fox

That's fair. I was actually going to take us to a different watershed, which was the Bintray announcement.

Jason Swank

Sure. Yeah. Heads in a different space.

Brian Fox

Yeah, we've talked about Log4j ad nauseam in the previous episodes. Right. So if you're interested in what I and others have to say, check out the first couple of episodes. Yes, we flogged that to death. As of earlier this morning, 42% of the downloads are still of the vulnerable components. We're going backwards, not forward. So that's enough on Log4j at the moment. Bintray.

Joel, do you want to talk about that?

Joel Orlina

Yeah. So, you know, I feel like a lot of the success we had in recovering from the Log4j incident at the end of 2021 was fueled by, you know, Bintray, and specifically their JCenter sub-product, announcing a shutdown in the beginning of 2021.

They had marketed themselves as an alternative repo, a superset of various open source repositories, and had built up a fairly large community. But, you know, their parent company decided that running it was not in their long-term plans, and initially gave what I think was a fairly short amount of notice that they would be sunsetting the service, leaving these publishers without a home.

And we, you know, have always been aware of JCenter's presence. When you publish to JCenter, you can actually still publish to Maven Central, and you actually use the Sonatype mechanisms for doing so. And so our response to that announcement was to reach out to the community and let people know: well, you can publish directly to Central via Sonatype services. And we embarked on, you know, not just the communications, but also providing support resources and humans on our team to offer migration of existing projects from JCenter hosting directly to hosting on Maven Central, by signing up with our services and following our mechanisms for publication.

Brian Fox

I think it fits the definition of unscheduled, unplanned work. We didn't know that was coming until we found out about it, the same as the rest of the world. Right. And so it created some challenges for your team in the flood of new projects in a panic trying to onboard, and I think also trying to go and migrate their components that were only on Bintray. And dealing with the namespace collisions and all the fun stuff, the difference in the validation rules that we're using, right?

Jason Swank

Yeah.

Joel Orlina

One of the key pieces to our publishing is our, you know, validation of ownership of just the top-level namespace, right? We call it a group ID. And we expect it to be a reflection of a domain you own, and we've enforced that convention for years. A lot of the JCenter rules really allowed people a lot more freedom in terms of signing up for namespaces. And it was a bit of a shock to some publishers, but we explained to them that this is part of the responsibility you have to the community: when you publish open source, you represent yourself as a stable organization, as one who's committed to maintaining components in the long run. And I think that we were very successful in sharing the reasons for our requirements. I don't have numbers; I don't know what the scope was of people who were marooned by the Bintray shutdown. But I think, just anecdotally, we were able to execute the migration, get new namespaces spun up for them, and in cases where people couldn't prove ownership, we helped them choose a different namespace and helped with their strategies for communicating outwards, you know, to people to move over.

Brian Fox

It definitely was a SWAT team moment for your team. And I think looking back on it, you know, necessity being the mother of invention, it created a whole bunch of new automation on our side as well, right, to rapidly deal with that onboarding. Do you know, offhand, that there was a step function in the bandwidth that kind of happened as a result? What did that look like?

Joel Orlina

Oh, I have the spreadsheet where I think it was, we were actually afraid that the step would persist. But it was a fairly large jump that first month, February or March, and so I'm going to bring that up shortly.

Jason Swank

Yes, I think the big change, in our recollection, was in the publishing activity. Right. We had to onboard a lot of new... Yeah, I said 150 people a week, publishers a week, we do now, but we had, in very short order, you know, on the order of 1,000 or something publishers, really.

Joel Orlina

You know, to Brian's point, we saw that activity reflected, strangely enough, in people downloading, and I think that publishers needed to be sure that their stuff made it out safely. So yeah, it was a 22% increase in bandwidth and a 20% increase in the number of requests. Just by comparison, we normally see something like a 1% to 2% increase month over month. So we stepped up by that 22%. And then we had another bump right before the summer, around the 5% to 6% range. But then we've sort of settled back down into our standard Central growth.

Brian Fox

Thinking about the validation and some of the namespace types of things: you know, Ax, one of our podcast guests and security researchers, wrote a blog recently talking about dependency confusion last year, which kind of happened almost around the same time as Bintray. As I recall, they were within the same quarter, at least. There was a novel type of attack, if you will, of people pushing fake things to the repositories, and many of them turned out eventually to be malicious. And he added them up, and there were about 63,000 that Sonatype has detected and reported back to various repositories in different ecosystems. How many of those were on Maven Central?

Jason Swank

You know, the answer, I know you know the answer Joel. So. You haven't heard about this? Yeah, none.

Joel Orlina

Yeah. I mean, I was not asked

Brian Fox

That was a trick question. That was a trick question Joel. The answer is none.

And it did. Do you want to speculate why? You know the answer to that too. Our audience may not know.

Joel Orlina

I won't, I won't speculate. I’m just, I just turn the knobs and I keep the hamster wheel going.

Brian Fox

Well, it has to do with the validation, right? Yeah. I mean, in ecosystems where anybody can publish anything, you know, if you have a project that's called my_project and there's no namespace, in some of these other ecosystems there's nothing that indicates that this comes from Apache or from Eclipse or from Sonatype. It's just my project, and somebody else can rock up and publish my-project next to my_project. And when you're a consumer, especially if the downloads have been faked (you know, we've seen scripts and bots driving up downloads so it looks like the most popular version of my-something project), it's easy to confuse people about what they're downloading. And so then the goal is to get them to download it. You know, it's like typosquatting in domains: you pick something that's close and hope people land on your site and think it's the real one. That's what this attack is all about. Because Maven Central enforces the rules, that part of the coordinate system, as you indicated earlier, is required to be a domain name, a reverse domain name, which is common in class names in Java. And we require that you publish under a domain that you control. Or in the case of a GitHub project, you can use, what is it, github.io. So your project name.github.io is the coordinate that you use, and we validate against that, right? And so that level of namespacing and validation has largely sidestepped this whole class of attack. And again, 63,000 and counting in the last year have occurred in our ecosystem.
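
The reverse-domain rule can be illustrated with a small Python sketch. It is not how Central actually performs validation, which the conversation does not describe; `reverse_domain`, `group_id_allowed`, and the `verified_domains` list (domains the publisher is assumed to have already proven control of) are hypothetical names used only for illustration.

```python
def reverse_domain(domain):
    """Turn 'example.org' into the reverse-domain form 'org.example'."""
    return ".".join(reversed(domain.lower().strip(".").split(".")))


def group_id_allowed(group_id, verified_domains):
    """True if group_id equals, or sits under, a reversed verified domain.

    verified_domains is assumed to hold domains whose ownership was
    already proven by some out-of-band check not shown here.
    """
    gid = group_id.lower()
    for domain in verified_domains:
        prefix = reverse_domain(domain)
        # Require a full label match so 'org.examples' cannot ride on 'org.example'.
        if gid == prefix or gid.startswith(prefix + "."):
            return True
    return False


print(group_id_allowed("org.example.tools", ["example.org"]))  # True
print(group_id_allowed("org.examples", ["example.org"]))       # False
```

Because the namespace is tied to something the publisher controls, a look-alike project cannot simply claim a neighboring name the way my-project can sit next to my_project in a flat namespace.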

Jason Swank

The other thing I've seen, kind of working on new development around Maven Central, is that the other thing I think Maven Central has gotten right is it's not just this namespace, right? And it's not just a name and a version number that has to be unique. In some ecosystems, that's it, you know. You have to have a description, you have to have license information, you have to have developer information, you have to have SCM URLs, right? There's a whole list of requirements, which seem burdensome, right? If you're just a developer trying to fast-track something: why am I entering all this stuff? But, I mean, this is really important. There's sort of the digital good, you know, here's the thing I'm distributing, but then there's all this metadata around it that's actually critically important for kind of supply chain stuff too. Not to beat the Log4j horse, but, you know, the fact that we know what the licenses were, or which developers had access to this, or how you get support for it, the fact that we require that information at publication is a huge benefit to the ecosystem, right? To be able to kind of look all the way down that whole transitive dependency list and know where all that stuff came from, you know, know who the developer was behind it. Right? And I don't think you can do that with most of the ecosystems the way they're designed by default. You're not forced to.
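
A publisher could check for the fields Jason lists before attempting a deployment. The Python sketch below is a rough illustration, not Central's actual validation: the `REQUIRED` table covers only the four items named in the conversation, and `missing_metadata` and the sample POM are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard POM 4.0.0 XML namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Only the fields mentioned in the conversation; the real list is longer.
REQUIRED = {
    "description": "m:description",
    "license": "m:licenses/m:license/m:name",
    "developer": "m:developers/m:developer",
    "scm url": "m:scm/m:url",
}


def missing_metadata(pom_xml):
    """Return labels of required fields that are absent or empty in a POM."""
    root = ET.fromstring(pom_xml)
    missing = []
    for label, path in REQUIRED.items():
        node = root.find(path, NS)
        # A leaf element with only whitespace counts as missing.
        if node is None or (len(node) == 0 and not (node.text or "").strip()):
            missing.append(label)
    return missing


SAMPLE = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0.0</version>
  <description>Demo library</description>
  <licenses><license><name>Apache-2.0</name></license></licenses>
</project>"""

print(missing_metadata(SAMPLE))  # ['developer', 'scm url']
```

Running a check like this locally gives the same kind of feedback as a failed deployment, without the failed deployment.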

Brian Fox

That's right. You know, the standards and the requirements, the bar, if you will, for publishing to Central has been that high for a very long time. Over decades, we've taken a lot of heat for that. People said, "Can't you just make it easy? I don't want to do this stuff. I just want to throw my thing out there for distribution." And I'm fond of saying, "Well, if you don't care enough to provide this basic metadata, then why should anybody care to use your stuff?" I mean, come on. It's basic.

So I think this is a good segue. You know, Jason, you've been kind of working with the team, really to do more than just ops, but to really put some development capability behind a lot of the trappings behind Central, and some of this authentication and other things are part of that. Do you want to talk about that? I mean, we blogged about it, you know, a couple of weeks ago, but I think it'd be good to kind of cover that: what's the thing that the team is working on right now?

Jason Swank

Right now? Yeah, I mean, it might seem a little fundamental or basic, but really, right now we're kind of doing a lot of back-end development, which doesn't sound great, kind of fixing those pipes a little bit, right? But on the front end, from a UI perspective, there's sort of a new component browse, view, and search experience. So, consistent with other software registries, you'll be able to kind of go and view, you know, software published on Maven Central and all the information about it, and to be able to search for it, actually, you know, by qualities that you care about, to surface discovery of those components. That's kind of the first step: what's the single face of Maven Central? Because right now it's a little disparate, right? Where you download things isn't where you publish things, which isn't where you get the help and support information, right? So we want to consolidate that into a single experience.

But that really leads into the next thing we really need to do fundamentally, which is kind of deal with identity management a bit, so that we can kind of modernize that system and begin communicating with publishers and consumers about other changes we have in the works for Central. So we need to first build that venue, that space, right? And then make sure people can log in, update information, and see new information that we're developing and current information that's in place, right? So once that foundation is in place, which is expected, you know, this half of the year, I know, we were kind of shooting for Q1 for the UI aspect, the anonymous access.

But, you know, going on from that, we get some really interesting things, right? We have a lot of sort of proprietary metadata and other data we've crunched around this that we want to start exposing to consumers and publishers, right? So once we have this new UI, we can start doing that; we'll have a place to put it. But we're really focusing on the publisher side of this, right, making sure that, you know, how can I validate that my thing meets the requirements to publish to Maven Central without kind of failing a deployment to Maven Central? How can I check that out to begin with? Right?

You know, and then modernizing the publishing through APIs and experience, right? So we're working with the OpenSSF and sigstore around some of the integrity aspects, but really, you know, it's an alternate mechanism to get your things into Central in a sort of streamlined way that meets our metadata requirements and what the ecosystem really wants around this, but also kind of meets developers where they're at, in the ecosystems that they're in and the tools that they're using. So, in a nutshell: kind of foundational stuff, identity management and UI stuff, followed by sort of more metadata, happening in parallel with a lot of publishing changes, you know, and more abilities around that.

Brian Fox

Yeah. So, you said Q1, and it's mid-March right now. So we have it, right? We're probably weeks away from having something to show on this, which is really exciting.

Jason Swank

Yeah.

Brian Fox

You know, I think we realized that this was the start of building upon that experience. Historically, there have been multiple disparate parts of Central. There's been the search, which was one of the first things, I think, Joel, you created when you started here back in 2010. Right. And, you know, it was built as sort of an add-on to the CDN and to the OSSRH Nexus implementation of the publishing, right? And they have not, until this point, really been brought together. You know, I think there were some interesting things that we learned last year as the result of, you know, the Bintray watershed moment. I had reached out to a lot of our high-frequency publishers, big companies like Amazon and Microsoft and VMware, and, you know, all the ones you would expect, because what's interesting here is this infrastructure that we're talking about, that you're running, that you're modernizing, really represents the last mile of their CD deployment chain. Right? So they're pushing open source libraries to use their cloud infrastructure, for example, and the world gets them from Central. And so any friction that we had in place there was really causing them some challenges. And so we reached out and did a lot of discovery to understand how they wanted to see the world, and, you know, the notion of atomic commits, and knowing when things were deployed, and transactionality kind of came out of that, which really led to this work that you're doing now, Jason, right? We needed the pipes to be able to deal with that.

Jason Swank

And it's important. I don't want to say, “Hey, Amazon isn't important, Google isn't important, or Red Hat or JBoss.” They are the big publishers we talk to and an important class of our users, right?

But, you know, what about the person who's just learning programming, right? Maybe they don't want to use Java; they use Kotlin because it's sexy or whatever. How can I make that experience, you know, kind of be the entryway into programming, right? We have smaller consumers and publishers as well, the folks on the other side, doing side work: “I'm putting something on GitHub, I want a fast track to get stuff into Central, I don't want to spend two hours figuring out all this big enterprise-y stuff, I'm just gonna go learn Go or Python or something.” Right? So there are the big publishers' publication concerns, and there's sort of a unique set of other concerns for smaller publishers as well that are very real and important. You know, I think that's kind of the ecosystem, and smoother paths for people just starting out, or people who just want quick projects; making that easy for folks is important. And it's a different set of challenges and technologies for that, too. But surprisingly, there's overlap, right? You can think about, you know, sigstore or provenance and these things, and some of these same solutions are going to apply, I think, to both classes of publishers in the different use cases.

Brian Fox

Yeah. So we've touched on sigstore in a glancing way a couple of times here. Sigstore is an OpenSSF (I forget what it stands for now... Open Source Security Foundation) project at the Linux Foundation. Do you want to talk a bit about what's going on there, and what part of Central today you think this might most apply to?

Jason Swank

Yeah, it's a little bit of an open question, right? You know, I mean, in sigstore, they did an awesome job in terms of explaining what their target is, why they exist, and why that functionality exists. But in a nutshell, it's that PGP infrastructure doesn't meet modern security needs, right? So centralizing that level of control and authenticity, and the provenance and origination, doesn't work with PGP these days. And so, decentralizing that to a certain extent. I mean, it's evolving, I think, how different software branches and repositories want to use sigstore. We all have this problem, as do the Linux distributions, etc., right?

But there's this idea of, you know, here's the thing that a publisher is producing: I've made these bits and they're my bits and, you know, wherever they go, I want you to be able to trace them back to me. These came from Jason, right? And there's this idea of those quality things, right, that, hey, no, this was actually published, I didn't just make up a name and typosquat that, right? But there's some sort of other information that this is conforming to a policy of some sort, whether that's name and version, you know, whether I had permission to publish this thing, whether I included the metadata, right? So it's not just that I created this thing, but it's this other attestation that these other policies were met, right? Did I include an SBOM? Did I include my name? Did I include an email address? All that stuff, right?

And so I'm not sure how sigstore plays. I think it's an open question, actually: how do you validate that information offline, in all the places that a package may show up, right? I don't see it. I know that Brian created it. But was Brian just making stuff up? You know what I mean? Like, is this a real thing that actually did get published? You know, it's harder to tell that. But I think sigstore definitely has a place in terms of that origination, right? This came from Brian, or whatever. I think it also likely has a place in terms of those other policies and that conformance, you know, but there are some other approaches, too, right? I think it's kind of an open question for a lot of folks.

Brian Fox

Yeah, and I know, historically, the PGP signing and all the key management, you know, has been one of the most hated parts of the requirements there, because it is a little bit challenging. And it's also, unfortunately, one of the least used, you know.

Jason Swank

It doesn't have the stats.

Brian Fox

We're not seeing people pull down the signatures, which means they're not validating it. So we've always required that they be signed. And it really serves more as a tripwire, a way of us being able to validate that the artifacts have not been mutated anywhere from when the publisher created them, to when we got them, to when they've been sitting on our disks for a decade. We can still go back and certify that. That's kind of the only purpose it serves, right? And so there have been a number of initiatives that we've kind of watched over the years that I'd hoped might be the next best thing. You know, and they didn't pan out.

Jason Swank

That's a long list.

Brian Fox

That's a long list. So we didn't want to jump and do the work of moving the ecosystem to something that wasn't going to stand the test of time. But I really think, with all the energy behind sigstore and the thought that's gone in behind it, you know, we made that statement. Jason, you wrote the blog a couple of weeks ago, and it was more of a statement of intent, right?

Jason Swank

Sure. Yeah. I mean, we've been tracking this for the last year, you know, along with everyone else. It's a really exciting development. But I know there's certainly been a lot since I made that blog post. I've got another OpenSSF meeting, you know, in 20 minutes here.

Brian Fox

Yeah, that's right.

Jason Swank

That's right. Yeah. So, um, yeah, really exciting. It is early on; that technology has a lot of moving parts. And I do like how the Ruby ecosystem and RubyGems approached this, in terms of, like, “Hey, here are the rocks here and how we want to approach it. Here's some basic tooling. Here's a phased approach,” right? Because there really are some rough edges and things that people aren't sure about until we see it in action, you know, and how it can pan out.

Brian Fox

Yeah, the open source way to solve those is to pick up a shovel, right? And so, yeah, that's kind of what the statement of intent was there to try to say: look, we think you have most of the direction of a solution, we have a lot of related problems, and so let's work together on this. It makes no sense to go off in a different direction and try to invent something new for the Java ecosystem if the rest of the world is coalescing around sigstore. That's where we want to be.

Jason Swank

Yeah. And it's really exciting. I mean, it's sewn into IP. And as people have become aware of this, like, it's interesting on that level, right? But it's who's coming out of the woodwork, you know, in our public relations, I guess, around this, like, “Oh, yeah, I thought about this. I did a paper, you know, six months ago about how Central could use this.” And so there are some unexpected sources of inspiration and activity around this. It's awesome.

Brian Fox

Yeah, yeah. The hatred of PGP motivates a lot. Far more than I even expected.

Okay, so I think we're getting towards time here. Are there any other key topics that you think we should share? I think we've covered a bit of the history and some of the recent challenges. Any parting words? Joel, anything you want to add, a war story or a scar that you have?

Jason Swank

I will say Joel's probably helped thousands of people, you know, with their projects. I think about the Java modules folks. You know, there's this long list; everyone who's ever dealt with the ecosystem somehow knows who Joel is.

Joel Orlina

Well, they've seen my name, which is interesting. They view that name with some emotion, and I think it's very polarized. It's either, as David Blevins likes to say, “The day I saw that JIRA ticket with your name on it was resolved, that's the day the company started.” And there are other people who are like, “I spent hours fighting to get my five files onto Central,” and, you know, signing or fulfilling the requirements was painful. You know, it's been an interesting ride. We have a lot of automation now that helps with validation. But, you know, I tried to sort of eyeball a lot of the requests that come in, to see what people are trying to publish. And as Jason touched on, you have giant publishers, and then you have someone who's learning, and, you know, Maven is a place for all of them and Central is a place for all of them. And it's been very rewarding, even, you know, on the days when Twitter has a lot of complaints. Those days have actually diminished significantly since we stood up new infrastructure to help people out. But, you know, it's part of Sonatype's legacy. It's something I've been very proud to have been a part of, and for all the missteps we've had, I think that we really have done more good, right, than we've caused annoyance over time.

Brian Fox

It was fun, at swarm pre-pandemic in Vegas, to see David Blevins come up and give you a hug and say, “Wait, you're the Joel?” It was nice to see you get that.

Jason Swank

He gets viewed as a bot at times. They thought it was just the meetup thing we had.

Brian Fox

Yes, Joel is a real person and not a bot. Okay, Jason, anything you want to add?

Jason Swank

Oh, no, I'm good. I think, Brian, I'd like to come back in a few months, though, and talk about where we're at with Central. This will be a very different story, you know, in terms of what we can show up and talk about.

Brian Fox

You heard it here: a couple of weeks, hopefully.

Jason Swank

I said a couple of months, come back.

Brian Fox

A couple of weeks.

Joel Orlina

Yeah, so we don't have too much time, right? Well, we would have to focus only on a handful.

Jason Swank

A couple of those are MSW.

Brian Fox

The horror stories might be the Bash files.

Joel Orlina

You know, I took a lot of those, and we kept a lot of Bash around; maybe we shouldn't share those. But, you know, you told the story about the machinery that was in place. I can tell you a story somewhere down the road about how the very first outage that Maven Central took on, early on, was really due to me. You know, it was in an attempt to increase the failover. Brian, you touched on the US and the UK. Yeah, we had had an unbroken, you know, sort of level of service until Brian said, “Joel, it's your job to make sure that all these things are turned on; replace a disk on the hardware.” And I think I flubbed some sort of load balancer configuration and took us out for like 30 minutes. You know, and I think I kept my job; Brian said that I could keep doing this.

Brian Fox

Yes. It worked out. All right. Well, this has been fun. Hopefully it's been informative for the audience, and we'll come back and have this chat later. So Kadi, do you want to take us away?

Kadi Grigg

Yeah. Thank you all so much for joining this week's episode of Wicked Good Development. The show was co-produced by Kadi Grigg and Omar Torres and made possible in partnership with our collaborators. Let us know what you think and leave us a review on Apple Podcasts or Spotify. If you have any questions or comments, please feel free to leave us a message. If you think this was valuable content, share this episode with your friends. Till next time.

Written by Kadi Grigg

Kadi has been passionate about the DevOps / DevSecOps community since her days of working with COBOL development and Mainframe solutions. At Sonatype, she collaborates with developers and security researchers and hosts Wicked Good Development, a podcast about the future of open source. When she's not working with the developer community, she loves running, traveling, and playing with her dog Milo.