Wicked Good Development Episode 22: Fall 2022 Maven Central updates

December 02, 2022 By Kadi Grigg

16 minute read time

Wicked Good Development is dedicated to the future of open source. This space is to learn about the latest in the developer community and talk shop with open source software innovators and experts in the industry.

This session features Brian Fox (CTO and Co-Founder), Joel Orlina (Engineering Manager, Maven), Jason Swank (Director of Engineering, Technical Operations) and Lakshmi Mohandas (Senior Product Manager). Listen in as they discuss Maven Central's relationship with Sonatype, its pain points and how we're addressing them, and the latest updates that make Maven more unified and powerful than before.


Wicked Good Development is available wherever you find your podcasts. Visit our page on Spotify's anchor.fm.

Show notes

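Hosts

Kadi Grigg

Panelists

Brian Fox, CTO and Co-Founder
Joel Orlina, Engineering Manager, Maven
Jason Swank, Director of Engineering, Technical Operations
Lakshmi Mohandas, Senior Product Manager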


Transcript

Kadi Grigg (00:01):
Hi, my name's Kadi Grigg, and welcome to another episode of Wicked Good Development, where we talk shop with OSS innovators, experts in the industry, and dig into what's happening in the developer community. Today we have a panel of Sonatype's Maven Central leaders, including Brian Fox, Co-Founder and CTO; Jason Swank, Director of Engineering; Joel Orlina, Engineering Manager for Maven Central; and, new to the show, Senior Product Manager Lakshmi Mohandas. Thank you all for being here today. So back in March we did an episode all about Maven Central's history, you know, some untold things that aren't that well known out there in the market. We also talked about where Maven was at that time, and we teased a little bit about what was to come. Today's show is all about those updates. So before we let Brian take over the mic, can each of you introduce yourselves and tell us what lens you're bringing to today's conversation? And Brian, we'll start with you.

Brian Fox (01:01):
Okay. Hi, I'm Brian Fox, Co-Founder and CTO at Sonatype. I'm also a longtime Maven committer and former PMC member, and I help with infrastructure at Apache, keeping the Nexus Repository running over there. So, yeah, I've got my fingers in a lot of pies.

Kadi Grigg (01:19):
Joel?

Joel Orlina (01:21):
Yeah. Joel Orlina, Engineering Manager in the technical operations group at Sonatype. Among the several teams I support is the one primarily responsible for the Maven Central infrastructure. In the past year that infrastructure has grown, not just for legacy services, but also toward where we're gonna take it, you know, in the years to come. And I like to say that Brian sort of gave me this responsibility shortly after I joined. So <laugh>, you know, infrastructure is something we have in common.

Brian Fox (01:49):
Responsibility and a bunch of not great Bash scripts as I recall <laugh>.

Joel Orlina (01:52):
It's alright. It's alright. It's all good. We're all friends. <Laugh>

Kadi Grigg (01:55):
Jason?

Jason Swank (01:56):
Oh yeah. So I'm basically responsible for technical operations at Sonatype, which includes Maven Central amongst other things. So I guess I'm kind of dealing with the stuff that Joel maybe doesn't really want to, like costs or optimizations at that level, and the organizational dynamics around Maven Central, balancing that with other activities we do at Sonatype.

Kadi Grigg (02:21):
And new to the show, Lakshmi.

Lakshmi Mohandas (02:23):
Yeah. Lakshmi Mohandas, a Product Manager at Sonatype for Maven Central.

Kadi Grigg (02:28):
All right, Brian, let's dive in.

Brian Fox (02:29):
All right, thanks. So, Joel, Lakshmi, and Jason, you guys have been working on quite a bit this year. We kicked off some work. It's been a long time coming and I think we just got to the point where some of it is now visible to the rest of the world as of what, a couple of weeks ago? So who wants to kind of fill everybody in on what the team has been up to and what the listeners can go check out when they're done listening?

Lakshmi Mohandas (02:55):
Yeah, sure. So Maven Central is a repository that Sonatype runs. It's the default package registry for JVM-based open source components. It's also the default repository for build systems like Apache Maven, SBT, and others. It provides a secure platform for access to and distribution of JVM-based components while ensuring that these components meet a certain quality standard. In terms of scale, in January this year 51 billion requests used 6.2 petabytes of bandwidth. And per week, 115 new publishers are onboarded and 50,000 components are published. That's a quick overview of what Maven Central is.

Brian Fox (03:43):
Yeah, thanks. That's <laugh>. Even though I've been, you know, helping to manage Central for 15 years, it's still hard for me to get my mind around it. I remember back in the day when 500 million requests in a year was a lot, and I was able to fit a copy of Maven Central onto my hard drive. That was a long time ago, and you need a big hard drive for that these days. So in terms of Central, what are the pain points that people are currently experiencing that you've heard about?

Lakshmi Mohandas (04:12):
There are several concerns with the current state of things. Maven Central, the central repository, has not really changed in a significant manner for the last 10 years. Interaction with the platform, specifically the user experience and technologies, is archaic; it's distributed across several digital properties, and it's inconsistent with other modern registries out there with respect to package management, publication, and distribution. For publishers, the publication process specifically is a huge pain point, because there's limited clarity around the process and no visibility into the publication status of a component while it's underway. For these reasons, publishers have low confidence in a successful publishing flow. On the other hand, for consumers using search.maven.org to search for a component, the set of component metadata that is exposed is limited, so there isn't enough information to aid component discovery and selection. And even though components are published in approximately 10 minutes, it takes about four hours for the search index to update for that component, which means components are not really discoverable during that time.

Brian Fox (05:31):
Right. Thanks. That's great context. And so what are the updates that we're planning to do to solve this? I know some of those have already rolled out this summer, so maybe talk about what we did this summer and then what we're planning to do the rest of the year.

Lakshmi Mohandas (05:47):
So, you know, taking into account Sonatype's relationship with Central and the current pain points that I just mentioned, our near-term vision for Central became building a unified, easy-to-use, modern, and secure platform for package distribution, along with providing high-quality data around the creation, structure, and security of packages, making it easier for developers to create and maintain software. So back in June this year, this new experience was launched in beta as the future phase of Central. It was in effect a stealth launch, because it was only socialized within Sonatype and with a handful of external users who registered for the beta. Since then, the feedback from the stealth launch was incorporated, and the platform was made generally available to the public in September this year. The main updates, essentially the core offering, include, you know, component dependency and dependent information; OSS Index-driven vulnerability information exposed on the platform; a view of the most popular categories, packages, and publishers; and real-time updates to data stores of OSSRH-published content. It also addresses all human use cases of component search.

Brian Fox (07:12):
Yeah, that's great. And just within literally the last week we rolled out the Sonatype safety rating, right?

Lakshmi Mohandas (07:19):
That's right, yes.

Brian Fox (07:20):
Which was tied into the State of the Software Supply Chain Report. And so the Sonatype safety rating is built off of a model that is trying to predict how likely a particular project is going to have a vulnerability. And we've run that analysis for some of the most popular components on Central and provided those scores. So more scores will be coming, but if you're interested in that, you can take a look at that on the site as well.

Brian Fox (07:51):
So that was a great description, Lakshmi. Joel, or Jason, anything that you want to add to what's been going on?

Jason Swank (07:56):
I think the only quick thing I'd add is, you know, Lakshmi talked about a lot of the work we've been doing that's focused on publisher interactions, right? There's another dynamic at play, I think, around the security posture of software repositories, and other work that's happening, I feel like, in the broader community, right? So in addition to publisher capabilities, meeting folks where they're at in terms of their expectations, and revising things we haven't revised in a long while, I think there's some emerging security and other work that's happening right in parallel to that. So.

Brian Fox (08:29):
Mm-Hmm. <Affirmative>. Okay. Yeah. I wanna dive into that. Before we pull on that thread a little bit, Joel, for the listeners, how do they find this new version of search? What's the easiest way to get to it? 

Joel Orlina (08:40):
Exactly. So there are actually two ways. If you know the URL, that's probably where we'd love for you to start. And if you're listening to this, it's fairly easy to remember: central.sonatype.dev <laugh>. The .dev top-level domain is one of those places where, you know, we expect people to think about their internal development work, and the spirit of us launching it under that URL is that we have people inside Sonatype doing their research there. But one of the more recent things we launched is that when you click on a result on search.maven.org, if you're already a user of the current search, you should get a little modal that pops up and asks if you're willing to try out the new experience. Clicking yes will take you to the new site to try it out. So if you're already using search.maven.org and getting search results there, you click on a search result, I think a version or an artifact name, you'll get the modal, and then you'll go to the new site, which is still quite easy to remember: central.sonatype.dev.

Brian Fox (09:39):
Right. I learned something I didn't know about that, Joel. I was gonna tell people, if you go to search.maven.org, look at the top of the screen; the link is right there. I didn't know you did a deeper integration, but that's great to hear. So there you go. That's how you can find the latest and greatest experience. And you know, Lakshmi talked a lot about the publisher experience. Jason, do you wanna describe how we're thinking about this intersecting with the publisher experience?

Jason Swank (10:06):
Yeah, I almost want to defer to Lakshmi, but I mean, the idea is that there's scaffolding, right? UI aspects and backend services. I think that's really the bigger lift, right? Then the next piece of this is adding the ability for users to log in and see unique information based on that login, right? And that feature set can be very long or very short, right? But we have the basic UI scaffolding and the basic services in place, and it's really just a question of what initial set of functionality we want exposed to users as they log in.

Kadi Grigg (10:39):
Right.

Lakshmi Mohandas (10:40):
I'd like to add some more details to that. So I think in the near term, our immediate plans are to focus on the publisher experience. The goal is essentially to centralize and consolidate the component publishing workflow and to expose a certain limited feature set, like capabilities such as CherryBOM, in the near term. So this also is-

Brian Fox (11:04):
Can you explain what CherryBOM is? Probably most people don't know what that is.

Joel Orlina (11:07):
I can take that. Part of the publishing process involves making sure that components people want to publish pass certain levels of quality. There are requirements in the POM, and we make sure that sources and Javadocs are there. But the last step is actually where we gather up a bill of materials. If your component is composed of other things in Maven Central, we'll take that and send it to a custom scanning process that lines it up with vulnerability information and license information from other Sonatype products, and then emails you a report. Our name for it internally is CherryBOM, but it is essentially, you know, a software bill of materials report lined up against Sonatype data that we feel open source publishers would find valuable, specifically security vulnerabilities and potential license threats.

Brian Fox (11:56):
Right. We built that capability last year because we felt it was important to provide that visibility to Central publishers, and frankly, I was not patient enough to wait for all the rest of the plumbing to get in place. I know you guys love that, but we put this thing together to get some value to the users. The real goal, at least the way I had envisioned it, was that once we had the updated search and then the publisher experience, we'd be able to more easily integrate some of the other things we're doing, like Sonatype Lift, you know, capabilities that are already out there and free for people to use on open source projects. We just didn't have a great way to integrate them into the current, very Nexus Repo-based publishing process. So CherryBOM was kind of meant to be a short-lived bridge that's now been around, what, a year and a half, but it's still functioning, still providing value, and part of the master plan. So that's CherryBOM. It's basically an SCA scan of components before they get released to Central, so that publishers have the opportunity to correct things they may not have known about. Lakshmi, you can pick up where you were going from there on the publisher stuff.

Lakshmi Mohandas (13:11):
Yeah, I was just going to say that, as part of enhancing the publisher functionality, we also plan to integrate with an IdP. That includes identity management with multi-factor authentication, and it will finally allow publishers to self-serve granting permissions to new publishers from their organizations. So those are the immediate plans for Central for the next quarter.

Brian Fox (13:38):
That's great. I'm looking forward to seeing that, and I'm sure a lot of publishers are as well. Obviously, looking further down the road, revamping the Central stats is probably a thing. I know, Joel, you'd love to see that infrastructure modernized <laugh>.

Joel Orlina (13:55):
Yeah, thank you. <Laugh>. I swear I'm not calculating them by hand, but <laugh> it is not easy, so yeah. <Laugh>

Brian Fox (14:04):
Yeah, it was a great idea 10 years ago that needs some new infrastructure, but fortunately, with a lot of the infrastructure we already have in place, you know, from Fastly to Databricks and all these other kinds of things, it's like magic compared to what we had to do in 2010 when we first started building this. So I'm looking forward to seeing that come to fruition as well. Again, the publisher experience is kind of the missing framework we needed to tie all these things together, so that'll be pretty exciting. Jason, I know you've been working with some people at OSSF, and Harvey, and I think even some people from Gradle to-

Jason Swank (14:46):
Sure.

Brian Fox (14:47):
Kind of talk about the next evolution in terms of component signing. Do you wanna give everybody an update on that?

Jason Swank (14:53):
Yeah, I mean, yeah. It seems to get a little broader every time we talk, though <laugh>, right? So earlier in the year, you know, when Sigstore announced their RFC for PyPI, the Python repository, I think it got a lot of folks' attention, including ours, because we have a lot of problems with the way we sign packages in Maven Central. So looking at all those details, there's been a lot of writing about that. And as we engage with OpenSSF and around Sigstore, you know, there are parallel efforts as well. So, going back to Sigstore specifically, what we've been doing for a while now is meeting regularly with the Maven and Gradle build tool teams, right? About, well, how will we support Sigstore? What are the use cases?

Jason Swank (15:32):
What's a roadmap towards that from your perspective? Because the Maven Central aspect of this is only a small piece of the overall puzzle, even with respect to Sigstore, right? The build tools need to support it. What packaging formats are in place? How do they validate it? What kind of signatures would they accept? Who would be the signer? That's been a really good dialogue, and now we're engaging with the broader Java Sigstore group as well. And that's where it's expanded a bit, right? So instead of just Sigstore, what about, you know, in-toto attestations? What about the SLSA framework? What are the other files that we also want? So I think these are good conversations that are happening.

Jason Swank (16:11):
It is a little like wrangling cats, because there's a lot of work I think a lot of people want to do, and we're saying, "Okay, what are the basics we need around how we do signatures?" right? How we validate signatures, what that bundle or format looks like, that we need to get right before, "Hey, here are six different types of files or other information we want to include with the package," right? So I think the most exciting part of this has actually been those conversations with the broader community. You know, we've been running Maven Central for a long time, and it really feels like there's been a wall or a barrier, whether we want it or not, between us and users, publishers, and toolmakers, right? And I think those interactions have been really productive, right?

Jason Swank (16:52):
So I know you did a presentation at a conference earlier this year, Joel, right? DevOps UK. Today there are folks talking at DevOps Belgium. And hey, back to back, there are gonna be some Sigstore talks. Harvey is talking about Maven, and Sigstore, and PGP, and, you know, folks from Gradle are gonna be there, right? So now we're intersecting at these events, and it just feels a lot healthier than it ever has been, from my perspective. So that's my biggest takeaway at the moment, Brian, versus, like, here's some actual code. This is lining up now with work we're doing for publishers. When we talk about staging rules, and being able to modify those and have them be a little more open, right? That really coincides nicely with what needs to happen to support Sigstore, as opposed to, you know, just PGP and other legacy mechanisms.

Brian Fox (17:33):
Thanks for that update. And riffing off of, you know, your comment about the ecosystem, I know there have been a lot of conversations going on within the, what do we call it? The Securing Software Repositories Working Group at the OSSF, where we have, what, people from Ruby, and PyPI, and NPM, and probably a handful of other repositories, and we're having conversations about best practices across the board. I don't know if you want to add anything extra about that.

Jason Swank (18:01):
I think that's a little bit of what I was alluding to initially, you know, the security landscape around package managers. I mean, we talked about post-Log4j, that sort of thing. Over the last year it's been night and day in terms of the level of threats and the difficulties repositories are facing around this, right? You know, MFA isn't a silver bullet, right? As one example: hey, when folks use MFA and someone loses their MFA token, what do you do? How do you validate them? Is it just email-based to get your credential back, right? Are there other ways to look into that? Because there's not necessarily an organizational affiliation for these users, right? You can't say, "Yeah, I know where your paycheck goes, so I know how to reset your MFA," right? So I think as we address one problem, sometimes it exposes us to a few others. And I think that's a lot of what the broader ecosystem is realizing, right? There's not a quick fix. It's a series of maybe small fixes, and then waiting to see what the reactions and consequences are.

Brian Fox (18:59):
Yep. That's right. And I know that even within that group, the MFA reset thing is a topic, and they're looking at proposals to put together a central sort of help desk to help with that across a number of these ecosystems that are entirely, you know, volunteer-based. So some of that is happening, yeah.

Jason Swank (19:18):
That's one idea. I mean, there are some almost fundamental questions to answer about open source and that sort of thing that are kind of being raised a little bit. So-

Brian Fox (19:30):
Yeah, it's interesting. We sometimes take it for granted on our side that we have folks like you guys who are around all the time, because we pay you to do that work. But not all of the ecosystems have that. If it's a volunteer and a system goes sideways and it's not their day job, what happens to the repository, right? That's what we're seeing some of these other ecosystems really grapple with at the moment. So trying to uplevel that for the sake of the community is definitely a good step forward. Okay. I think we're getting around to the end of the time. These are the topics that I had in mind to cover. Hopefully this is useful to everyone in the audience. You know, as we continue to make progress, I'm sure we'll be back here talking about what's new. But again, if you wanna go take a look at the latest search, go to central.sonatype.dev or go find it from search.maven.org. And you can get nostalgic for the old UI, which is not gonna be around forever. And with that, Kadi, you wanna take us out?

Kadi Grigg (20:33):
Yeah. Thank you guys so much for taking the time to be here today. We're looking forward to the next Maven update.

Tags: Community, Maven, podcast, DevZone, Wicked Good Development

Written by Kadi Grigg

Kadi has been passionate about the DevOps/DevSecOps community since her days of working with COBOL development and mainframe solutions. At Sonatype, she collaborates with developers and security researchers and hosts Wicked Good Development, a podcast about the future of open source. When she's not working with the developer community, she loves running, traveling, and playing with her dog Milo.