Wicked Good Development is dedicated to the future of open source. This space is to learn about the latest in the developer community and talk shop with open source software innovators and experts in the industry.
In this episode, Sonatype Director of Product Management Jamie Whitehouse joins Kadi as guest host. With the unique perspectives of Engineering Manager Darryl Handley, Data Scientist Cody Nash, and Principal Engineer A.J. Brown, we dissect the evolution of software supply chain attacks and the lessons learned. We’ll dive into how credit card fraud detection and supply chain attack detection are similar, the data science behind these systems, and the behavior of developers.
Listen to the episode
Wicked Good Development is available wherever you find your podcasts. Visit our page on Spotify's anchor.fm
A.J. Brown - Principal Engineer @adrianjbrown
Cody Nash - Data Scientist
Darryl Handley - Engineering Manager
Hi, my name's Kadi Grigg, and welcome to today's episode of Wicked Good Development. This is a space to learn about the latest in the developer community and talk shop with OSS innovators and experts in the industry.
Hey there. My name's Jamie Whitehouse. I'm the director of product management here at Sonatype, and I'll be your co-host today. We'll be talking about the evolution of software supply chain attacks and lessons we've learned along the way.
Today, we have an amazing team from Sonatype, including Principal Engineer A.J. Brown, Data Scientist Cody Nash, and Data Services Engineering Manager Darryl Handley. Welcome, all, and thanks for being here.
Thanks for having us, Kadi.
Before we jump into the official questions today, could y'all introduce yourselves and talk a little bit about why you're interested in this topic? Darryl, why don’t you kick us off.
Yeah. So for me, a lot of the interest comes from learning about data: data engineering and data science.
I knew a bit about data engineering from my previous teams here at Sonatype, but coming into this team, I learned a lot about data science from Cody and just from learning on my own. And, it's been a really interesting journey to pick up some of that machine learning knowledge that we use to find stuff like supply chain attacks.
Nice. Well, you mentioned Cody, so why don't we hear from him?
Well, for me, it's an interesting data problem that crosses a few different domains of, you know, it's not just tabular data describing the characteristics of packages, but you also have natural language processing for going through the code. And then, you get into graph properties of the relationships between the packages. So there are a lot of interesting aspects to it for me.
Nice. Thank you. A.J., what are your thoughts?
Yeah, I mean, as a longtime developer, I'm obviously affected by the supply chain and supply chain attacks. My interest in helping solve the problem is that, you know, this is a look into the behavior of developers, which is very fascinating as a science, and we get to be some of the pioneers of that. It's not hard to be excited about it.
Well, thank you all for being here. Let's dive in.
So we've been talking about software supply chain and software supply chain attacks for the past few years, and you know it's no secret. We also have a State of the Software Supply Chain Report that comes out every year, but we've kind of seen these attacks evolve over the years.
They're constantly changing and getting more and more sophisticated. What I'd like to know, though, is how do we first start identifying the trends in how these attacks are happening? What does that mean? How do we see these changes in the industry?
Yeah, I’d tap AJ here.
It was a couple of years ago, and we were all sitting around talking about these things. And he was part of that original nucleus of, you know, "Aha, what can we do about this? Should we do something? What can we do?"
Okay. Yeah. I think what got us thinking about this, in general, was some of these attacks that were happening, right?
So there were npm packages that were being hijacked to inject these vulnerabilities that Sonatype has been helping you identify for a while, and this is a problem that gets ahead of those vulnerabilities being detected. Right. You know, as we started looking at these attacks happening, we started to realize, “Hey, there's a similar solution in the world to this,” right.
So think about credit card fraud. When credit card fraud detection was first happening, it was an annoyance, right? Every time something possibly suspicious happened, you might get flagged. And that's because early fraud detection didn't look at the individual behaviors of different types of people.
You know, someone who's a TV salesman probably buys TVs a lot more often than someone who's not, for example. But those rules were applied across the board. As credit card fraud detection got better, it's because we were able to classify the behavior of different types of people and apply the rules to them.
Right. So me, who travels to, you know, the same eight bars every week? I should probably never get flagged. But someone who comes from out of town and goes to the same eight bars in the same night, that person might get flagged.
One of the things I remember… I don't think it was an external event that triggered us, but it was further evidence. There was a proof of concept published where somebody modified a PDF, and it produced the exact same hash as the original PDF. So all of a sudden, you had this thing which seemed authentic, since it had the exact same identity, but the contents were now different. I know a PDF is different from software packages and payloads, but I remember that being one of the things where it was like, yeah, people are out there, they're trying this.
They might be exploring in a benign way, but it won't be long before this is actually weaponized.
Yeah, Jamie, that's a good point. You know, we traditionally thought of vulnerabilities as things that have to be exploited. Right. So things that are accidentally put there, or flaws left behind, that someone then exploits.
This new generation is an actual attack. It's someone deliberately putting something vulnerable or harmful into your package, which means they're trying to disguise it so that you can't see it. Right.
I think that's a good distinction to make because up until recently, a lot of vulnerabilities I would have described as more like runtime vulnerabilities. Right, you need an application. It needs to be running. This is in there, it's an accident, or maybe it was malicious. But it's attacking a running application. Whereas what we're now seeing is a different kind of attack, one that actually attacks the software supply chain itself at the point where things are downloaded, not necessarily when they're deployed into production.
Yeah, that's a good distinguishing statement as well. Again, when we think of vulnerabilities, we think of the thing running in production that gets attacked, but some of these software supply chain attacks we're seeing, especially the things we'll be able to detect, are happening on build machines or the developer's machine.
Right. So even if you have the best security practices in production, all the security scanning you can do for the application being deployed, you might not ever see that attack because it doesn't actually make it into production.
What kind of havoc does that wreak, A.J.? I mean, what's the difference between finding those types of issues at time of download versus at deployment?
So I don't have the number or the actual incident on hand, but there was a case where a developer tool was downloaded through the supply chain. So you install a package, and whatever ran on the developer machines or build machines was mining Bitcoin.
And a significant amount of money was made that way. Right. So you or your company is paying for someone else to mine. Those are your compute resources being used to make someone else money, as one example.
But then there are even more malicious things. Right? So this is just another door into your infrastructure.
What other real-life implications are there for issues like this? I mean, you gave the crypto mining example, but I'm just trying to think of healthcare, right? So people have these technologies that are life and death for them.
So if they are open to these types of attacks, I would think, one, that's a lot of lawsuits. Two, there are going to be a lot of backend fire drills. And three, how do you get it back up and running as it should be in real time? Worst case, what if someone died?
Yeah, I mean that, so that leads me to think about the ransomware attacks. So this is a different class of attack, possibly vectored through supply chain delivery. But, you know, there were major cases, at least a couple of years ago, where hospital systems were shut down by ransomware.
And so, where do you draw the line? If hospitals can't do what they do, which is save lives, that means someone probably died in a way attributable to a ransomware attack. Right? So the consequences could be huge.
So when you first start identifying that there are these types of, you know, new trends and attacks, what would be the best plan of action for enterprises when they do start noticing these different types of attacks?
So as new packages are published to npm or PyPI, what Sonatype does is run them through our machine learning data processing pipeline, which basically records a bunch of different attributes of what the release looks like and compares them to previous releases and known attacks.
And then what we can do from there is we can decide if it looks suspicious or not. If it looks suspicious, then we can use our firewall tools to prevent that from being downloaded to the developer's machine. And then additionally, what we do with that information is we send it off to our security researchers who will do a deeper dive on it and check to see if that is actually a malicious package or sometimes they're benign, sometimes we might've caught a false positive.
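As a rough illustration of the flow Darryl describes, here's a minimal human-in-the-loop triage sketch. Every name, attribute, and threshold below is invented for illustration; this is not Sonatype's actual pipeline.

```python
# Hypothetical sketch: record attributes of a new release, compare against the
# project's history, and route anything suspicious to human reviewers.

def extract_attributes(release):
    """Record simple attributes of a published release (illustrative only)."""
    return {
        "has_install_scripts": bool(release.get("scripts")),
        "new_maintainer": release.get("publisher")
                          not in release.get("known_publishers", []),
    }

def suspicion_score(attrs, history):
    """Compare a release's attributes to prior releases of the same project."""
    score = 0
    if attrs["has_install_scripts"] and not any(
            h["has_install_scripts"] for h in history):
        score += 2  # install scripts appear for the first time
    if attrs["new_maintainer"]:
        score += 1  # published by an account not seen before
    return score

def triage(release, history, threshold=2):
    attrs = extract_attributes(release)
    if suspicion_score(attrs, history) >= threshold:
        return "send_to_security_research"  # human-in-the-loop review
    return "allow"
```

The key design point the conversation highlights: the model only raises candidates; security researchers make the final malicious/benign call.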
One of the things you mentioned there was the attributes and comparing to past releases and comparing to other attacks. I think this is a great time to have Cody speak to kind of those signals. How do we even learn them? What does it look like? How many of them did we throw out? How do we even figure out which attributes matter?
We are constantly evaluating new signals. I think we're at over 40 now, if not more than that. And it’s driven by a lot of brainstorming. I mean, I remember early sessions with A.J. where we'd all sit around a room and pitch just dozens of ideas for what we could collect and then make mind maps and just lists and lists of things that we could do.
And then it was a matter of triaging: What do we think is most important? Which ones are feasible? Then we start looking at the data to see what's actually useful for finding these suspicious packages. You know, the npm ecosystem has a well-known, I don't know if you'd call it a vulnerability, but an easily exploitable attribute where, in package.json, installation scripts can run arbitrary code. It's a feature of the ecosystem that a lot of packages use for valid purposes, but a lot of attacks are leveraged through it. Just seeing whether a package uses one of these scripts is a pretty good signal, you know, to look more closely at that package.
And so we see a combination of features where there's a brand new package published that is using these types of scripts and also has some encoded binary file. Then, you know, that's a huge red flag to send it on to the security researchers. We always have humans look at it before we actually flag it as malicious. It's definitely a human-in-the-loop type of system.
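The install-script signal Cody mentions is concrete enough to sketch: npm runs lifecycle scripts such as `preinstall` and `postinstall` from package.json automatically at install time, so merely having one is worth a closer look. A minimal checker (illustrative, not Sonatype's detector) might be:

```python
import json

# npm lifecycle scripts that execute automatically during "npm install"
LIFECYCLE_SCRIPTS = {"preinstall", "install", "postinstall"}

def has_install_scripts(package_json_text):
    """True if the manifest declares any install-time lifecycle script."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return bool(LIFECYCLE_SCRIPTS & set(scripts))
```

On its own this signal has many benign hits, which is exactly why it gets combined with others (brand-new package, encoded binary payload) before anything is escalated.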
Yeah, I think one thing that might be helpful, again thinking about the audience, is why we're doing this. It's not just the supply chain attack piece, but why this approach. Our theory was basically, "Hey, we want to know when a project behaves in a way that it would not normally behave."
Right? That's what we mean when we say suspicious. For the last five years, it's worked this way; published releases happen at this time, or whatever. But for some reason, something different happened this time. What's up with that? As you mentioned, we use the machine learning pieces to raise those to our attention, and then review them to say, "Okay, yeah, there's something weird going on with this."
Suspicious doesn't always mean malicious, right? It could just be, you know, somebody new joined the project, and they did things differently this time. That's okay. Sometimes it could be that someone took over the project, which is also something you want to be cautious about, even if it's legitimate. It's still different. Right.
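The "behaves in a way it would not normally behave" idea can be shown with a toy example: flag a release whose timing deviates sharply from the project's usual cadence. The statistics here are deliberately simplistic and the function is hypothetical; a real system would combine many richer signals.

```python
from statistics import mean, stdev

def is_unusual_gap(release_gaps_days, new_gap_days, z_threshold=3.0):
    """True if the gap before the newest release is a statistical outlier
    relative to the project's historical release cadence."""
    if len(release_gaps_days) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(release_gaps_days), stdev(release_gaps_days)
    if sigma == 0:
        return new_gap_days != mu
    return abs(new_gap_days - mu) / sigma > z_threshold
```

A project that has shipped roughly monthly for years and suddenly publishes a day after its last release would trip a check like this; whether that's a new maintainer or a hijack is what the human review decides.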
Through that lens, I'm curious about all your takes here, Darryl, Cody, A.J., on the applicability beyond something like supply chain attacks. A.J., one of the things you said that made me think of this was that we're looking at the behavior of the project, of the releases of that binary, and of the people involved in it.
Do you think there's applicability beyond detecting supply chain attacks and why? Like, how could it be useful?
Yeah, so the first thing that comes to mind, and hopefully I'm not getting too deep on this, is how can we use behavior to understand quality, or the risk of quality changing?
I'm gonna use the example of a project that's been going for five years. It's always done things the same way, and a new person comes in. Is that good or bad?
If it's a rock star, a subject-matter expert who comes in and is a top-of-their-class programmer, whatever adjectives you'd use for the best programmer you've ever had, maybe that's good.
Maybe it's better for the project. But if all of the elite people have left the project and now it's a bunch of newer, less elite people taking it over, maybe that's bad for the project. Can we use the same behavioral information we use to detect attacks to understand the project? Then you can understand things like that.
This is like looking at people then, right? These are real people. They have their own attitudes. They have their own likes and dislikes and their own disposition in life.
If we're now starting to look at people, one of the things you've talked about a little is rock stars, right? These are experts. These people join a project, and they can potentially make or break it. So Cody, from a data scientist's perspective, I know this is talked about a lot in the media, right?
The bias and the implications of how these things are used across large data sets. I'm curious what your thoughts are on the implications of assessing quality based on the people joining or leaving and their contributions.
I think as far as bias and code quality, especially with machine learning models and somewhat black-box approaches, you know, especially once you get into static analysis where you're looking at the code itself, or just the text and the readme files, and so on.
There is a danger. If you learn that this author's email is frequently associated with malware, and someone else has a similar email, that can generate false positives and, you know, bias your system, discriminating against someone and making your performance worse too.
And so it's important to take a few different approaches, like trying to clean personally identifying information out of the data, so you don't learn on particular individuals. It's also about proper cross-validation, ways of developing the models so that they're generalizable and not overfit.
Say packages with a specific language in them are frequently malicious. That's another danger of bias in the system. Honestly, right now, we're very biased against English speakers because most of the malware is laden with English text. But that could easily change, especially if there's some less frequently spoken language that doesn't have a lot of benign examples, and then suddenly someone starts publishing a lot of malware in that language.
If you're not set up to defend against bias, that could easily creep into your system.
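The PII-scrubbing idea Cody describes, removing author emails and similar identifiers before feature extraction so the model can't latch onto particular individuals, can be sketched like this. The regex and token are illustrative, not exhaustive or production-grade:

```python
import re

# Simple (and deliberately incomplete) email matcher for illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub_pii(text):
    """Replace email addresses with a neutral token before featurizing,
    so models generalize on behavior rather than memorizing individuals."""
    return EMAIL_RE.sub("<EMAIL>", text)
```

Scrubbing like this trades a little signal (known-bad authors) for robustness against the false-positive and discrimination risks described above.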
Interesting. Almost because the frequency outweighs the lack of frequency, right? It's not diluted enough. You don't have enough of a sample set to say, "Hey, there's a balancing act here."
All of that starts to look bad, regardless of if it really is.
I think another, more common bias we have to deal with is that there's a pretty small difference between hobbyists publishing small one-off packages and malicious actors who publish small one-off packages.
Sometimes your first package just does a console log: hello world. That's not very different from something that has a console log that prints out your password and other information. There's one line of code, even half a line of code, of difference between a hello world package and something that sends out all of your personal information.
Cody, do we see differences across ecosystems? You were talking about bias toward English speakers in the code, but is there a bias in the different ecosystems themselves as to which ones are more prone to attack, or have better security protocols in place?
I mean, the ecosystems that have easy ways to execute arbitrary code are the most commonly targeted, and the ecosystems that have more secure author verification and package upload verification almost have no attacks whatsoever. It's a dramatic difference.
Python being a very popular language, especially for people learning to code for the first time these days, we see a lot of junior programmers and school projects being published to PyPI. And there's another fine line there, between school projects and malware, that we have to pay attention to so we aren't flagging every CS101 project that gets published to PyPI.
There's some interesting balance there between making it easy to share software and making it easy to put a bunch of useless or bad things out there. Right? If you have to go through a bunch of steps to publish to Maven, for example, you're probably less likely to put your little hello world thing out there.
Yeah. Maybe the ecosystems could have some kind of sandbox for, you know, whatever stuff, and then the official stuff that people actually want to download. So you could use the sandbox, but then to promote a package, you'd have to go through a better authentication process or something like that.
A Secret decoder ring.
Getting access to the club.
You know, we've talked about how you've identified these different attacks, realized where some of these issues are, and how we can have a solution for them. But how are we constantly looking to modify that? Because things are always changing.
So how are you able to prove it out? Like, A.J., when you were talking about learning from credit card fraud and building a model, how can you prove that your hypothesis worked?
Well, I can talk to that a little bit. We do look and see what we missed. We're always looking back, because, you know, we miss stuff. We don't catch everything, and we're looking back to see what we missed.
Recently, Cody has been making some improvements to the npm detection system, and he's found a whole bunch of stuff that we missed. That everybody missed. Gosh, I don't know how much I can talk about it, but he's been adding some new features we're working on, and they're vastly improving our detection. It's an ongoing process that we're going to be evolving for years.
It's not like we're done. We're constantly looking for new ways. One of the things we're looking to do in the future is start looking for novel attacks, the things we haven't seen before. All our current software works from things we have seen before, to then detect things that look similar, or other instances that share those signals.
So we're going to be looking for things that don't necessarily look like those other things but could still be potentially suspicious or malicious.
That's a great thing to raise, Darryl. It was actually one of the things we ran into as we were starting to productize this and talk to people.
Moving out of research into development, we were trying to come up with a way to talk to people about that analogy: why it's important to them, but also what to expect out of it. I often use the analogy of the early days of credit card fraud protection, like A.J. did. I think we are in those early days.
A big part of the early days of fraud protection is that there were a lot of things you might not have known about. One of the things we realized with supply chain attacks is that nobody knew about these, right?
A lot of companies that are trying to secure applications and highlight all the vulnerabilities weren't noticing these new types of attacks, mostly because they're not runtime attacks. So they're very novel. They're new to the system, and one of the things we really noticed is that nobody was actually looking at this.
So that's why we developed the system, so that we could detect these things at scale. The whole notion of our release integrity (RI) suspicious detection system is a lot like the early days of fraud detection. There's a bunch of things that we capture. Some things are novel, some things are new. And like for many other folks in the industry, there are a bunch of things that are just unknown.
So we have a huge system where I think we've identified something like thousands, maybe 10,000, vulnerabilities coming in through the supply chain due to these kinds of attacks on different kinds of components. So the system's working, but it's still in its early days, and we need to continuously refine it.
As new supply chain attacks become known, we're adding them into our catalog, and we have some examples of those. But we're also doing our own novel research, hypothesizing like we did in the early days about how the supply chain could be attacked and creating models around that.
We assess whether anything like that happened in the past, and we continuously assess whether anything like that happens in the future.
Yeah, I'll add from maybe a softer level. Maybe not an indication that we're doing it well, but I remember when we first turned this thing on, the thing we'd put lots of time and energy and research into, and we actually detected something. That was fantastic.
[laughs] I'm still happy from that day. So I'd say we're doing it well today.
Well, that's a great segue into another topic. Could you all talk about our journey to production, starting with the research phase? What did that look like?
Well, part of it was, where do we get these lists of malicious packages? When we started, we had the list of npm advisories that were published, and we got a few hundred from that list.
And then we started collecting signals and training models on those signals. Very quickly, we started seeing that there were hundreds more that were either already taken down from npm, and we just didn't happen to know about them yet, or that were still live on npm. So there was this early growth phase where we quickly saw that these things were malicious. This was another place where the team of security researchers was very helpful to us. We'd see things that had weird author behavior and some unusual code in them, and we could send them over to the security research team, and they'd be like, "Oh, yeah, that's malicious," or "No, that's just something benign."
So it was this very iterative process with feedback from experts in that domain and the re-training of the models on each new round of labeled packages.
So you mentioned iterative and re-training, and it all sounds really positive and smooth. I'm curious about any rockiness along the way. What did we learn, what did we fail at, and how did we pivot?
Yeah, we had a recent failure, actually. We went to retrain our npm models on some new data that we found, and Cody spent a couple of weeks working on it. We got to the end of it and learned that our previous models were better than the new models we had created. The new models were being trained on our old signals, so from there, we decided to go down a different path: adding more signals.
So that's what we're currently working on, and Cody is finding more things we can add to the information we use for training. We're constantly evolving like that. I don't know if you want to speak to that more, Cody, or just leave it there.
I don't know if it counts as a failure or not, but one of the ways we pivoted very quickly was for the bidirectional character attack that was published late last year.
We very quickly implemented some systems to detect it. It's a fairly straightforward attack, but there were no known examples of it. No one had actually generated an attack with it. There were a few proof-of-concept code snippets on GitHub, but that was about it. And, you know, we've been monitoring it, but as far as I know, no one's actually used that exploit.
Maybe everyone just jumped on it too quickly, and no one bothered to make use of it. Of course, now that it…
So I can speak a little bit from the product side about failure. Maybe not failure; lessons learned. When we were building this thing, it was awesome, and it started working, and we realized that's something we can do.
We were very focused on the technology and the capability to do this thing: detect these malicious, suspicious packages. And then we thought, "Hey, wait a minute. How do our customers use this now that we have it?" The first thing that came to mind was, "Okay, what if we put this in our Firewall product?"
That's great, except that people download things by saying, give me the latest version of this package. Right? And we built this thing that's going to stop you from being able to download the latest version sometimes. So then it was, "Oh, okay. Now we've got to solve that problem as well."
So I think the lesson learned there was no matter how cool the tech is and how useful the tech seems, if no one can use it effectively to do their job better, then it's not a win yet. And I believe we have mostly a win now, right?
We do. Yes. I think in our docs it's called policy-compliant component selection. Basically, it filters out the things that aren't passing policy when you're asking for a version range, so you always get a version that satisfies your range.
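The behavior described here can be sketched in a few lines: when a build asks for a version range, return the newest version that satisfies the range and passes policy, rather than blocking the request outright. Everything below is a simplified illustration, with versions treated as dotted integers instead of full semver semantics.

```python
def parse(v):
    """'1.2.0' -> (1, 2, 0), enough for this toy example."""
    return tuple(int(p) for p in v.split("."))

def select_version(available, range_min, range_max, quarantined):
    """Newest version in [range_min, range_max) that is not quarantined
    by policy; None if nothing acceptable exists."""
    lo, hi = parse(range_min), parse(range_max)
    candidates = [v for v in available
                  if lo <= parse(v) < hi and v not in quarantined]
    return max(candidates, key=parse) if candidates else None
```

The design choice this captures is the usability lesson from the conversation: quietly serving the newest compliant version keeps builds working, where a hard block on "latest" would break them.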
I think that's a great segue into a related topic. I think it was February of last year when we saw dependency confusion attacks dramatically rise. We always knew it was possible. For Maven Central, we're the stewards there, and there are a lot of controls that guard against that for Central, but not in other places.
Darryl, I think you were part of the team then. I'm interested in your take on how we solve that. Do we solve it with our release integrity, suspicious, all the signal stuff, or do we solve it in some other way and why?
Yeah, I think the solution for that didn't actually involve my team, but it was to put it in something a little more rule-based, and the rules for that one were in Firewall.
Dependency confusion attacks target people's legitimate domain names and their namespaces. So what we did, for the Repository Manager customers at least, is basically create an allow list of your namespaces based on the packages you have published in your own repo.
Then, if something comes from an external repo and matches that list, we know it couldn't have come from your company intentionally. We did it that way because the number of namespace attacks was basically flooding our RI system and creating a bunch of false positives for our security researchers to look at that we could never have kept up with.
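The namespace allow-list defense Darryl describes can be sketched as follows. The npm-style `@scope/name` convention and all names here are illustrative; the real rules live in Sonatype's Firewall product.

```python
def internal_namespaces(hosted_packages):
    """Collect scoped namespaces from packages hosted in the internal repo,
    e.g. '@acme/http-client' -> '@acme'."""
    return {p.split("/")[0] for p in hosted_packages if p.startswith("@")}

def is_dependency_confusion(package_name, source, namespaces):
    """A package arriving from an external public repo but sitting in an
    internal namespace is suspicious by construction: your company would
    publish it internally, not to the public registry."""
    ns = package_name.split("/")[0]
    return source == "external" and ns in namespaces
```

Because this is a deterministic rule rather than a learned signal, it can drop the whole flood of namespace-squatting packages without consuming human review time.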
Jamie, that was perfect. So we've talked about quite a few different things here, and I want to wrap the conversation up. We've covered a lot of things that, honestly, some people haven't even thought about or heard of before.
So I think my last question for everybody here is, what would be your best advice to people who are now scratching their heads after hearing this and thinking, "Where do I start, now that I actually know this is something I should be aware of?"
I think it highlights our ethos: our product is providing genuine value that is of use to people. A.J. touched on the npm latest-version problem, and I think that really highlights it. We solved detecting signals en masse, so you don't have to hire a bunch of humans to do it, and we can do it very thoroughly.
But that was only part of the problem; to truly solve it, you needed a repository manager on the other side. So honestly, I think your only real protection against something like this, at the scale it's now happening, is automation. The reason we built the system is that the scale was increasing through automation.
So use a product or software like our Firewall product. Just like everything else, you can't keep up with the volume with humans and teams alone.
Check out our blog to get yourself educated, and see all the great work Ax Sharma is doing covering these attacks.
I think this is probably a good place to stop. I do appreciate everyone for taking the time to record with us today. Conversations like this would not be possible without you guys making the time.
Thank you, Kadi.
Thank you all.
Thanks for listening to another episode of Wicked Good Development, brought to you by Sonatype. The show was co-produced by Kadi Grigg and Omar Torres and made possible in partnership with our collaborators. Let us know what you think, and leave us a review on Apple Podcasts or Spotify.
If you have any questions or comments, please feel free to leave us a message. If you think this was valuable content, share this episode with your friends. Till next time!