Mapping the JavaScript Genome for DevOps

October 18, 2016 By Derek Weeks

From artisan to automation.  High performing organizations are using DevOps principles to boost productivity, streamline software supply chains, and improve quality.  These organizations are swiftly moving away from their artisanal approaches of crafting software to the high-velocity, automated practices where applications are more manufactured than developed.

Pulsing through the veins of these manufacturing organizations are the parts used to create the software.  While high performers have access to an ever increasing supply of software components, they understand that gold-standard parts make the best software.  Borrowing lessons from global manufacturing giants, the high performers identify the best components, minimize the number of versions used, and track where every component is used in case it ever needs to be recalled.

Free-for-alls ruled the day.  Until now, manufacturing with gold-standard parts was easier said than done.  Every development organization has sourced hundreds of thousands of parts over the years in a free-for-all manner.  Any component that served its purpose was suitable, regardless of performance, age, quality or security.  In the artisanal approaches to development, component selection by functionality -- and no other traits -- ruled the day.

Why?  It wasn’t that development organizations didn’t care about the quality or integrity of the parts they were using.  Precise information about the quality of those components simply was not available when they needed it.  Without the ability to precisely identify and track components, no record keeping was performed.  The ability to automate sourcing, selection, and traceability to support DevOps practices was starved by a lack of timely and accurate information.

The problem with trail mix.  Imagine, for a moment, that you manage a store that sells every variety of trail mix package. These bags of trail mix range from sweet treats with things like candy pieces and chocolate chips to bags of mixed fruit and various seeds and nuts.  In addition to these prepared bags, you also sell ingredients by themselves, everything from plain peanuts to candy and dried fruit. Of course, as a responsible business owner you pay close attention to the news and various public reports, especially those related to recalls on the types of ingredients you sell.

Coming in one morning, as you go about the routine of opening the store and preparing for the day, you get an important message on your phone. One of your suppliers has announced that a number of recent shipments of raisins were contaminated and contained chemicals in concentrations that are a health hazard and could even cause death.

You start to panic. You never differentiated or kept track of the raisins used from one supplier to the next.  Raisins were raisins.  

Now you have to identify all the containers on the shelf that hold raisins from those batches. The more complex task, however, is identifying all the trail-mix packages that use those raisins, including those that have already been shipped to your customers.

For a moment, think of the trail-mix packages as applications and the raisins as one of the thousands of open source components you use. The different batches of raisins that your supplier provides are the various versions of those parts. Finally, the health hazard is a security vulnerability that might be present in some, or perhaps even every, bag of trail mix you sell.

You shut down production, and begin to unravel the mystery of which potentially deadly raisins might have infected your range of products. It’s a daunting task, one you hope to recover from, and if you do, you are determined to define a better system going forward.

Pulling back to reality, our trail mix problem isn’t so far from the truth.

We don’t build software like we used to.  In the days of development past, we built software the hard way. We either wrote it from scratch or copied and pasted code that was often impossible to compile.  Today, developers assemble software using processes that were born in modern manufacturing (lean, agile, kaizen, etc.).

According to studies, 80 - 90% of a typical application consists of open source and other third party components.  Consuming these components in development accelerates innovation, improves quality, and lowers costs.  But not all components are created equal.

Open source projects distribute an average of 14 new releases each year.  Some components may offer 20 different versions and others might have 400.  These releases represent modifications in the code aimed to boost quality, improve performance, add new features and remove bugs.  The releases might reveal changes in software licenses or fix a known security vulnerability.  And when an average organization consumes nearly 230,000 components a year, being able to precisely and quickly assess which components are best has been challenging.

Managing complexity.  In our trail mix story, you were left with a daunting task.  You needed to identify a set of raisins that posed a significant health hazard.  As you looked at your store shelves and all of the trail mix packages, all of the raisins looked the same.  There was no precise identifier for that ingredient beyond its color and size.  Its point of origin, its distributor, its age, and other characteristics were extremely difficult to identify.

In many ways, the software components you rely on to manufacture your applications present similar challenges.  The lack of standardization, structure, labeling, and traceability are just a few of the issues that make management of components across software supply chains difficult.  For example, in the JavaScript component ecosystem there are 43 million files (and growing), of which roughly 6 million represent unique components.

When huge volumes of consumption are paired with massive variation in supply, precisely identifying gold-standard parts or hunting down a component with a known security vulnerability would be on par with the raisin challenge.  To understand the complexity of managing components across software supply chains, let’s take a look at one popular component: jQuery.

According to Sonatype researchers, the jQuery component has been embedded (and often modified and renamed) in 72,000 different packages. In this figurative jungle, how do you know which jQuery is the definitive one?  Which files not named jQuery actually are jQuery? And which version of jQuery are you analyzing?  What supplier (e.g., open source project) did it originate from?  In effect, how would you tell the good raisins apart from the bad ones?

Without these answers, software supply chain automation grinds to a halt.  Or worse, components continue to be procured in a free-for-all manner, regardless of quality, security, or performance.
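To make the renamed-file problem concrete, here is a minimal sketch of identifying a component by its content rather than by its file name. It is an illustration only and not Sonatype's method: the catalog, the component name `example-lib`, and the helper functions are all hypothetical, and the fingerprint is a plain SHA-256 hash of whitespace-trimmed source.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Hash file content after trimming each line, so the file name
    and indentation play no role in the result."""
    lines = (line.strip() for line in source.splitlines())
    normalized = "\n".join(line for line in lines if line)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical catalog of known components: fingerprint -> (name, version).
REFERENCE_SOURCE = "function hello() {\n  return 'hi';\n}\n"
KNOWN_COMPONENTS = {fingerprint(REFERENCE_SOURCE): ("example-lib", "1.0.0")}


def identify(source: str):
    """Return (name, version) for known content, or None if unrecognized."""
    return KNOWN_COMPONENTS.get(fingerprint(source))


# File names are deliberately misleading in this made-up project tree.
project_files = {
    "vendor/utils.js": "function hello() {\n\treturn 'hi';\n}",
    "lib/example-lib.js": "function hello() { return 'bye'; }",
}

for path, source in project_files.items():
    print(path, "->", identify(source))
```

In this sketch the file named `vendor/utils.js` is recognized as the catalog entry, while the file that carries the component's name is not, because its content differs. An exact hash like this cannot recognize a copy whose code has been modified, which is part of what makes the real problem hard.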

Mapping the JavaScript Genome.  In direct response to market demand for DevOps-native tooling, Sonatype has delivered the world’s first and only coordinate system that is capable of precisely identifying all JavaScript contained in the npm, Central, and NuGet repositories. This enormous engineering effort was accomplished by mapping tens of millions of unstructured files and components into a single, definitive database that identifies names, versions, vulnerabilities, licenses, and code modifications associated with JavaScript components.  In essence, we mapped the JavaScript genome.

Advanced Binary Fingerprinting (ABF)

Sonatype calls this approach Advanced Binary Fingerprinting. It enables organizations to:

  • Empower innovation by equipping teams with the ability to precisely identify the highest quality open source components.
  • Scale fast with component intelligence that is precise enough to enable automation at every phase of the software lifecycle.
  • Control component usage with flexible policies that can promote granular decision support across varying teams, languages, and application profiles.

In this new world, trail mix and raisins are not a problem.  Every raisin -- that is, every component -- can be immediately and precisely identified, traced, and matched with its supplier.  Software supply chains can be automated in new ways.  Quality, performance, and security attributes have already been assessed and can be made available instantly.

Suppliers and their parts can be instantly vetted.  And now, components can be indefinitely tracked.  
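Once every component is precisely identified, the recall question from the trail mix story reduces to a lookup. The sketch below assumes each application keeps a list of the exact components and versions it ships with. The application names, component names, and version numbers are invented for illustration, and the data layout is an assumption rather than a description of any product.

```python
from collections import defaultdict

# Hypothetical record of what went into each application:
# application -> list of (component, version).
BILLS_OF_MATERIALS = {
    "storefront": [("parser-lib", "1.2.0"), ("ui-kit", "4.0.1")],
    "admin-console": [("parser-lib", "1.3.0")],
    "mobile-api": [("parser-lib", "1.2.0")],
}


def build_usage_index(boms):
    """Invert the records: (component, version) -> set of applications."""
    index = defaultdict(set)
    for app, components in boms.items():
        for component in components:
            index[component].add(app)
    return index


def affected_applications(index, name, recalled_versions):
    """List every application that shipped any recalled version of a component."""
    apps = set()
    for version in recalled_versions:
        apps |= index.get((name, version), set())
    return sorted(apps)


index = build_usage_index(BILLS_OF_MATERIALS)
print(affected_applications(index, "parser-lib", ["1.2.0"]))
```

The index is built once and answers each recall with a direct lookup, which is the record keeping the store owner in the story never had.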

In the realm of JavaScript, no other firm has accomplished this feat.  In fact, those that attempted it previously simply gave up, claiming that it was too difficult to accomplish.  They still claim support for JavaScript, but their results are plagued with huge volumes of false positives, false negatives, or time-consuming manual analysis.

The bottom line...precision matters.  When it comes to using open source components to manufacture modern software, the bottom line is this – precise intelligence is critical for DevOps-native approaches.  Tools that lack precision cannot scale to the needs of modern software development. Inaccurate and/or incomplete data will leave organizations to deal with vulnerabilities, licensing, and other quality issues that lead directly to higher costs and reduced innovation.

With precise identification on your side, you have the power to error-proof the software supply chain. This means eliminating, with certainty, the risks and inefficiencies that diminish innovation. This also means unlocking the full potential of talented developers so you can innovate faster and compete more effectively on a global playing field.

To learn more about Advanced Binary Fingerprinting for npm, JavaScript, .NET, Python, and RubyGems delivered in our Nexus products, please visit Sonatype’s Precise Intelligence for DevOps Automation.

Tags: Software Supply Chain, Advanced Binary Matching, Javascript, devsecops, Advanced Binary Fingerprinting (ABF)

Written by Derek Weeks

Derek serves as vice president and DevOps advocate at Sonatype and is the co-founder of All Day DevOps -- an online community of 65,000 IT professionals.