New year, new CVE: A deep dive into 'node-forge' (CVE-2022-0122)

January 25, 2022 By Juan Aguirre

With over 16 million weekly downloads, the widely used "node-forge" component on npm implements key security functions, including the Transport Layer Security (TLS) protocol, cryptographic functions, and development tools for web apps, all in native JavaScript.

The package is used in projects from industry giants such as Cisco and Microsoft, as well as in Alexa projects (yes, the beloved smart speaker that's in many people's homes), according to grep.app:

grep.app results showing the use of node-forge in Alexa projects

CVE-2022-0122 refers to an Open Redirect vulnerability in the parseUrl function in the package's util.js file. This function takes a supplied string, validates it, and then builds and returns an object that is easier to manipulate and work with. To do this, it first uses a regular expression (regex) to analyze the string. Regex is supported by hundreds of applications for text and string management, and in this case the regex is the root cause of the vulnerability.

Irregular regex

At the beginning of the function, we can even see a comment that says "FIXME: this regex looks a bit broken." This suggests that the developers understood this was an issue but were unaware of the potential for abuse.

Initial portion of the vulnerable function with a “FIXME” comment.

To analyze the code, we'll use one of my favorite regex analysis tools: regex101. It breaks the expression down with an easy-to-follow explanation, and it lets you play around with and even debug the expression to see potential matches.

The first part of this regex tells us that the string must start with either `http` or `https`. We know this because it has the ^ symbol, which asserts the start of a line, followed by a capturing group for `https?`. The question mark, ?, tells us that the character before it, meaning the `s`, can occur 0 or 1 times. This leaves us with either `http` or `https` as valid matches for the first part.

Next, it matches the literal characters `://`, which separate the scheme from the rest of the URL. Then, it matches any single character that is not in the set `:`, `&`, `^`, `/`, between zero and unlimited times, as denoted by the `*`. This leaves room for almost anything, which makes sense since this portion is meant to match the host. Finally, it matches a colon, `:`, to make way for the port number, followed by an unlimited number of characters without restrictions. That last part matches the path portion and the end of the URL.
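The walkthrough above can be tried out in a few lines of JavaScript. This is a minimal sketch under stated assumptions: the pattern is reconstructed from the description (with the colon treated as optional and a digits-only group for the port) rather than copied from the package, and `splitUrl` is a hypothetical helper name, not a node-forge function.

```javascript
// ASSUMPTION: pattern reconstructed from the walkthrough, not copied from node-forge.
// Groups: 1 = scheme, 2 = host, 3 = port, 4 = everything else (path).
const urlPattern = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/;

// Hypothetical helper: split a string into the pieces captured by the pattern.
function splitUrl(input) {
  const m = urlPattern.exec(input);
  if (m === null) {
    return null; // the string does not start with http:// or https://
  }
  return { scheme: m[1], host: m[2], port: m[3], path: m[4] };
}

const ideal = splitUrl('https://example.com:8443/docs/index.html');
console.log(ideal);
// { scheme: 'https', host: 'example.com', port: '8443', path: '/docs/index.html' }
```

With a well-formed URL like this one, every group lands where you would expect it to.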

Regex with ideal input

The main issue here is that this regex is written for one specific shape of well-formed URL, and real input doesn't always look like that, especially when it is controlled by a user. This is something that standard unit tests can miss, and we don't even have to go to the worst case of a malicious user. It can also be an issue with non-ideal, misbehaving, or merely curious users.

Also, think of a complex web app that wants to use this library to parse URLs. Not all URLs are as pretty as the ones we are used to seeing in our browser's address bar. Some URLs, which are completely RFC-compliant, can be really funky and odd.

Untrusted redirect

After playing with this regex a bit, I noticed it really accepts anything that starts with http[s]://. Because of this tolerance, the groups aren't always properly split. This leads to the URL response object being incorrectly put together, likely with an empty or insecure host and everything else thrown in the path portion. Within this insecure host and path, everything is fair game, including all variations of slashes (pictured below).

Regex with undesired input results in an insecure path. 

This is what allows an Open Redirect.
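To make the mis-split concrete, here is a small sketch that feeds a couple of slash variations to the same kind of pattern. As before, the regex is an assumed reconstruction from the walkthrough, `splitUrl` is a hypothetical helper, and the sample inputs are made up for illustration.

```javascript
// ASSUMPTION: pattern reconstructed from the walkthrough, not copied from node-forge.
const urlPattern = /^(https?):\/\/([^:&^\/]*):?(\d*)(.*)$/;

// Hypothetical helper: split a string into the pieces captured by the pattern.
function splitUrl(input) {
  const m = urlPattern.exec(input);
  return m === null ? null : { scheme: m[1], host: m[2], port: m[3], path: m[4] };
}

// Illustrative inputs only: an extra slash, and a backslash after the scheme.
const tripleSlash = splitUrl('https:///evil.example/login');
const backslash = splitUrl('https://\\evil.example/login');

console.log(tripleSlash);
// { scheme: 'https', host: '', port: '', path: '/evil.example/login' }
console.log(backslash.host.startsWith('\\'));
// true: the backslash is accepted as part of the host
```

Neither input is rejected. The first comes back with an empty host and the real destination tucked into the path, and the second carries a backslash inside the host, so any code that trusts these fields is working with a misleading picture of where the URL points.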

So how does an attacker leverage this? Well, it depends heavily on how the vulnerable application uses the library, but ultimately it gives an attacker the ability to bypass the URL parser.

Within the node-forge library itself, the parseUrl function is used by two other functions, `createClient` and `withinCookieDomain`.

With createClient, an attacker could entice a victim into being connected to a malicious client by setting the host to an attacker-controlled URL. This can be thought of as an Open Redirect, since the attacker is able to redirect a victim to an undesired, arbitrary location. Once the victim is connected to the rogue client, a lot more can happen, including the attacker gaining access to sensitive information.

For withinCookieDomain, depending on how it's used by the application in question, it could be abused to plausibly bypass authorization checks. And the possibilities and potential impacts go on if a developer uses the parseUrl function directly.

The fix

Fortunately, the developers released a fix for this in v1.0.0.

Changelog update for the 1.0.0 fix

The update removes the insecure function and replaces it with the `URL` API defined by the WHATWG URL Standard. This underlines one of the greatest things about open source: you can easily find many stable and reliable libraries to help you build something awesome.

Implemented fix. Replacing parseUrl with WHATWG URL Standard. 
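For comparison, here is a rough sketch of parsing with the standard `URL` class, which is available as a global in modern browsers and Node.js. It is not the actual node-forge patch: `parseWithUrlApi` is a hypothetical wrapper, and the field names simply mirror the object described earlier.

```javascript
// Hypothetical wrapper around the WHATWG URL API; not the node-forge patch itself.
function parseWithUrlApi(input) {
  try {
    const u = new URL(input); // throws a TypeError on input it cannot parse
    return {
      scheme: u.protocol.replace(/:$/, ''), // 'https:' -> 'https'
      host: u.hostname,
      port: u.port, // empty string when the port is absent or the scheme default
      path: u.pathname,
    };
  } catch (err) {
    return null;
  }
}

console.log(parseWithUrlApi('https://example.com:8443/docs/index.html'));
// { scheme: 'https', host: 'example.com', port: '8443', path: '/docs/index.html' }

console.log(parseWithUrlApi('https:///evil.example/login').host);
// 'evil.example'

console.log(parseWithUrlApi('not a url'));
// null
```

The standards-based parser normalizes the extra slash and reports `evil.example` as the host, the same way a browser would interpret it, and it refuses input that isn't a URL at all instead of returning a half-filled object.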

Another good practice is to always sanitize user input. We can never really trust a user because, whether it be due to clumsy fingers or malicious intent, user input can be dangerous, especially when we trust it blindly. Even in cases where we’re going to pass that input into another function that implements some sort of filtering, it's never a bad idea to do your own sanitizing in-house. It's better security to check the input as it's received before processing it further.
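As one example of what that in-house check could look like, the sketch below validates a redirect target before using it. Everything here is an assumption made for illustration: the `ALLOWED_HOSTS` list, the `isSafeRedirect` name, and the policy itself (HTTPS only, no embedded credentials, exact host match) would need to be adapted to the application.

```javascript
// Hypothetical allowlist; replace with the hosts your application really trusts.
const ALLOWED_HOSTS = new Set(['example.com', 'www.example.com']);

// Hypothetical check: accept only https URLs that point at an allowlisted host.
function isSafeRedirect(target) {
  let url;
  try {
    url = new URL(target); // let the standard parser decide what the host is
  } catch {
    return false; // not parseable as an absolute URL
  }
  if (url.protocol !== 'https:') {
    return false; // rejects http:, javascript:, data:, and so on
  }
  if (url.username !== '' || url.password !== '') {
    return false; // rejects tricks like https://example.com@evil.example/
  }
  return ALLOWED_HOSTS.has(url.hostname);
}

console.log(isSafeRedirect('https://example.com/account'));        // true
console.log(isSafeRedirect('https:///evil.example/login'));        // false
console.log(isSafeRedirect('https://example.com@evil.example/'));  // false
console.log(isSafeRedirect('javascript:alert(1)'));                // false
```

The design choice is to compare the parsed hostname against an exact list rather than searching the raw string for a trusted name, since substring checks are what malformed URLs tend to slip past.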

That's not to say you should do your own implementations of known-reliable functionalities. Open source has it all, so don't reinvent the wheel if it's not necessary – just make sure it rolls the way it's supposed to.

Written by Juan Aguirre

Juan is a security researcher at Sonatype and part of the team who has helped Sonatype catalog more than 100 million open source components.