Offering my REDOS tools

Summary

I’m a PhD student at Virginia Tech. I’ve just finished polishing up a release of the tools I used in a recent project studying the incidence of REDOS in practice. I’d like to suggest that these tools should be used during CI in any server-side module or application.

Motivation

To give a sense of why we should think about this as a best practice, let me summarize what I found in my study:

  • Two vulnerabilities in Node v4 (vuln 1, vuln 2).
  • Three vulnerabilities in Python core (vulns 1 and 2, vuln 3)
  • A bunch more in npm and pypi (e.g. affecting Marked, Hapi, Django, and MongoDB).

I’ve got an academic paper under submission with all the gory details.

Tools

Here are the tools I’ve developed:

  • General repo (includes detectors and a bin/* for testing a regex/file/tree/URL).
  • The vuln-regex-detector npm module to test a regex. This is a “dumb client” and queries a server, see below.
  • The eslint-plugin-vuln-regex-detector “eslint plugin”. Uses the vuln-regex-detector module. This “plugin” is really intended for use as a separate CI stage. I just found eslint’s infrastructure useful.

Infrastructure

I’m hosting a server at toybox.cs.vt.edu:8000 that answers queries from the npm modules. The server is documented here.

What I’d like to see

I envision adding this to package.json of any module that might be used server-side (potential REDOS):

"test:regex": "eslint --plugin vuln-regex-detector --rule '\"vuln-regex-detector/no-vuln-regex\": 2' FILES_YOU_CARE_ABOUT"

and including it during the CI process. Then REDOS issues would be caught by Travis instead of by @ChALkeR.

Issues

While I would love to see these tools adopted by everyone, there are a few potential hiccups if a lot of people start using it:

  • The server does not support batched queries, so users have to run separate queries for every regex. This may be a performance issue for clients, especially those with many regexes (like useragent modules).
  • The server validates results with a single thread. Some regexes take a while to validate (performance issue); malicious clients can deliberately craft hard-to-validate regexes (DoS concern).
  • The server is currently a single desktop here at Virginia Tech. It might not scale well.

I welcome PRs and suggestions/help with DevOps.

In addition, there are some “whodunnit” questions that might arise if a project starts using these tools during CI. See this discussion, especially this point by @styfle. One concern is that incorporating this into CI might accidentally disclose previously-undiscovered REDOS issues that exist in production.

Documentation

I’ve tried to be thorough in the documentation, but let me know if anything is confusing or seems wrong.

How is this different from safe-regex?

Existing safe regex detectors (e.g. eslint-plugin-security’s detect-unsafe-regex rule) all rely on the safe-regex module.

The safe-regex module uses “star height” (nested quantifiers, e.g. /(a+)+/) as a heuristic to identify vulnerable regexes.

Unfortunately, safe-regex has three issues:

  1. It’s unmaintained and implemented incorrectly.
  2. It is prone to false positives. Not all regexes with star height > 1 are vulnerable. In my research I found that, in practice, most aren’t.
  3. More critically, it is also prone to false negatives. In my research I found that most regexes that suffer from catastrophic backtracking do not have star height > 1.
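To make the star-height heuristic and its blind spots concrete, here is a minimal sketch of a star-height estimate. It is an illustration only, not the safe-regex implementation: the `starHeight` function is hypothetical, assumes a well-formed pattern, counts only unbounded quantifiers (`*`, `+`, `{n,}`), and ignores many regex features.

```javascript
'use strict';

// Hypothetical helper (not part of safe-regex or vuln-regex-detector).
// Estimates star height: the deepest nesting of unbounded quantifiers.
function starHeight(pattern) {
  const groupMax = [0]; // groupMax[d]: largest height seen so far at group depth d
  let atomHeight = 0;   // height of the atom that a following quantifier would apply to
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') {
      i++; // skip the escaped character
      atomHeight = 0;
    } else if (ch === '[') {
      // Skip a character class; quantifier characters inside it are literals.
      for (i++; i < pattern.length && pattern[i] !== ']'; i++) {
        if (pattern[i] === '\\') i++;
      }
      atomHeight = 0;
    } else if (ch === '(') {
      groupMax.push(0);
    } else if (ch === ')') {
      // The closed group becomes the current atom; propagate its height outward.
      atomHeight = groupMax.pop();
      const top = groupMax.length - 1;
      groupMax[top] = Math.max(groupMax[top], atomHeight);
    } else if (ch === '*' || ch === '+' || /^\{\d+,\}/.test(pattern.slice(i))) {
      // An unbounded quantifier adds one level on top of the atom it follows.
      atomHeight += 1;
      const top = groupMax.length - 1;
      groupMax[top] = Math.max(groupMax[top], atomHeight);
    } else {
      atomHeight = 0; // any other character starts a new plain atom
    }
  }
  return groupMax[0];
}

console.log(starHeight('(a+)+'));    // 2: flagged, and genuinely vulnerable
console.log(starHeight('(ab+)+'));   // 2: flagged, but each repetition must begin with "a"
console.log(starHeight('(a|aa)+$')); // 1: not flagged, yet ambiguous
```

The second pattern is the false-positive shape: nested quantifiers, but only one way to split any input. The third is the false-negative shape: star height 1, but a backtracking engine can split a run of `a`s into `a` and `aa` pieces in many ways before it gives up on a non-matching suffix.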

My project only uses detectors that suggest an attack string, and it confirms that the attack string triggers catastrophic backtracking in the language you request (JS-Node, Perl, PHP, Ruby, Python). Look ma, no false positives! But note that if the regex in question is not reachable by client input, then it’s a possible time bomb but not a current REDOS problem.
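For a rough idea of what "confirming that the attack string triggers catastrophic backtracking" can look like in Node, here is a minimal sketch. It is not the server's actual validation code: the `exceedsTimeBudget` function, the `REGEX_JOB` environment variable, and the time budgets are all assumptions made for illustration. The match runs in a child process so that a runaway regex can be killed when the budget expires.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Script run by the child: rebuild the regex from the job and run it once.
// REGEX_JOB is a hypothetical variable name chosen for this sketch.
const CHILD_SCRIPT = `
  const job = JSON.parse(process.env.REGEX_JOB);
  new RegExp(job.pattern, job.flags).test(job.input);
`;

// Returns true if matching `input` did not finish within `budgetMs`.
function exceedsTimeBudget(pattern, flags, input, budgetMs) {
  const result = spawnSync(process.execPath, ['-e', CHILD_SCRIPT], {
    env: { ...process.env, REGEX_JOB: JSON.stringify({ pattern, flags, input }) },
    timeout: budgetMs, // spawnSync kills the child once this elapses
    stdio: 'ignore',
  });
  // On timeout, spawnSync reports an error whose code is ETIMEDOUT.
  return Boolean(result.error && result.error.code === 'ETIMEDOUT');
}

// Candidate attack string: many "a"s followed by a character that forces failure.
const attack = 'a'.repeat(40) + '!';
console.log(exceedsTimeBudget('(a+)+$', '', attack, 500));
console.log(exceedsTimeBudget('^a+$', '', attack, 5000));
```

On a backtracking engine such as the one in Node, the first call should report that the budget was exceeded and the second should not. A real validator would also need to separate process start-up cost from match time, which this sketch ignores by using generous budgets.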

My project may have false negatives:

  • If you extract regexes statically (like the eslint plugin does) then any dynamically-declared regexes (e.g. new RegExp(p)) won’t be tested.
  • If the detectors can’t handle the regex, then the server currently marks the regex as safe.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 15
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

mikesamuel commented, Dec 4, 2018 (1 reaction)

@davisjam

Given a source tree, /.../ style RegExps are easy to find but I wonder if we could get dynamically created RegExps in test-covered code to your analyzer.

If a project uses mocha to run tests, might a wrapper around mocha do something like the below before running tests so that a posttest script could fire up your analyzer:

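// Capture the builtins up front so later reassignment cannot affect the wrapper.
const { console, RegExp: builtinRegExp, Reflect } = global;

function onNewRegExp(re) {
  // For real use, dump the pattern, flags, and call stack to a file for later analysis instead.
  console.log(`Constructed regexp ${ re.source }`);
}

const LoggingRegExp = new Proxy(
    builtinRegExp,
    {
        // Traps calls without `new`, e.g. RegExp('^a*$').
        apply(...args) {
            const re = Reflect.apply(...args);
            onNewRegExp(re);
            return re;
        },
        // Traps `new RegExp(...)` and `super(...)` calls from subclasses.
        construct(...args) {
            const re = Reflect.construct(...args);
            onNewRegExp(re);
            return re;
        },
    });

// Regex literals such as /^a*$/ do not go through the global and are not logged.
global.RegExp = LoggingRegExp;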

Using a proxy makes it transparent to instanceof so that things like the following behave the same regardless of whether it’s installed.

const a = new RegExp('^a*$');
const b = /^a*$/;
const c = RegExp('^a*$');

class MyRegExp extends RegExp {
    constructor(...args) {
        super(...args);
    }
}

const d = new MyRegExp('^a*$');

console.log(`a=${ a } : ${ a instanceof RegExp }`);
console.log(`b=${ b } : ${ b instanceof RegExp }`);
console.log(`c=${ c } : ${ c instanceof RegExp }`);
console.log(`d=${ d } : ${ d instanceof RegExp }, ${ d instanceof MyRegExp }`);
console.log(a.constructor === b.constructor);
console.log(c.constructor === b.constructor);
console.log(d.constructor === MyRegExp);

This won’t cover regular expressions created from attacker-controlled strings.

It also won’t cover builtins that use the builtin RegExp under the hood, like "string".match('[str]*'), though they could be covered with more instrumentation.
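A rough sketch of what that extra instrumentation might look like, assuming it is enough to wrap the String methods that compile a string argument into a regex (`match`, `matchAll`, `search`). The `instrumentStringMethods` helper and the `stringPatterns` array are hypothetical names; a real setup would write to the same log file as the RegExp proxy.

```javascript
'use strict';

// Patterns passed as plain strings to regex-compiling String methods.
const stringPatterns = [];

// Hypothetical helper: wraps the named String.prototype methods and
// returns a function that restores the originals.
function instrumentStringMethods(names = ['match', 'matchAll', 'search']) {
  const originals = new Map();
  for (const name of names) {
    const original = String.prototype[name];
    originals.set(name, original);
    Object.defineProperty(String.prototype, name, {
      configurable: true,
      writable: true,
      enumerable: false,
      value(pattern, ...rest) {
        // A string argument is compiled into a RegExp internally, so record it.
        if (typeof pattern === 'string') {
          stringPatterns.push({ method: name, pattern });
        }
        return Reflect.apply(original, this, [pattern, ...rest]);
      },
    });
  }
  return () => {
    for (const [name, original] of originals) {
      Object.defineProperty(String.prototype, name, {
        configurable: true,
        writable: true,
        enumerable: false,
        value: original,
      });
    }
  };
}

const restore = instrumentStringMethods();
'string'.match('[str]*');   // string pattern: recorded
'string'.match(/[str]*/);   // RegExp object: left to the RegExp instrumentation
restore();
console.log(stringPatterns);
```

Methods such as `split` and `replace` treat a string argument literally rather than compiling it, so they are left alone here.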

@MarcinHoppe fyi.
