Better unsafe regex detector

Hi all,

I’m a systems/security researcher at Virginia Tech and have been studying the incidence of vulnerable regexes in the wild.

This plugin’s unsafe regex detector relies on safe-regex, which uses star height (nested quantifiers) to identify unsafe regexes.

Pros:

safe-regex is fast.
safe-regex is an npm module which makes it easy to work with.
safe-regex has no non-JS dependencies.

As a result, safe-regex is great for CI use cases.

Cons:

safe-regex is incorrectly implemented and substack is not maintaining it.
safe-regex has lots of false positives (e.g. (ab+)+).
safe-regex will only identify one type of exponential-time vulnerability, and ignores all polynomial-time vulnerabilities. In my research I found that, in the wild, polynomial-time vulnerabilities are far more common than exp-time vulnerabilities.
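To make the star-height heuristic concrete, here is a minimal sketch of a nesting-depth check. It is a simplified illustration written for this thread, not safe-regex's actual implementation: the function names `starHeight` and `looksUnsafe` are hypothetical, only `*` and `+` are counted, and the pattern is assumed to be syntactically valid.

```javascript
// Simplified sketch, NOT safe-regex's actual implementation.
// Counts how deeply the unbounded quantifiers * and + are nested.
// Assumes a valid pattern; {n,} quantifiers are ignored for brevity.
function starHeight(pattern) {
  const groupMax = [0]; // max height seen so far in each open group
  let lastAtom = 0;     // height of the atom a quantifier would apply to

  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === '\\') {
      i++;              // skip the escaped character
      lastAtom = 0;
    } else if (c === '[') {
      // Skip a character class; quantifier characters inside it are literals.
      i++;
      while (i < pattern.length && pattern[i] !== ']') {
        if (pattern[i] === '\\') i++;
        i++;
      }
      lastAtom = 0;
    } else if (c === '(') {
      groupMax.push(0);
    } else if (c === ')') {
      // A closed group is an atom whose height is the max height inside it.
      lastAtom = groupMax.pop();
      const top = groupMax.length - 1;
      groupMax[top] = Math.max(groupMax[top], lastAtom);
    } else if (c === '*' || c === '+') {
      // Quantifying an atom raises its height by one.
      lastAtom += 1;
      const top = groupMax.length - 1;
      groupMax[top] = Math.max(groupMax[top], lastAtom);
    } else {
      lastAtom = 0;     // ordinary character, anchor, `?`, `|`, ...
    }
  }
  return groupMax[0];
}

// Hypothetical rule in the spirit of the heuristic: flag nested quantifiers.
const looksUnsafe = (pattern) => starHeight(pattern) > 1;

console.log(starHeight('(a+)+$'));    // 2: flagged, the classic exponential case
console.log(starHeight('(ab+)+'));    // 2: flagged, the false positive cited above
console.log(starHeight('^\\d+\\d+$')); // 1: not flagged
```

The last pattern shows the blind spot: two adjacent `\d+` runs have star height 1, yet in a backtracking engine they can likely be driven to super-linear (polynomial) matching time by a long run of digits followed by a non-digit.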

There are some alternatives to safe-regex that report exploit strings so you can tell if they’re correct or not.

Rathnayake’s rxxr2. Like safe-regex, this only checks for star height-style vulnerabilities. But it doesn’t have false positives as far as I can tell.
Wustholz’s REXPLOITER. This tests star height and other exp-time vulnerabilities, plus poly-time vulnerabilities.
Weideman’s RegexStaticAnalysis. Like Wustholz’s REXPLOITER, but open-source and it works better.

Unfortunately:

These alternatives all have non-JS dependencies (e.g. OCaml or Java) and have inconsistent interfaces.
Some (especially Weideman) can take minutes to test a single regex.

My project vuln-regex-detector provides a convenient wrapper for these alternatives, and enforces time and memory limits to get results or fail relatively quickly.

However, I’d be surprised if developers were willing to wait even 30 seconds for linting. To address that, I’m nearly done implementing a server side so queries can be answered by hitting the server for a pre-computed answer instead of doing the expensive computation locally. The server processes not-seen-before queries in the background so subsequent queries will get a real answer.

Once that’s done, would you folks be interested in hitting my server first and falling back to safe-regex if my server hasn’t seen the query before? I’ve got a sample client that can be used with a one-line tweak for this use case.
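The server-first, fallback-second flow could look roughly like the sketch below. Everything named here is an assumption for illustration: `checkRegex`, `withTimeout`, the injected `lookupRemote` and `localCheck` functions, and the 'SAFE' / 'VULNERABLE' answer strings are hypothetical and are not the sample client's real interface. In practice `localCheck` would be safe-regex and `lookupRemote` would be the HTTP query.

```javascript
// Hypothetical sketch of "ask the server first, fall back locally".
// lookupRemote(pattern) -> Promise resolving to 'SAFE', 'VULNERABLE', or anything else
// localCheck(pattern)   -> boolean, true if the local heuristic considers it safe

// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function checkRegex(pattern, lookupRemote, localCheck, timeoutMs = 2000) {
  try {
    const answer = await withTimeout(lookupRemote(pattern), timeoutMs);
    if (answer === 'SAFE') return { safe: true, source: 'server' };
    if (answer === 'VULNERABLE') return { safe: false, source: 'server' };
    // Any other answer means the server has not seen this regex yet.
  } catch (err) {
    // Network error or timeout: fall through to the local check.
  }
  return { safe: localCheck(pattern), source: 'fallback' };
}
```

Injecting both checks keeps the lint rule fast and testable: a slow or unreachable server costs at most `timeoutMs` before the local heuristic answers.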

Issue Analytics

Created 5 years ago
Reactions: 4
Comments: 12 (6 by maintainers)

Top GitHub Comments

2 reactions

davisjam commented, Sep 12, 2019

The analysis could be written in JavaScript, and I would happily incorporate it into safe-regex. I can point anyone interested in this in the right direction.

On Wed, Sep 11, 2019, 9:51 PM Matthew Herbst wrote:

@davisjam is there a technical reason preventing the tools from being written in JS, or has that work just not been done by anyone?

2 reactions

davisjam commented, Sep 11, 2019

The server-side code is available, but would have to be run on the user’s side.

Unfortunately the existing advanced analyses are written in Java and OCaml, not JS.

I have half a million labeled regexes, so I suppose another option is to ship this database in safe-regex in compressed form as a “cache” of sorts.