Better unsafe regex detector
Hi all,
I’m a systems/security researcher at Virginia Tech and have been studying the incidence of vulnerable regexes in the wild.
This plugin’s unsafe regex detector relies on safe-regex, which uses star height (nested quantifiers) to identify unsafe regexes.
Pros:
- safe-regex is fast.
- safe-regex is an npm module which makes it easy to work with.
- safe-regex has no non-JS dependencies.
As a result, safe-regex is great for CI use cases.
Cons:
- safe-regex is incorrectly implemented and substack is not maintaining it.
- safe-regex has lots of false positives (e.g. `(ab+)+`).
- safe-regex will only identify one type of exponential-time vulnerability, and ignores all polynomial-time vulnerabilities. In my research I found that, in the wild, polynomial-time vulnerabilities are far more common than exp-time vulnerabilities.
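To make the false-positive point concrete, here is a minimal sketch of a star-height check. It is a simplification written for illustration, not safe-regex's actual code: `starHeight` is a hypothetical helper, and it assumes that only `*`, `+`, and `{n,}` count as unbounded quantifiers.

```javascript
// Simplified star-height estimate: the deepest nesting of unbounded quantifiers.
// Assumption: only *, + and {n,} count as unbounded. Not safe-regex's real code.
function starHeight(pattern) {
  new RegExp(pattern);   // throws on invalid syntax, so the scan below sees balanced groups
  const stack = [0];     // max height seen so far inside each open group
  let i = 0;

  // Fold a completed atom into its enclosing group,
  // adding 1 to its height if an unbounded quantifier follows it.
  const finishAtom = (height) => {
    const q = /^(?:[*+]|\{\d+,\})/.exec(pattern.slice(i));
    if (q) {
      height += 1;
      i += q[0].length;
    }
    const top = stack.length - 1;
    stack[top] = Math.max(stack[top], height);
  };

  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === '\\') {            // escape sequence: a plain atom
      i += 2;
      finishAtom(0);
    } else if (ch === '[') {      // character class: skip to the closing bracket
      i++;
      while (i < pattern.length && pattern[i] !== ']') {
        i += pattern[i] === '\\' ? 2 : 1;
      }
      i++;
      finishAtom(0);
    } else if (ch === '(') {      // open a group
      stack.push(0);
      i++;
    } else if (ch === ')') {      // close a group: it becomes an atom
      const inner = stack.pop();
      i++;
      finishAtom(inner);
    } else {                      // any other character is a plain atom
      i++;
      finishAtom(0);
    }
  }
  return stack[0];
}

console.log(starHeight('(a+)+$'));  // 2
console.log(starHeight('(ab+)+'));  // 2
console.log(starHeight('a*b*'));    // 1
```

Both `(a+)+$` and `(ab+)+` have star height 2, so a rule of the form "flag anything above 1" reports both, even though in `(ab+)+` every iteration must start with `a` and there is only one way to split the input.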
There are some alternatives to safe-regex that report exploit strings so you can tell if they’re correct or not.
- Rathnayake’s rxxr2. Like safe-regex, this only checks for star height-style vulnerabilities. But it doesn’t have false positives as far as I can tell.
- Wustholz’s REXPLOITER. This tests star height and other exp-time vulnerabilities, plus poly-time vulnerabilities.
- Weideman’s RegexStaticAnalysis. Like Wustholz’s REXPLOITER, but open-source and it works better.
Unfortunately:
- These alternatives all have non-JS dependencies (e.g. OCaml or Java) and have inconsistent interfaces.
- Some (especially Weideman) can take minutes to test a single regex.
My project vuln-regex-detector provides a convenient wrapper for these alternatives, and enforces time and memory limits to get results or fail relatively quickly.
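The time-limit idea can be sketched with Node's `child_process.execFile`, whose `timeout` option kills the child process once the limit expires. This is only an illustration, not how vuln-regex-detector is implemented: `runWithTimeLimit` is a hypothetical helper, the command stands in for whichever detector binary is installed, and memory limits are not shown.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: run an external detector command and give up after timeoutMs.
// Resolves to { status: 'done', stdout }, { status: 'timeout' }, or { status: 'error', message }.
function runWithTimeLimit(command, args, timeoutMs) {
  return new Promise((resolve) => {
    const options = { timeout: timeoutMs, killSignal: 'SIGKILL' };
    execFile(command, args, options, (err, stdout) => {
      if (!err) {
        resolve({ status: 'done', stdout });
      } else if (err.killed) {
        // Node killed the child because the timeout fired.
        resolve({ status: 'timeout' });
      } else {
        resolve({ status: 'error', message: err.message });
      }
    });
  });
}

// Usage: a stand-in "detector" that would run for 10 seconds is cut off after 200 ms.
runWithTimeLimit(process.execPath, ['-e', 'setTimeout(() => {}, 10000)'], 200)
  .then((result) => console.log(result.status));
```

A caller can treat `timeout` the same way as "unknown" and move on, which is what keeps a slow analysis from stalling a lint run.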
However, I’d be surprised if developers were willing to wait even 30 seconds for linting. To address that, I’m nearly done implementing a server side so queries can be answered by hitting the server for a pre-computed answer instead of doing the expensive computation locally. The server processes not-seen-before queries in the background so subsequent queries will get a real answer.
Once that’s done, would you folks be interested in hitting my server first and falling back to safe-regex if my server hasn’t seen the query before? I’ve got a sample client that can be used with a one-line tweak for this use case.
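The proposed flow (server first, local heuristic on a miss) could look roughly like the sketch below. Everything here is hypothetical and is not the sample client mentioned above: `checkRegex`, `queryServer`, and `localCheck` are illustrative names, and the server is assumed to answer `'safe'`, `'vulnerable'`, or `'unknown'`.

```javascript
// Hypothetical client logic: ask a remote service first, fall back to a local check.
// queryServer(pattern) is assumed to resolve to 'safe', 'vulnerable', or 'unknown'.
// localCheck(pattern) is assumed to return true when the local heuristic says "safe".
async function checkRegex(pattern, { queryServer, localCheck, timeoutMs = 2000 }) {
  let remote = 'unknown';
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('unknown'), timeoutMs);
  });
  try {
    // Whichever settles first wins: the server's answer or the timeout.
    remote = await Promise.race([queryServer(pattern), timeout]);
  } catch (err) {
    remote = 'unknown';  // network failure: treat it like a miss
  } finally {
    clearTimeout(timer);
  }

  if (remote === 'safe' || remote === 'vulnerable') {
    return { verdict: remote, source: 'server' };
  }
  // The server has not seen this pattern (or was unreachable): use the local heuristic.
  return { verdict: localCheck(pattern) ? 'safe' : 'vulnerable', source: 'local' };
}
```

Returning the `source` alongside the verdict lets a lint rule word its message differently for a precomputed answer and for a heuristic guess.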
Issue Analytics
- Created 5 years ago
- Reactions:4
- Comments:12 (6 by maintainers)
Top GitHub Comments
The analysis could be written in JavaScript, and I would happily incorporate it into safe-regex. I can point anyone interested in this in the right direction.
On Wed, Sep 11, 2019, 9:51 PM Matthew Herbst notifications@github.com wrote:
The server-side code is available, but would have to be run on the user’s side.
Unfortunately the existing advanced analyses are written in Java and OCaml, not JS.
I have half a million labeled regexes, so I suppose another option is to ship this database in safe-regex in compressed form as a “cache” of sorts.
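One way the "compressed cache" idea might look, as a rough sketch only: the labels are packed into a gzip-compressed JSON blob at build time and decompressed once at load time. `packLabels` and `loadLabels` are hypothetical names, the two entries are illustrative rather than taken from the real database, and a table of half a million entries would probably want a more compact format than JSON.

```javascript
const zlib = require('zlib');

// Build step (hypothetical): pack { pattern: 'safe' | 'vulnerable' } into a gzip blob.
function packLabels(labels) {
  return zlib.gzipSync(JSON.stringify(labels));
}

// Load step: decompress once, then answer lookups from memory.
// The returned function yields undefined on a miss, so the caller can fall back
// to the regular heuristic.
function loadLabels(blob) {
  const json = zlib.gunzipSync(blob).toString('utf8');
  const table = new Map(Object.entries(JSON.parse(json)));
  return (pattern) => table.get(pattern);
}

// Usage with two illustrative entries.
const blob = packLabels({ '(a+)+$': 'vulnerable', 'abc': 'safe' });
const lookup = loadLabels(blob);
console.log(lookup('(a+)+$'));   // vulnerable
console.log(lookup('[0-9]+'));   // undefined (not in the cache)
```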