Rule proposal: no-catastrophic-backtracking

Please describe what the rule should do:

Flag RegExps that are vulnerable to catastrophic backtracking like the one that took down CloudFlare’s global network for an hour.

See also https://www.regular-expressions.info/catastrophic.html.

What category of rule is this? (place an “X” next to just one item)

[X] Warns about a potential error (problem)
[ ] Suggests an alternate way of doing something (suggestion)
[ ] Enforces code style (layout)
[ ] Other (please specify:)

Provide 2-3 code examples that this rule will warn about:

/.*.*=.*/
new RegExp('.*.*=.*')
// maybe even some basic things like this?
const keyValue = '.*=.*'
new RegExp(`.*${keyValue}`)

Why should this rule be included in ESLint (instead of a plugin)?

RegExps are a fundamental and commonly-used JavaScript feature and there is a serious risk of devs writing bad ones. People would be far less likely to discover or adopt this rule (especially into popular presets) if this rule were in an external plugin, and I want as many people as possible to be able to catch unsafe RegExps before they reach production.

Also, a rule like this may have been able to catch a potential catastrophic backtracking regex that one user discovered in eslint itself 😎

Are you willing to submit a pull request to implement this rule?

Yes, as long as I can depend on safe-regex to perform the actual safety check (I can write the parsing logic to find regexps, evaluate some expressions in RegExp constructors, and then pass the patterns to safe-regex).
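A rough sketch of the rule shape described above, written against ESLint's standard rule format. The factory name `createNoCatastrophicBacktrackingRule` and the injected `isSafePattern` callback are hypothetical; the callback stands in for whatever checker is chosen (the proposal suggests safe-regex), and this is not code from the issue.

```javascript
// Sketch only: `isSafePattern` is a hypothetical stand-in for the real
// safety check (the proposal would pass patterns to safe-regex here).
function createNoCatastrophicBacktrackingRule(isSafePattern) {
  return {
    meta: {
      type: 'problem',
      schema: [],
      messages: { unsafe: 'Regular expression may backtrack catastrophically.' },
    },
    create(context) {
      function check(node, pattern) {
        if (!isSafePattern(pattern)) {
          context.report({ node, messageId: 'unsafe' });
        }
      }

      // Handles both `new RegExp('...')` and `RegExp('...')` when the
      // first argument is a plain string literal.
      function checkConstructor(node) {
        const first = node.arguments[0];
        if (
          node.callee.type === 'Identifier' &&
          node.callee.name === 'RegExp' &&
          first &&
          first.type === 'Literal' &&
          typeof first.value === 'string'
        ) {
          check(node, first.value);
        }
      }

      return {
        // Regex literals such as /.*.*=.*/ are Literal nodes with a `regex` field.
        Literal(node) {
          if (node.regex) check(node, node.regex.pattern);
        },
        NewExpression: checkConstructor,
        CallExpression: checkConstructor,
      };
    },
  };
}
```

This sketch does not evaluate template literals or variables, so the `${keyValue}` example from the proposal would need extra constant-folding logic on top of it.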

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
davisjam commented, Oct 18, 2019

Are there known techniques (perhaps in the papers you linked) to detect quadratic-time regexes?

Yes. Any regex’s super-linear behavior can be traced to a corresponding NFA with two properties:

  1. It has an ambiguous sub-graph (multiple ways to match the same input).
  2. After reaching the ambiguous sub-graph, a mismatch can occur. (Well, the second property is a bit more complicated than this, but that’s the core idea.)
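One way to picture "ambiguity" is to count how many distinct ways a pattern can consume the same run of characters. The sketch below does this for two textbook shapes, `a*a*` and `(a+)+`, on a string of `n` copies of `a`. The function names are hypothetical and the counting is a simplification for illustration, not the formal NFA analysis from the papers.

```javascript
// Ways `a*a*` can split n 'a's: the first star takes k characters (0..n)
// and the second star takes the rest, so there is one way per k.
function countAdjacentStarParses(n) {
  let ways = 0;
  for (let k = 0; k <= n; k++) ways += 1;
  return ways;
}

// Ways `(a+)+` can split n 'a's: each outer iteration takes a non-empty
// chunk, so we count every way to cut the run into non-empty pieces.
function countNestedPlusParses(n) {
  const ways = new Array(n + 1).fill(0);
  ways[0] = 1; // nothing left to consume: exactly one way
  for (let len = 1; len <= n; len++) {
    for (let first = 1; first <= len; first++) {
      ways[len] += ways[len - first];
    }
  }
  return ways[n];
}
```

The first count grows linearly with `n` and the second doubles with each extra character. When a mismatch follows the ambiguous part, a backtracking engine may end up trying all of these alternatives before giving up.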

Wustholz and Weideman independently defined the necessary and sufficient conditions to identify these properties in 2015-2017.

  • Both analyses consider both polynomial and exponential degrees of ambiguity, and hence quadratic-or-worse behavior.
  • Their analyses depend on a model of real-world regex engines, so they sometimes produce false positives, e.g. because of optimizations the model doesn’t consider. In my research I take their reports with a grain of salt and dynamically validate them in the programming language of interest.
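Dynamic validation of this kind can be as simple as timing the suspect regex against mismatching inputs of increasing length. The helper below is a minimal sketch assuming a Node.js runtime (it uses `process.hrtime.bigint()`); the name `measureMatchTimes` and the chosen input sizes are illustrative, not taken from the comment.

```javascript
// Hypothetical helper: time one regex against inputs of increasing size.
function measureMatchTimes(regex, makeInput, sizes) {
  return sizes.map((size) => {
    const input = makeInput(size);
    const start = process.hrtime.bigint();
    regex.test(input);
    const elapsedNs = process.hrtime.bigint() - start;
    return { size, ms: Number(elapsedNs) / 1e6 };
  });
}

// An input with no '=' can never match, so a backtracking engine has to
// exhaust its alternatives before reporting failure.
const timings = measureMatchTimes(/.*.*=.*/, (n) => 'x'.repeat(n), [100, 200, 400]);
console.log(timings);
```

If the time grows much faster than the input size as the sizes double, the static report is likely a true positive on that engine. Single runs are noisy, so repeating each measurement a few times gives a more trustworthy picture.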

Their analyses and implementations only apply to a subset of regex features.

Weideman’s tool is open-source here.

1 reaction
davisjam commented, Oct 17, 2019

@jedwards1211 I’m actually the maintainer of safe-regex now. The v2.0 release has my improvements (fixes some false negatives). I thought I made a PR to eslint to pick up v2.0, but maybe I misremember. Though I guess that was to update the security plugin rather than to eslint itself?

@not-an-aardvark

Unfortunately, safe-regex is not reliable for performing safety checks.

True. The documentation says that pretty clearly: “WARNING: This module has both false positives and false negatives.”

For example, it considers the regex that caused this issue to be safe.

Correct. That regex is quadratic, not exponential-time. As advertised, the star height heuristic currently used in safe-regex only detects (some) exponential-time regexes. For more on heuristics (Star Height, QOD, and QOA), see section 5.1 in this paper.
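To make the star height idea concrete, here is a simplified sketch that computes the deepest nesting of unbounded quantifiers (`*`, `+`, `{n,}`) in a pattern string. It is not safe-regex's implementation: `starHeight` is a hypothetical helper, it assumes a well-formed pattern, and it ignores many regex features. As I understand the heuristic, patterns with a star height above 1 are the ones that get flagged.

```javascript
// Simplified star-height calculation over a pattern string.
// Assumption: the pattern is well-formed; this is not a full regex parser.
function starHeight(pattern) {
  // One frame per open group: `max` is the deepest nesting seen inside it,
  // `last` is the height of the most recent atom (what a quantifier applies to).
  const stack = [{ max: 0, last: 0 }];
  const bump = (frame) => {
    frame.last += 1;
    frame.max = Math.max(frame.max, frame.last);
  };
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern[i];
    const top = stack[stack.length - 1];
    if (ch === '\\') {
      top.last = 0; // escaped character is a plain atom
      i += 2;
    } else if (ch === '[') {
      i += 1; // skip the whole character class
      while (i < pattern.length && pattern[i] !== ']') {
        i += pattern[i] === '\\' ? 2 : 1;
      }
      i += 1;
      top.last = 0;
    } else if (ch === '(') {
      stack.push({ max: 0, last: 0 });
      i += 1;
    } else if (ch === ')' && stack.length > 1) {
      const group = stack.pop();
      const parent = stack[stack.length - 1];
      parent.last = group.max; // the group is an atom as tall as its contents
      parent.max = Math.max(parent.max, parent.last);
      i += 1;
    } else if (ch === '*' || ch === '+') {
      bump(top);
      i += 1;
    } else if (ch === '?') {
      i += 1; // optional or lazy marker: bounded, height unchanged
    } else if (ch === '{') {
      const m = /^\{(\d+)(,(\d*))?\}/.exec(pattern.slice(i));
      if (m) {
        if (m[2] !== undefined && m[3] === '') bump(top); // {n,} is unbounded
        i += m[0].length;
      } else {
        top.last = 0; // literal '{'
        i += 1;
      }
    } else {
      top.last = 0; // ordinary atom
      i += 1;
    }
  }
  return stack[0].max;
}
```

With this sketch, `starHeight('(a+)+')` is 2 while `starHeight('.*.*=.*')` is 1, which matches the point above: the quantifiers in the quadratic regex sit side by side rather than nested, so a star height check does not see them.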

I’d love to prevent catastrophic backtracking generally, but I haven’t seen a reliable way to do it with static analysis.

  • safe-regex#27 references the two academic papers that provide static algorithms for identifying super-linear regexes.
  • My vuln-regex-detector project has wrappers for the tools from those papers as well as two other tools. My project includes some improvements to reduce false negatives based on mismatches between the academic models for regex engines and the behavior of real regex engines.

I think the best outcome would be for JS engines to use linear-time NFA matching instead of backtracking whenever possible (i.e. whenever a regex doesn’t contain features like backreferences).

This would be nice. The dicey bit is ensuring that changing the algorithm doesn’t change the semantics of the regex match. Many regex engines claim to have the same semantics but exhibit subtle differences – see section 7 of this paper (short version). Another option is to optimize the regex engines instead.

Another option, of course, is to use the RE2 bindings for JavaScript. But I know of at least one semantic difference between RE2 and Node.js’s built-in Irregexp engine, so that might be risky.
