
Implement PCRE-style in-regex comments, e.g. (?#comment)


The Python regex implementation we use does not appear to provide any way of putting comments inside the regex text itself that would work in the watchlist and blacklists.[1] It would be beneficial for us to be able to include comments in at least our watchlist and blacklist entries, and potentially in the other regexes that we use in findspam.py. PCRE implements in-regex comments with the syntax (?#comment).

It would be relatively easy for us to implement support for PCRE-style regex comments: before converting strings to regexes, we could simply remove any content that matches the regex[2] \(\?#(?<!(?:[^\\]|^)(?:\\\\)*\\\(\?#)[^)]*\)

This substitution could be performed at one of the following points (listed in order of increasing generality):

  1. For the watchlist and blacklists only: when we read the watchlist and blacklist lines from their files,
  2. For all 'regex' detections: just before calling regex.compile() on the text provided in each 'regex' detection, or
  3. For all regexes: as a wrapper around regex.compile().
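A minimal sketch of the proposed substitution, written against Python's standard re module rather than the regex module the project uses, so it runs without extra dependencies. Because re does not allow variable-length look-behinds, it swaps in a different technique from the regex above: it consumes any run of escaped backslash pairs in front of the "(?#" and puts them back in the replacement. The names strip_pcre_comments and compile_with_comments are hypothetical, and the sketch only accounts for backslash escapes (it does not, for example, special-case "(?#" inside a character class).

```python
import re

# Matches an unescaped "(?#...)" comment. The look-behind stops a match from
# starting in the middle of a run of backslashes; group 1 then consumes that
# run in escaped pairs, so "(" only counts as unescaped when an even number
# of backslashes precede it. As in PCRE and Python's re, a comment ends at
# the first ")".
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)\(\?#[^)]*\)")


def strip_pcre_comments(pattern: str) -> str:
    """Remove unescaped (?#...) comments, keeping any escaped backslashes."""
    return _COMMENT.sub(r"\1", pattern)


def compile_with_comments(pattern: str, flags: int = 0) -> re.Pattern:
    """Option 3: a hypothetical wrapper that strips comments, then compiles."""
    return re.compile(strip_pcre_comments(pattern), flags)


# Option 1: strip comments from watchlist/blacklist lines as they are read.
lines = [
    r"buy\W*cheap(?# spam keyword)",
    r"\(?#\d+\)",  # "(" is escaped here, so nothing is removed
]
print([strip_pcre_comments(line) for line in lines])

# Option 3: the comment is removed before the pattern reaches re.compile().
print(compile_with_comments(r"cheap(?# keyword)\s+pills").pattern)
```

Restoring the captured backslash pairs in the replacement is what lets a pattern such as C:\\(?#drive) keep its escaped backslash while losing the comment.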

  [1] There is also the option of “verbose” regexes using the X flag, which I assume is also available in the regex module we’re using. However, using these would not let us put comments in the watchlist and blacklists.
  [2] That regex is untested, because it relies on variable-length look-behinds, for which I don’t have a simulator/tester. The regex \(\?#(?<!^\\\(\?#)(?<![^\\]\\\(\?#)(?<!\\\\\\\(\?#)[^)]*\) is tested and correctly matches, or doesn’t, when up to three \ escapes precede the (?#comment).
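To make the “up to three \ escapes” limit of the fixed-width regex concrete, here is a small check using Python's standard re module (a choice for this sketch; the issue does not say which tester was used). Each look-behind has a fixed width, so re accepts the pattern. The helper names are hypothetical. A "(?#" should only be treated as a comment when the "(" is preceded by an even number of backslashes.

```python
import re

# The fixed-width alternative from footnote [2], copied verbatim.
FIXED_WIDTH = re.compile(r"\(\?#(?<!^\\\(\?#)(?<![^\\]\\\(\?#)(?<!\\\\\\\(\?#)[^)]*\)")


def should_strip(n: int) -> bool:
    # An even number of backslashes means the "(" itself is not escaped.
    return n % 2 == 0


def comment_matched(prefix: str, n: int) -> bool:
    # Build e.g. "x\\(?#c)": optional leading text, n backslashes, a comment.
    text = prefix + "\\" * n + "(?#c)"
    return FIXED_WIDTH.search(text) is not None


for prefix in ("", "x"):
    for n in range(5):
        matched = comment_matched(prefix, n)
        status = "ok" if matched == should_strip(n) else "WRONG"
        print(repr(prefix), n, matched, status)
```

Every case with zero to three backslashes should report ok, and four backslashes should report WRONG: the third look-behind sees three backslashes and rejects the match even though the total is even. That is consistent with the stated limit.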

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

makyen commented, Sep 16, 2018 (1 reaction)

Before writing this RFE, I had found unofficial documentation saying these are not implemented. In addition, the official documentation I checked didn’t mention them, although it did mention another type of comment (“verbose” regular expressions).

However, after looking at the source code for regex, I found that this style of comment is already implemented as a standard part of both the re and regex implementations. Having found it in the source, I then also found it in the re documentation.
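A minimal check of that native support, using Python's standard re module: a pattern containing an inline (?#...) comment compiles and matches exactly like the same pattern without it. The comment above reports the same support in the regex module; this sketch only exercises re.

```python
import re

# "(?#...)" is a comment in Python's re syntax: its contents are ignored
# when the pattern is compiled, so no preprocessing is needed.
with_comment = re.compile(r"cheap(?# spam keyword)\s+pills")
without_comment = re.compile(r"cheap\s+pills")

text = "Buy cheap  pills now"
# Both patterns find the same match; the comment has no effect on matching.
print(with_comment.search(text).group())
print(with_comment.search(text).group() == without_comment.search(text).group())
```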

So there’s no need for this RFE, since the feature is already natively supported. Sorry to waste everyone’s time.

makyen commented, Sep 16, 2018 (0 reactions)

@quartata Assuming that python-pcre implements only PCRE, it’s unlikely that switching to it would be easier. We currently use capabilities (e.g. variable-length look-behind) that are not part of PCRE. If it were me, I’d much rather implement a single regex replacement (which is all that implementing this requires) than take on the known and unknown issues of moving to a different regex implementation.
