
Case sensitivity handling and a note about it in the docs


Hi! First of all, I’d like to thank you for creating such a fast lexer. I’ve been using it along with nearley.js in various projects, and it has really changed how I approach text parsing.

I’m currently working on a language that requires all tokens to be case-insensitive, without exceptions. For now, following tips I’ve found on the internet (and in issues in this repo), I’ve been using some custom helpers that transform token text into a case-insensitive regex without the /i flag. This works, but it’s not pretty. I also have doubts, possibly unfounded, about the overall performance of my parser.

Why am I creating this issue? I would like a concise description of how to approach situations where all (or some) tokens are case-insensitive. An example would be nice as well.

Let’s say that my lexer definition looks like this:

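import moo from 'moo';

// textToCaseInsensitiveRegex is a custom helper; its code is posted later in this thread.
const lexer = moo.compile({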

  /* This doesn't care about case sensitivity. */
  STRING: /"(?:[^\\]|\\.)*?"/,

  /* Case sensitivity doesn't apply here. */
  NUMBER: /(?:\.\d+|\d+\.?\d*)/,

  /* Case sensitivity doesn't apply here. */
  ADD: '+',

  /* Manually force case insensitivity */
  IN: ['in', 'iN', 'In', 'IN'],

  /* Use a helper */
  ABS: textToCaseInsensitiveRegex('ABS')
});
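The hand-written IN array has to list every casing, and that count grows as 2^n for an n-letter keyword: 'ABS' already needs 8 spellings and a 10-letter keyword would need 1024. The sketch below generates those spellings only to make the growth concrete; `allCasings` is a hypothetical name, and it assumes only ASCII letters need two variants. In practice this is an argument for a per-letter regex (like the helper posted later) rather than a recommendation to generate huge literal arrays.

```javascript
// Hypothetical helper: lists every upper/lower-case spelling of `text`,
// which is what the hand-written IN array does for 'in'.
// Only ASCII letters get two variants; other characters are kept as-is.
function allCasings(text) {
  let results = [''];
  for (const char of text) {
    const variants = /[a-z]/i.test(char)
      ? [char.toLowerCase(), char.toUpperCase()]
      : [char];
    // Extend every prefix built so far with each variant of this character.
    results = results.flatMap((prefix) => variants.map((v) => prefix + v));
  }
  return results;
}

console.log(allCasings('in'));        // [ 'in', 'iN', 'In', 'IN' ]
console.log(allCasings('abs').length); // 8
```

`allCasings('in')` yields exactly the list used for the IN rule, in the same order.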

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 9

Top GitHub Comments

1 reaction
autioch commented, Jan 21, 2019

Hi again. After some testing and thinking about the readability of my code, I’ve come to the conclusion that I’ll stick with the textToCaseInsensitiveRegex helper. I’m doing some extra transformations on the lexer rules, so I’ve added a “precompiler” that outputs complete, finished definitions that I later use in the app. Separating these two steps is safer, makes testing and debugging easier, and reduces the time spent parsing and preparing the JS on the client side.

Helper:

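// Matches a single ASCII letter.
const LETTER_REGEXP = /[a-zA-Z]/;
const isCharLetter = (char) => LETTER_REGEXP.test(char);

// Escapes characters that have a special meaning in RegExp source,
// so that text such as 'a.b' matches only the literal 'a.b'.
const escapeRegExpChar = (char) => char.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Turns e.g. 'ABS' into /[aA][bB][sS]/, which matches any casing of the
// text without relying on the /i flag.
function textToCaseInsensitiveRegex(text) {
  const regexSource = text.split('').map((char) => {
    if (isCharLetter(char)) {
      return `[${char.toLowerCase()}${char.toUpperCase()}]`;
    }

    return escapeRegExpChar(char);
  });

  return new RegExp(regexSource.join(''));
}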

As a side note, it’s nice that moo accepts an array as an alternative to an object. It’s easier to manipulate, and the rule order is guaranteed.
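A build-time “precompiler” like the one described above could look roughly like this minimal sketch. Everything here is an assumption for illustration: the input spec format (`keyword` for case-insensitive text, `match` for rules passed through untouched), the names `precompile` and `caseInsensitive`, and the CommonJS output. It also assumes moo’s array form takes objects with a `type` field, which is worth checking against moo’s README. The emitted source keeps rules in array order, so the finished file could be loaded and handed to `moo.compile` on the client without redoing the transformation.

```javascript
// Hypothetical "precompiler" sketch: expands keyword entries into
// case-insensitive RegExps and emits finished array-form rules as JS source.

const escapeRegExpChar = (char) => char.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Same idea as textToCaseInsensitiveRegex: 'in' -> /[iI][nN]/.
function caseInsensitive(text) {
  const source = [...text]
    .map((c) => (/[a-z]/i.test(c) ? `[${c.toLowerCase()}${c.toUpperCase()}]` : escapeRegExpChar(c)))
    .join('');
  return new RegExp(source);
}

// Each spec entry has a `type` plus either `keyword` (case-insensitive text)
// or `match` (kept as-is). The output preserves the spec's order.
function precompile(spec) {
  const rules = spec.map(({ type, keyword, match }) => ({
    type,
    match: keyword !== undefined ? caseInsensitive(keyword) : match,
  }));
  const body = rules
    .map(({ type, match }) => {
      // RegExps are written as literals; strings are written as JSON strings.
      const m = match instanceof RegExp ? String(match) : JSON.stringify(match);
      return `  { type: ${JSON.stringify(type)}, match: ${m} },`;
    })
    .join('\n');
  return `module.exports = [\n${body}\n];\n`;
}

console.log(precompile([
  { type: 'IN', keyword: 'in' },
  { type: 'ADD', match: '+' },
  { type: 'NUMBER', match: /\d+/ },
]));
```

Writing the result to a file during the build keeps the client from paying for the transformation at load time, which matches the motivation given in the comment.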

0 reactions
tjvr commented, Jun 4, 2021

Like the unicode flag handling in #123, I think it would be reasonable to allow the ignoreCase /i flag if all the RegExps use it. That would handle the case where everything in the language is case-insensitive.
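The all-or-nothing condition follows from how a combined lexer regex works. As far as I understand, moo merges every rule into a single RegExp, and in JavaScript flags belong to a whole RegExp, not to individual alternatives. The sketch below is not moo’s implementation; `combineRules` and `tokenize` are hypothetical names that illustrate why mixed /i usage cannot be expressed in one combined pattern, while uniform usage can.

```javascript
// Illustration only: merge rule RegExps into one sticky alternation.
// Flags apply to the whole combined RegExp, so /i must be all-or-nothing.
function combineRules(rules) {
  const flags = rules.map((r) => r.match.ignoreCase);
  const allIgnoreCase = flags.every(Boolean);
  const noneIgnoreCase = !flags.some(Boolean);
  if (!allIgnoreCase && !noneIgnoreCase) {
    throw new Error('Either all rules or none may use the /i flag');
  }
  // One capturing group per rule (rule sources must not contain their own
  // capturing groups, or the index mapping below breaks).
  const source = rules.map((r) => `(${r.match.source})`).join('|');
  return new RegExp(source, allIgnoreCase ? 'iy' : 'y');
}

function tokenize(rules, input) {
  const re = combineRules(rules);
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (!m || m[0] === '') throw new Error(`No rule matches at offset ${pos}`);
    // Only the alternative that matched has a defined group.
    const index = m.slice(1).findIndex((g) => g !== undefined);
    tokens.push({ type: rules[index].type, text: m[0] });
    pos += m[0].length;
  }
  return tokens;
}

const rules = [
  { type: 'WS', match: /\s+/i },
  { type: 'IN', match: /in/i },
  { type: 'ABS', match: /abs/i },
];
console.log(tokenize(rules, 'In ABS').map((t) => t.type)); // [ 'WS' never first; 'IN', 'WS', 'ABS' ]
```

With every rule using /i, the combined pattern gets the `i` flag once and no per-letter character classes are needed; mixing /i and non-/i rules makes `combineRules` throw, which mirrors the restriction proposed in the comment.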

If only some of the RegExps need to be case-insensitive, then you’ll have to generate the cases manually, using something like textToCaseInsensitiveRegex above.
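For keyword-like tokens in a mixed language, another option is to match whole identifiers with one rule and pick the token type with a function. moo’s `type` option accepts a function (that is how `moo.keywords` works), so a lowercase lookup can make only the keywords case-insensitive while every other rule keeps its case. A side benefit is that an identifier such as `inside` is matched as one token instead of `/[iI][nN]/` grabbing its first two letters. `caseInsensitiveKeywordType` is a hypothetical helper, and the assumption that moo falls back to the rule’s own type when the function returns `undefined` should be checked against moo’s README.

```javascript
// Hypothetical helper, modelled on the idea behind moo.keywords:
// builds a function that maps matched text to a keyword type, ignoring case.
// Returns undefined for non-keywords (the rule's own type should then apply).
function caseInsensitiveKeywordType(keywordsByType) {
  const lookup = new Map();
  for (const [type, words] of Object.entries(keywordsByType)) {
    // Accept either a single keyword or an array of keywords per type.
    for (const word of [].concat(words)) {
      lookup.set(word.toLowerCase(), type);
    }
  }
  return (text) => lookup.get(text.toLowerCase());
}

// Possible use with moo (assumed, not verified here):
// const lexer = moo.compile({
//   IDENT: { match: /[a-zA-Z_]\w*/, type: caseInsensitiveKeywordType({ IN: 'in', ABS: 'abs' }) },
// });

const typeOf = caseInsensitiveKeywordType({ IN: 'in', ABS: ['abs'] });
console.log(typeOf('In'), typeOf('ABS'), typeOf('inside')); // IN ABS undefined
```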
