Activating the case-insensitive option crashes the lexer generator

Description

When I try to generate the lexer from the command line, it crashes with this error message:

$ dotnet ../packages/FsLexYacc.10.2.0/build/fslex/netcoreapp3.1/fslex.dll Lexer_fail_option_i.txt -i --unicode -o Lexer.fs
compiling to dfas (can take a while...)
FSLEX: error FSL000: System.Collections.Generic.KeyNotFoundException: The given key '4294967040' was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at FsLexYacc.FsLex.AST.CompileRegexp@187(FSharpMap`2 macros, NfaNodeMap nfaNodeMap, Regexp re, NfaNode dest) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 195
   at FsLexYacc.FsLex.AST.trs@190-1.Invoke(Regexp re) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 190
   at Microsoft.FSharp.Primitives.Basics.List.mapToFreshConsTail[a,b](FSharpList`1 cons, FSharpFunc`2 f, FSharpList`1 x)
   at Microsoft.FSharp.Primitives.Basics.List.map[T,TResult](FSharpFunc`2 mapping, FSharpList`1 x)
   at FsLexYacc.FsLex.AST.CompileRegexp@187(FSharpMap`2 macros, NfaNodeMap nfaNodeMap, Regexp re, NfaNode dest) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 190
   at FsLexYacc.FsLex.AST.trs@262.Invoke(Int32 n, Tuple`2 x) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 262
   at Microsoft.FSharp.Primitives.Basics.List.mapiToFreshConsTail[a,b](FSharpList`1 cons, FSharpFunc`3 f, FSharpList`1 x, Int32 i)
   at Microsoft.FSharp.Primitives.Basics.List.mapi[T,TResult](FSharpFunc`2 f, FSharpList`1 x)
   at Microsoft.FSharp.Collections.ListModule.MapIndexed[T,TResult](FSharpFunc`2 mapping, FSharpList`1 list)
   at FsLexYacc.FsLex.AST.LexerStateToNfa(FSharpMap`2 macros, FSharpList`1 clauses) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 262
   at FsLexYacc.FsLex.AST.Compile@391-1.Invoke(Tuple`2 tupledArg) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 392
   at FsLexYacc.FsLex.Driver.main() in /Users/sergey/github/FsLexYacc/src/FsLex/fslex.fs:line 90
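
A small aside on the key in the exception message, offered as a reading aid rather than a diagnosis: 4294967040 is 0xFFFFFF00, that is 2^32 - 256, which lies far above the largest Unicode code point (0x10FFFF). This suggests the missing key is not an ordinary character; what it encodes inside fslex is not stated in this report. The Python snippet below only checks the arithmetic.

```python
# The key reported in the KeyNotFoundException above.
missing_key = 4294967040

# Largest valid Unicode code point.
MAX_CODE_POINT = 0x10FFFF

print(hex(missing_key))              # hexadecimal form of the key
print(missing_key == 2**32 - 256)    # is it exactly 256 below 2^32?
print(missing_key > MAX_CODE_POINT)  # is it outside the Unicode range?
```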

If I remove the “-i” option, it passes:

$ dotnet ../packages/FsLexYacc.10.2.0/build/fslex/netcoreapp3.1/fslex.dll Lexer_fail_option_i.txt --unicode -o Lexer.fs
compiling to dfas (can take a while...)
7 states
writing output

When I remove the “--unicode” option, it says I should use it:

$ dotnet ../packages/FsLexYacc.10.2.0/build/fslex/netcoreapp3.1/fslex.dll Lexer_fail_option_i.txt -i -o Lexer.fs
compiling to dfas (can take a while...)
FSLEX: error FSL000: the Unicode character '178' may not be used unless --unicode is specified

This is strange because I don’t see any Unicode characters in my file; even “file” says so:

$ file Lexer_fail_option_i.txt
Lexer_fail_option_i.txt: ASCII text

Thanks!

Repro steps

Use this file to reproduce:

Lexer_fail_option_i.txt

Known workarounds

It seems to work if I don’t activate the case insensitive option.

Related information

  • Windows 10
  • Version 10.2.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

gdziadkiewicz commented, Jan 9, 2022

Hi, I found the time and motivation to fix this. I checked the Unicode categories, and it looks simpler than I expected. I plan to go with option 3: the mapping of the Unicode categories will transform each of the three cased categories into all three of the cased categories.

While this is not hard to do, there are currently no working Unicode tests, so I decided to change that first by resurrecting the legacy Unicode tests (and test2 while at it, and adding the new Expecto test projects to the pipeline). So I first need to finish #157 and get it merged. After that, expect a PR that adds Unicode+caseInsensitive tests and a fix for this issue.
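
The category mapping described in this comment can be sketched roughly as follows. This is an illustrative Python sketch, not the FsLexYacc implementation (which is written in F#); the helper names `expand_category` and `matches_category` are hypothetical, and it assumes the "three cased categories" are the Unicode general categories Lu, Ll and Lt.

```python
import unicodedata

# Assumed meaning of "the three cased categories":
# uppercase (Lu), lowercase (Ll) and titlecase (Lt) letters.
CASED_CATEGORIES = frozenset({"Lu", "Ll", "Lt"})


def expand_category(category: str) -> frozenset:
    """Return the categories a rule should accept when case is ignored.

    Hypothetical helper: a cased category widens to all three cased
    categories; any other category is left as it is.
    """
    if category in CASED_CATEGORIES:
        return CASED_CATEGORIES
    return frozenset({category})


def matches_category(ch: str, category: str, ignore_case: bool) -> bool:
    """Check whether a single character satisfies a category rule."""
    accepted = expand_category(category) if ignore_case else frozenset({category})
    return unicodedata.category(ch) in accepted


print(matches_category("a", "Lu", ignore_case=False))  # 'a' is Ll, so an Lu rule rejects it
print(matches_category("a", "Lu", ignore_case=True))   # Lu widens to include Ll
print(matches_category("1", "Lu", ignore_case=True))   # digits are not in a cased category
```

Widening at the category level avoids enumerating every letter's case partner, which is presumably why the approach looked simpler than expected.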

gdziadkiewicz commented, Jan 28, 2022

The case-insensitivity implementation assumes that the upper- and lower-case forms of a letter are always contained together in the code page. This is not true, for example, for 0xFF, whose upper-case letter is 0x178. [Screenshot 2022-01-28 011116] It affects the | _ rule from the attached test case and will probably also affect | [^ charset] rules and literal uses of the problematic letters.
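
The code-page assumption from this comment can be checked with a short script. The sketch below uses Python's own `str.upper` and `str.lower`, which may not match the .NET case mapping fslex relies on in every detail, and the function name `case_partners_outside_range` is made up for illustration. It lists code points below 256 whose single-character case partner falls outside that range.

```python
def case_partners_outside_range(limit: int = 256):
    """List (code point, partner) pairs where the case partner is >= limit."""
    escaped = []
    for cp in range(limit):
        ch = chr(cp)
        for partner in (ch.upper(), ch.lower()):
            # Skip multi-character mappings such as 'ß' -> 'SS'.
            if len(partner) == 1 and ord(partner) >= limit:
                escaped.append((cp, ord(partner)))
    return escaped


for cp, partner in case_partners_outside_range():
    print(f"0x{cp:02X} -> 0x{partner:X}")
```

Any pair this reports is a character that a wildcard or negated character set in the 8-bit range can reach, while its case partner needs a code point above 0xFF. The pair 0xFF / 0x178 named in the comment is one of them.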

Top Results From Across the Web

FsLexYacc/RELEASE_NOTES.md at master
Generate signature files for transformed files in fslex. ... Migration to net6.0 #166; Fix Activating case insensitive option crash the lexer generator #141 ......