Activating the case-insensitive option crashes the lexer generator
Description
When I try to generate the lexer from the command line, it crashes with this error message:
$ dotnet ../packages/FsLexYacc.10.2.0/build/fslex/netcoreapp3.1/fslex.dll Lexer_fail_option_i.txt -i --unicode -o Lexer.fs
compiling to dfas (can take a while...)
FSLEX: error FSL000: System.Collections.Generic.KeyNotFoundException: The given key '4294967040' was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at FsLexYacc.FsLex.AST.CompileRegexp@187(FSharpMap`2 macros, NfaNodeMap nfaNodeMap, Regexp re, NfaNode dest) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 195
at FsLexYacc.FsLex.AST.trs@190-1.Invoke(Regexp re) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 190
at Microsoft.FSharp.Primitives.Basics.List.mapToFreshConsTail[a,b](FSharpList`1 cons, FSharpFunc`2 f, FSharpList`1 x)
at Microsoft.FSharp.Primitives.Basics.List.map[T,TResult](FSharpFunc`2 mapping, FSharpList`1 x)
at FsLexYacc.FsLex.AST.CompileRegexp@187(FSharpMap`2 macros, NfaNodeMap nfaNodeMap, Regexp re, NfaNode dest) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 190
at FsLexYacc.FsLex.AST.trs@262.Invoke(Int32 n, Tuple`2 x) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 262
at Microsoft.FSharp.Primitives.Basics.List.mapiToFreshConsTail[a,b](FSharpList`1 cons, FSharpFunc`3 f, FSharpList`1 x, Int32 i)
at Microsoft.FSharp.Primitives.Basics.List.mapi[T,TResult](FSharpFunc`2 f, FSharpList`1 x)
at Microsoft.FSharp.Collections.ListModule.MapIndexed[T,TResult](FSharpFunc`2 mapping, FSharpList`1 list)
at FsLexYacc.FsLex.AST.LexerStateToNfa(FSharpMap`2 macros, FSharpList`1 clauses) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 262
at FsLexYacc.FsLex.AST.Compile@391-1.Invoke(Tuple`2 tupledArg) in /Users/sergey/github/FsLexYacc/src/FsLex/fslexast.fs:line 392
at FsLexYacc.FsLex.Driver.main() in /Users/sergey/github/FsLexYacc/src/FsLex/fslex.fs:line 90
If I remove the “-i” option, it succeeds:
$ dotnet ../packages/FsLexYacc.10.2.0/build/fslex/netcoreapp3.1/fslex.dll Lexer_fail_option_i.txt --unicode -o Lexer.fs
compiling to dfas (can take a while...)
7 states
writing output
When I remove the “--unicode” option instead, it says I should use it:
$ dotnet ../packages/FsLexYacc.10.2.0/build/fslex/netcoreapp3.1/fslex.dll Lexer_fail_option_i.txt -i -o Lexer.fs
compiling to dfas (can take a while...)
FSLEX: error FSL000: the Unicode character '178' may not be used unless --unicode is specified
This is strange because I don’t see any Unicode characters in my file, and “file” agrees:
$ file Lexer_fail_option_i.txt
Lexer_fail_option_i.txt: ASCII text
Thanks!
Repro steps
Use this file to reproduce:
Known workarounds
It seems to work if I don’t activate the case-insensitive option.
Related information
- Windows 10
- Version 10.2.0
Issue Analytics
- State:
- Created 3 years ago
- Comments:12 (7 by maintainers)
Top GitHub Comments
Hi, I found the time and motivation to fix this. I checked the Unicode categories and it looks simpler than I expected. I plan to go with option 3: the mapping of the Unicode categories will transform each of the three cased categories into all three of the cased categories.
While this is not hard to do, there are currently no working Unicode tests, so I decided to change that first by resurrecting the legacy Unicode tests (and test2 while at it) and adding the new Expecto test projects to the pipeline. So I first need to finish #157 and get it merged. After that, expect a PR that adds additional Unicode+caseInsensitive tests and a fix for this issue.
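To make the planned category mapping concrete, here is a rough sketch in Python (not the FsLex source). It assumes the "three cased categories" are the Unicode general categories Lu, Ll and Lt; the function name `expand_category` is hypothetical and only illustrates the idea that, in case-insensitive mode, any one cased category stands for all three.

```python
import unicodedata

# Assumption: the "three cased categories" are Lu (uppercase letter),
# Ll (lowercase letter) and Lt (titlecase letter).
CASED = frozenset({"Lu", "Ll", "Lt"})


def expand_category(category, case_insensitive=True):
    """Hypothetical helper: the set of categories a pattern should match.

    In case-insensitive mode a cased category expands to all three cased
    categories; every other category is left unchanged.
    """
    if case_insensitive and category in CASED:
        return set(CASED)
    return {category}


def matches(ch, category, case_insensitive=True):
    """True if the character falls in the (possibly expanded) category."""
    return unicodedata.category(ch) in expand_category(category, case_insensitive)


print(sorted(expand_category("Lu")))
print(matches("a", "Lu"), matches("a", "Lu", case_insensitive=False))
```

With this mapping, a rule written against uppercase letters also accepts lowercase and titlecase letters when case-insensitivity is on, while non-letter categories such as Nd behave exactly as before.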
The case-insensitivity implementation assumes that the upper- and lower-case forms of a letter are always contained together in the code page. This is not true, for example, for 0xFF, whose upper-case letter is 0x178. It affects the "| _" rule from the attached test case and will probably also affect "| [^ charset]" rules and literal uses of the problematic letters.
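The 0xFF / 0x178 pairing can be checked independently of FsLex. The following Python sketch (the function name `case_partners_outside` is made up for illustration) scans the code points below 0x100 and reports those whose single-character upper- or lower-case form lies outside that range, which is the situation the comment describes.

```python
def case_partners_outside(limit=0x100):
    """Map each code point below `limit` to its case variants at or above `limit`.

    Only single-character case mappings are considered; characters whose
    upper-case form expands to several characters are skipped.
    """
    escaped = {}
    for cp in range(limit):
        ch = chr(cp)
        variants = {ch.upper(), ch.lower()}
        outside = sorted(
            ord(v) for v in variants if len(v) == 1 and ord(v) >= limit
        )
        if outside:
            escaped[cp] = outside
    return escaped


result = case_partners_outside()
for cp, partners in result.items():
    print(hex(cp), "->", [hex(p) for p in partners])

# The pairing named in the comment: the upper case of 0xFF is 0x178.
print(hex(ord(chr(0xFF).upper())))  # 0x178
```

Any code point reported here is one for which a table sized for a single 256-entry page cannot hold both case forms, so a case-insensitive lookup for it would need separate handling.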