How to support Unicode/UTF-8, such as Chinese characters?
Sample code like this:
from lark import Lark

parser = Lark('''start: WORD "," WORD "!"
%import common.WORD // imports from terminal library
%ignore " " // Disregard spaces in text
''', parser='lalr')
print(parser.parse("Hello,世界!"))
I've been trying for a long time. Can anyone help with this? Thanks!
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
I found out how to write the rule for Unicode in grammer.g; here is the example
and it outputs:
/[u"\u4e00-\u9fa5"]/ will include the quote marks. You can use /[\u4e00-\u9fa5]/ instead.
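To see why the quote marks matter, here is a minimal sketch using Python's standard re module rather than Lark itself. It assumes the character class behaves the same way once Lark hands the terminal to its regex engine; the names with_quotes, cjk_only and matches_fully are made up for this illustration.

```python
import re

# Class as first written in the thread: the letter u and the quote marks
# sit inside the brackets, so they become members of the class too.
with_quotes = re.compile(r'[u"\u4e00-\u9fa5"]+')

# Corrected class: only the code point range U+4E00..U+9FA5.
cjk_only = re.compile(r'[\u4e00-\u9fa5]+')


def matches_fully(pattern, text):
    """Return True when the whole of text is matched by pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("世界", 'u"', "Hello"):
        print(repr(sample),
              "with_quotes:", matches_fully(with_quotes, sample),
              "cjk_only:", matches_fully(cjk_only, sample))
```

Both classes accept "世界", but only the first also accepts a stray u or a double quote. Neither accepts Latin letters, so a terminal meant to cover both "Hello" and "世界" would presumably need the Latin ranges added to the class as well.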