How to support Unicode/UTF-8 text, such as Chinese?

Sample code like this:

from lark import Lark

parser = Lark('''start: WORD "," WORD "!"
            %import common.WORD   // imports from terminal library
            %ignore " "           // Disregard spaces in text
         ''', parser='lalr')
print(parser.parse("Hello,世界!"))

I have been trying for a long time without success. Can anyone help with this? Thanks!
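
For context, `common.WORD` in Lark's terminal library is built from the ASCII letter ranges a-z and A-Z only, which is why the sample fails on 世界. The sketch below approximates that terminal with Python's `re` module; the name `ASCII_WORD` is illustrative and is not part of Lark.

```python
import re

# Rough stand-in for common.WORD: one or more ASCII letters.
# ASCII_WORD is a hypothetical name used only for this illustration.
ASCII_WORD = re.compile(r"[a-zA-Z]+")

matches_hello = ASCII_WORD.fullmatch("Hello") is not None
matches_world = ASCII_WORD.fullmatch("世界") is not None

# "Hello" is accepted, while the Chinese word is rejected.
print(matches_hello, matches_world)
```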

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
ray-linn commented, Aug 10, 2018

I found out how to write the rule for Unicode in grammer.g. Here is an example:

LCASE_LETTER: "a".."z"
UCASE_LETTER: "A".."Z"
CN_ZH_LETTER: /[u"\u4e00-\u9fa5"]/
LETTER: UCASE_LETTER | LCASE_LETTER | CN_ZH_LETTER
WORD: LETTER+

and it outputs:

Tree(start, [Token(WORD, 'Hello'), Token(WORD, '世界')])

0 reactions
ruiqurm commented, Dec 4, 2021

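The character class in /[u"\u4e00-\u9fa5"]/ also matches the literal characters u and ", because everything inside the brackets is treated as part of the class. Use /[\u4e00-\u9fa5]/ instead.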
