c++ generated parser doesn't support international characters

The problem occurs with some international characters.

How to reproduce the problem:

1. Download the latest javacc source (master eb4455b) and build javacc with ant.
2. Build the C++ JavaGrammar example: cd examples/JavaGrammars/cpp, then make.
3. Parse the Java file from HelloWorld.zip: ./javaparser HelloWorld.java

Several errors are reported:

error: 10
Expecting ; at: 7:35 but got (
Expecting } at: 7:35 but got (
Expecting } at: 7:35 but got (
Expecting EOF at: 7:35 but got (

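The problem is the definition of JAVACC_CHAR_TYPE as char, whose signedness is implementation-defined in C/C++ and which is signed on many common platforms. With a signed 8-bit char, the comparison in JavaParserTokenManager.cc, else if (curChar < 128), is always true, so bytes of non-ASCII characters (which become negative values) are wrongly handled by the ASCII branch.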

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
new-javacc commented, Apr 5, 2017

Yes, that’s me! 😃 I’m using the support alias for all things javacc-related.

I am not the owner of JavaCC, so I cannot confirm or refute your thoughts. Please ask Sreenivas Viswanadha (sreeni@viswanadha.net), who is the designer/owner of JavaCC.

0 reactions
new-javacc commented, Nov 1, 2017

Basically, the javacc tokenizer just gets an array of characters and works on them, so any decoding should happen outside, before it reaches the charstream. For example, in Java you specify the encoding when you create the reader.

On Oct 31, 2017 2:38 PM, zosrothko wrote: “You are confusing between encoding and char type size. Javacc doesn’t care about encoding”… Maybe I missed something or misunderstood. Could you be more specific about this confusion?
