Unicode in input string is not handled
/👍/u
parses differently to
/\u{1f44d}/u
The first is becoming 2 chars: \ud83d and \udc4d.
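For illustration, the difference can be reproduced with plain JavaScript strings and native regular expressions, independent of any parser library. This is only a sketch of the underlying behaviour; the variable names are made up for the example.

```javascript
// The emoji is a single code point (U+1F44D) but two UTF-16 code units.
const literal = '👍';
const escaped = '\u{1f44d}';
const surrogates = '\ud83d\udc4d';

// All three spellings produce the same string value.
const sameString = literal === escaped && literal === surrogates;

// .length counts UTF-16 code units, not code points.
const unitCount = literal.length;

// Without the u flag, "." matches one code unit, so it cannot span the whole emoji.
const matchesWithoutU = /^.$/.test(literal);

// With the u flag, "." matches one code point.
const matchesWithU = /^.$/u.test(literal);

console.log({ sameString, unitCount, matchesWithoutU, matchesWithU });
```

So the literal and the escaped form denote the same character; what differs is whether the consumer walks the string by code units or by code points.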
I might try to detect any Unicode in the input string and error out if that’s the case, but I'm wondering whether this lib can handle both of the above the same way, or maybe throw an error?
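A minimal sketch of the "detect and error out" workaround mentioned above, assuming that "any unicode" means any character outside the ASCII range. The helper name assertAsciiOnly is hypothetical and not part of the library being discussed.

```javascript
// Hypothetical pre-check: reject pattern source that contains raw non-ASCII
// characters, so that only escaped forms such as \u{1f44d} reach the parser.
function assertAsciiOnly(source) {
  // The u flag makes the match cover a whole code point, not half a surrogate pair.
  const match = /[^\x00-\x7F]/u.exec(source);
  if (match !== null) {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase();
    throw new SyntaxError(
      `Unexpected non-ASCII character U+${hex} at index ${match.index}`
    );
  }
  return source;
}

// The escaped form contains only ASCII characters and passes through unchanged.
const accepted = assertAsciiOnly('/\\u{1f44d}/u');

// The literal emoji is rejected.
let rejectedMessage = null;
try {
  assertAsciiOnly('/👍/u');
} catch (err) {
  rejectedMessage = err.message;
}
console.log(accepted, rejectedMessage);
```

This is stricter than necessary, since it also rejects non-ASCII characters in the Basic Multilingual Plane that do not involve surrogate pairs.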
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
The u flag indeed acts as an opt-in to using code points as the character boundary, instead of UCS-2/UTF-16 code units (without the u flag). I wrote about that here, along with some examples: https://mathiasbynens.be/notes/es6-unicode-regex

@tjenkinson thanks for the report and investigation, I think the change looks reasonable. @mathiasbynens, what are your thoughts on this?
Also, yes, when u is not enabled, charCodeAt might be a good alternative.
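As a rough sketch of the distinction described in these comments, a consumer could pick the character boundary based on the u flag: code points when it is set, UTF-16 code units via charCodeAt otherwise. The function name toCharacterUnits is hypothetical and is not the library's actual implementation.

```javascript
// Hypothetical helper: split pattern source into numeric character values,
// choosing the boundary according to whether the u flag is enabled.
function toCharacterUnits(source, unicode) {
  const units = [];
  if (unicode) {
    // String iteration walks by code point, keeping surrogate pairs together.
    for (const ch of source) {
      units.push(ch.codePointAt(0));
    }
  } else {
    // Index-based access walks by UTF-16 code unit.
    for (let i = 0; i < source.length; i++) {
      units.push(source.charCodeAt(i));
    }
  }
  return units;
}

const withU = toCharacterUnits('👍', true).map((n) => n.toString(16));
const withoutU = toCharacterUnits('👍', false).map((n) => n.toString(16));
console.log(withU, withoutU);
```

With the flag set, the emoji yields the single value 1f44d; without it, the two surrogate halves d83d and dc4d, which matches the two chars reported in the issue.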