Regex doesn’t work for UTF-8
I’m trying some Chinese inputs, and the [*] format doesn’t seem to behave:
+ [*] 你好 [*]
- wrapping 你好
So, as you can see, putting ordinary Western text such as ww on either side of the Chinese characters works, but [*] doesn’t match when the input uses Chinese characters instead, with or without a space. I also tried the RiveScript pattern without spaces, i.e.:
+ [*]你好[*]
although I’m not sure what the best practice is here.
FWIW, a plain * matches fine:
+ 你好
- 你-》你好<get nickname>
+ 你好 *
- 你-》<star>
+ [*] 你好 [*]
- wrapping 你好
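One plausible explanation, and it is an assumption since the issue never shows the regexp that triggerRegexp() generates: if [*] compiles to alternatives anchored on whitespace or `\b`, Chinese text can never satisfy it in JavaScript, because `\b` and `\w` only recognize the ASCII characters A-Z, a-z, 0-9 and _, even with the `u` flag. The TypeScript sketch below uses a hypothetical `compileWrapped()` helper and a simplified `OPTIONAL_STAR` pattern (neither is RiveScript’s real code) and reproduces the reported behavior: Latin text around 你好 matches, while bare or Chinese-wrapped 你好 does not.

```typescript
// ASSUMPTION: a simplified model of how "[*]" might be compiled, not RiveScript's
// verified output. Each optional star becomes either "some text bounded by
// whitespace or \b" or "a single whitespace or \b".
const OPTIONAL_STAR = "(?:(?:\\s|\\b)+(?:.+?)(?:\\s|\\b)+|(?:\\s|\\b))";

// Hypothetical helper: compile the trigger "[*] <word> [*]" into one anchored regexp.
// Assumes `word` contains no regex metacharacters.
function compileWrapped(word: string): RegExp {
  // The "u" flag does NOT make \b or \w Unicode-aware in JavaScript.
  return new RegExp("^" + OPTIONAL_STAR + word + OPTIONAL_STAR + "$", "u");
}

// \w (and therefore \b) only covers [A-Za-z0-9_], so CJK characters count as non-word.
const asciiWord = new RegExp("\\w", "u");
const unicodeLetter = new RegExp("\\p{L}", "u");
console.log(asciiWord.test("你"));       // false
console.log(unicodeLetter.test("你"));   // true: a letter by Unicode's definition

const ww = compileWrapped("ww");
const nihao = compileWrapped("你好");
console.log(ww.test("ww"));              // true: \b exists at both edges of "ww"
console.log(nihao.test("ww 你好 ww"));   // true: the Latin text supplies the \b anchors
console.log(nihao.test("你好"));         // false: no whitespace and no \b at either edge
console.log(nihao.test("哈 你好 哈"));   // false: no \b before the leading 哈
```

By contrast, Python 3’s re module treats CJK ideographs as word characters by default for str patterns, so `\b` does fire next to Chinese characters there: one concrete case of regex engines differing between languages.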
Issue details:
- Created 7 years ago
- Comments: 11 (10 by maintainers)
Top GitHub Comments
@dcsan I think I may just have to add that feature. What I’ve learned from porting RiveScript to five different languages is that A) Unicode is hard, and B) regular expression engines aren’t all created equal. Things that work in regexps in one language don’t work in another, and it’s hard to make RiveScript support all kinds of Unicode across all versions. Allowing the end user to write a literal regular expression lets them fix their specific issues their own way, and avoids all the ‘magic’ that triggerRegexp() does that might interfere with their attempt to get a working regexp out of it. RiveScript’s predecessor supported a regexp command: everything old is new again.
Closing this issue in favor of tracking the ~Regexp feature in https://github.com/aichaos/rivescript-wd/issues/6
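The workaround described above is to let users write the regular expression themselves. The syntax of the proposed ~Regexp command isn’t shown here, so the sketch below only illustrates the idea with a hypothetical `compileLiteralTrigger()` helper that anchors a user-supplied pattern and compiles it verbatim. The first trigger spells out the intent of [*] 你好 [*] without relying on `\b`; the second shows how a user could opt into Unicode-aware word boundaries with lookarounds over `\p{L}` and `\p{N}`, assuming an ES2018+ engine.

```typescript
// Hypothetical helper: a trigger written by the user as a literal regular expression
// is compiled verbatim (only anchored), with no wildcard or optional rewriting.
function compileLiteralTrigger(source: string): RegExp {
  return new RegExp("^(?:" + source + ")$", "u");
}

// The intent of "[*] 你好 [*]" spelled out directly: any text, possibly empty, on either side.
const greeting = compileLiteralTrigger(".*?你好.*");

// Where word boundaries do matter (e.g. Latin text), a Unicode-aware stand-in for \b
// can be written with lookarounds over letter/number properties.
// ASSUMPTION: the engine supports lookbehind and \p{...} escapes (ES2018+).
const NO_WORD_BEFORE = "(?<![\\p{L}\\p{N}_])";
const NO_WORD_AFTER = "(?![\\p{L}\\p{N}_])";
const hello = compileLiteralTrigger(".*?" + NO_WORD_BEFORE + "hello" + NO_WORD_AFTER + ".*");

console.log(greeting.test("你好"));         // true
console.log(greeting.test("哈 你好 哈"));   // true
console.log(greeting.test("我说你好呀"));   // true
console.log(greeting.test("再见"));         // false
console.log(hello.test("say hello there")); // true
console.log(hello.test("othello"));         // false: "t" is a letter, so no boundary
```

Because nothing is rewritten, the engine sees exactly what the user wrote, which is the property the comment is after; the trade-off is that the trigger is now tied to one language’s regex dialect.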