Regex doesn't work for UTF-8

I'm trying some Chinese inputs, and the [*] trigger format doesn't behave as expected:

    + [*] 你好 [*]
    - wrapping 你好

Putting ordinary Latin characters such as ww on either side of the Chinese characters works, but [*] doesn't match when the surrounding text is Chinese, with or without a space. I also tried the RiveScript trigger without spaces, i.e.

    + [*]你好[*]

although I'm not sure what the best practice is here.

FWIW, the normal * wildcard matches OK:

    + 你好
    - 你-》你好<get nickname>

    + 你好 *
    - 你-》<star>

    + [*] 你好 [*]
    - wrapping 你好
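
One possible explanation, not confirmed anywhere in this thread, is that the generated regular expression relies on the word-boundary anchor \b and that the engine only counts ASCII letters as word characters. The Python sketch below illustrates that effect with a hand-written pattern. The pattern and the matches() helper are hypothetical and are not RiveScript's actual output; Python's re.ASCII flag stands in for an engine with ASCII-only word boundaries.

```python
import re

# Assumption: the trigger ends up as a regex that needs a word boundary (\b)
# on each side of the keyword. This is an illustration only.
BOUNDARY_PATTERN = r"\b你好\b"


def matches(message: str, ascii_only: bool) -> bool:
    """Return True if the keyword is found between two word boundaries."""
    # re.ASCII restricts \w and \b to [a-zA-Z0-9_], so Han characters
    # are treated as non-word characters.
    flags = re.ASCII if ascii_only else 0
    return re.search(BOUNDARY_PATTERN, message, flags) is not None


for message in ["ww你好ww", "说 你好 吗", "你好"]:
    print(
        message,
        "ascii-only:", matches(message, ascii_only=True),
        "unicode-aware:", matches(message, ascii_only=False),
    )
```

With ASCII-only boundaries the keyword is found only when Latin letters touch it, which resembles the behaviour reported above. With Unicode-aware boundaries the results flip: the space-separated and bare Chinese messages match, and ww你好ww does not.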

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

kirsle commented, Jan 19, 2017 (1 reaction)

@dcsan I think I may just have to add that feature. What I've learned from porting RiveScript to 5 different languages is that A) Unicode is hard, and B) regular expression engines aren't all created equal. Things that work in regexps in one language don't work in another, and it's hard to make RiveScript support all kinds of Unicode across all versions. Allowing the end user to write a literal regular expression lets them fix their specific issues their own way, and avoids all the 'magic' in triggerRegexp() that might interfere with their attempt to get a working regexp out of it.
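
A minimal sketch of that idea in Python, under stated assumptions: compile_trigger() is a hypothetical helper, its naive translation is only a toy stand-in for the 'magic' path (it is not triggerRegexp()), and treating a leading ~ as "literal regular expression" is a guess based on the feature name used in this thread, not a documented syntax.

```python
import re


def compile_trigger(trigger: str) -> re.Pattern:
    """Compile a trigger into a regex (hypothetical helper, not a RiveScript API)."""
    if trigger.startswith("~"):
        # Literal path (assumed syntax): use the author's regex untouched.
        return re.compile(trigger[1:].strip())
    # "Magic" path: a toy translation that only understands the * wildcard
    # and expects whitespace between words.
    parts = ["(.+?)" if word == "*" else re.escape(word) for word in trigger.split()]
    return re.compile("^" + r"\s+".join(parts) + "$")


magic = compile_trigger("* 你好 *")
literal = compile_trigger("~ .*你好.*")

for message in ["ww 你好 ww", "说你好吗"]:
    print(message, "magic:", bool(magic.match(message)), "literal:", bool(literal.match(message)))
```

In this toy version the translated trigger only matches the space-separated message, because the translation assumes whitespace between words. The hand-written expression matches both messages, since the author decides exactly how the text around 你好 is handled.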

RiveScript’s predecessor supported a regexp command: everything old is new again.

kirsle commented, Mar 10, 2017

Closing this issue in favor of tracking the ~Regexp feature in https://github.com/aichaos/rivescript-wd/issues/6

Top Results From Across the Web

Regex to detect invalid UTF-8 string - Stack Overflow
If the regex matches, the string contains invalid byte sequences. It's 100% portable because it doesn't rely on PCRE_UTF8 to be compiled in....

How to validate UTF-8 in regex - Salesforce Stack Exchange
I've tried some simplified ideas based on a regex tutorial and the Java Docs Salesforce links to, but they do not work: NOT(...

Regex Pitfalls: Mixing Unicode and 8-bit Character Codes
Using 8-bit character codes like \x80 with a Unicode regular expression will likely give you unexpected results.

Regexp fails to match UTF-8 characters in Notepad++
Turns out that NP++ has problems when searching for Unicode characters outside the Basic Multilingual plane (BMP) that have a code-point ...

Regular Expression Inconsistencies With Unicode
Regular Expression Inconsistencies With Unicode ... A mud run ... On my personal computer with GNU grep 3.1, \w doesn't work at all...
