
Parsed emojis' lengths aren't calculated consistently


Describe the bug: After parsing an HTML page that uses Unicode characters such as emojis, the reported length of the parsed Unicode character is incorrect.
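"Length" is ambiguous for a character like 📚 (U+1F4DA), because it depends on the unit being counted. The snippet below is a minimal sketch, assuming Node 11 or later (where TextEncoder is a global), that counts the same emoji three ways:

```javascript
// One emoji, three different "lengths" depending on the unit being counted.
const book = "📚"; // U+1F4DA, outside the Basic Multilingual Plane

// JavaScript strings are indexed in UTF-16 code units; 📚 is a surrogate pair.
const utf16Units = book.length;
// Iterating a string yields whole code points, so 📚 counts once.
const codePoints = [...book].length;
// The UTF-8 encoding of 📚 takes four bytes.
const utf8Bytes = new TextEncoder().encode(book).length;

console.log({ utf16Units, codePoints, utf8Bytes }); // { utf16Units: 2, codePoints: 1, utf8Bytes: 4 }
```

Any code that mixes offsets from one of these units with string indices from another will misplace text once such a character appears.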

To Reproduce: Steps to reproduce the behavior:

  1. Go to this repro repo
  2. Clone it and install the dependencies
  3. Run node test.js
  4. Observe the error

The repro case takes the following HTML as input:

📚<div href="./123/123">hey there</div>

and aims to replace the value of href with another string: 234.

The expected output would be:

📚<div href="234">hey there</div>

Instead, the output is:

📚<div href=2343">hey there</div>

I don’t know specifically why this is the case, but my gut reaction is that the sax-parser doesn’t recognise that Unicode characters like 📚 may have a length greater than 1. This particular emoji has a length of 2 (JavaScript’s String length counts UTF-16 code units, and 📚 is stored as a surrogate pair), which causes the replaceBetween function to calculate the wrong position at which to replace the string.
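The reported output can be reproduced exactly under one assumption: that the parser reports the attribute value's start and end as code-point offsets (counting 📚 as 1), while the JavaScript string is indexed in UTF-16 code units (counting it as 2). This is an inference from the output, not a reading of the parser's source. In the sketch below, replaceBetween is a hypothetical stand-in for the repro's helper (its real signature is not shown in the issue), and codePointIndexOf and codePointToUtf16Index are illustrative names:

```javascript
const input = '📚<div href="./123/123">hey there</div>';
const value = "./123/123";

// Hypothetical stand-in for the repro's replaceBetween: splices `replacement`
// into `str` between two JavaScript (UTF-16) string indices.
function replaceBetween(str, start, end, replacement) {
  return str.slice(0, start) + replacement + str.slice(end);
}

// Assumption: the parser reports positions in code points, so we simulate that
// by counting code points up to where the attribute value starts.
function codePointIndexOf(str, search) {
  return [...str.slice(0, str.indexOf(search))].length;
}

const cpStart = codePointIndexOf(input, value); // 12
const cpEnd = cpStart + [...value].length; // 21

// Using code-point offsets directly as UTF-16 indices shifts the splice left by one.
const buggy = replaceBetween(input, cpStart, cpEnd, "234");

// Walk the string, advancing two UTF-16 units for each astral code point (> U+FFFF).
function codePointToUtf16Index(str, cpOffset) {
  let index = 0;
  for (let i = 0; i < cpOffset; i++) {
    index += str.codePointAt(index) > 0xffff ? 2 : 1;
  }
  return index;
}

const fixed = replaceBetween(
  input,
  codePointToUtf16Index(input, cpStart),
  codePointToUtf16Index(input, cpEnd),
  "234",
);

console.log(buggy); // 📚<div href=2343">hey there</div>
console.log(fixed); // 📚<div href="234">hey there</div>
```

Under this assumption, every astral character before the attribute shifts the splice by one more unit, so converting the offsets (or having the parser report UTF-16 indices) removes the drift.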

Additional context: This is made more clear when

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
daKmoR commented, Apr 8, 2022

@justinwilaby yes that fixed it 🎉

you are an amazing maintainer 🤗 friendly, correct, and fast 💪

is there a way we can buy you a coffee or so? ☕

1 reaction
justinwilaby commented, Apr 7, 2022

@andrico1234 - please confirm that #58 resolves your issue. If so, I’ll merge and publish a patch.

Thank you for the bug report!


Top Results From Across the Web

Emoji.length == 2 | Hacker News
So an emoji can have length 1 or 2 in UTF-16. However, when moving to the database it will typically be stored in...
Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in ...
Since you aren't comparing against emoji containing zero-width joiners, the method won't find a match for any but the last character.
Everything You Need To Know About Emoji
You won't find any emoji in that list, for one thing. But while there are not always named references, there are always numeric...
Emoji - PyMdown Extensions Documentation
So the value here will not always be practical for calculating the actual Unicode points of an emoji. This will be None for...
Emoji as a Proxy of Emotional Communication - IntechOpen
Emoji allows people expressing more “authentically” emotions and their ... where $K$ is the one-hot vector's length that represents each emoji xi ....
