Parsed emojis' lengths aren't calculated consistently
Describe the bug After parsing an HTML page that contains Unicode characters such as emojis, the length of the parsed Unicode character is calculated incorrectly.
To Reproduce Steps to reproduce the behavior:
- Go to this repro repo
- Clone and install the dependencies
- Run node test.js
- Observe error
The repro case takes the following html as input:
📚<div href="./123/123">hey there</div>
and aims to replace the value of href with another string: 234.
The expected output would be:
📚<div href="234">hey there</div>
Instead, the output is:
📚<div href=2343">hey there</div>
I don’t know specifically why this is the case, but my gut reaction is that the sax-parser doesn’t recognise that Unicode characters like 📚 may have a length greater than 1. Because this particular emoji has a length of 2 in a JavaScript string, the offsets passed to the replaceBetween function end up one position too early, so the replacement lands in the wrong place.
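The off-by-one effect can be sketched in a few lines of plain JavaScript. This is only an illustration of the suspected mechanism, not the parser's actual code: the replaceBetween helper below is a hypothetical stand-in that borrows its name from the report, and the assumption that positions were counted in code points rather than UTF-16 code units is a guess.

```javascript
// Hypothetical stand-in for the replaceBetween function named in the report.
// It replaces source[start, end) with `replacement`, using UTF-16 code unit
// indices, which is what String.prototype.slice expects.
function replaceBetween(source, start, end, replacement) {
  return source.slice(0, start) + replacement + source.slice(end);
}

const html = '📚<div href="./123/123">hey there</div>';
const value = './123/123';

// '📚' is one code point but two UTF-16 code units (a surrogate pair).
const emojiUnits = '📚'.length;              // code units
const emojiPoints = Array.from('📚').length; // code points

// Offsets measured in UTF-16 code units (what slice needs).
const unitStart = html.indexOf(value);
const unitEnd = unitStart + value.length;

// Offsets measured in code points (assumed source of the bug):
// the emoji counts as 1 instead of 2, so everything shifts left by one.
const pointStart = Array.from(html.slice(0, unitStart)).length;
const pointEnd = pointStart + Array.from(value).length;

const correct = replaceBetween(html, unitStart, unitEnd, '234');
const shifted = replaceBetween(html, pointStart, pointEnd, '234');

console.log(emojiUnits, emojiPoints);
console.log(correct);
console.log(shifted);
```

With the code-unit offsets the helper produces the expected output, while the code-point offsets reproduce the malformed href=2343" string from the report, which is at least consistent with the length theory.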
Additional context This is made more clear when
Issue Analytics
- State:
- Created a year ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
@justinwilaby yes that fixed it 🎉
you are an amazing maintainer 🤗 friendly, correct, and fast 💪
is there a way we can buy you a coffee or so? ☕
@andrico1234 - please confirm that #58 resolves your issue. If so, I’ll merge and publish a patch.
Thank you for the bug report!