Library throws "URI malformed" error when creating patch with emojis

The following code:

const DiffMatchPatch = require('diff-match-patch');
const dmp = new DiffMatchPatch();

const patchText = dmp.patch_toText(dmp.patch_make('', '👨‍🦰 👨🏿‍🦰 👨‍🦱 👨🏿‍🦱 🦹🏿‍♂️'));
const patchObj = dmp.patch_fromText(patchText);
const [patchedText] = dmp.patch_apply(patchObj, '');
dmp.patch_toText(dmp.patch_make(patchedText, '👾 🙇 💁 🙅 🙆 🙋 🙎 🙍'));

will throw a “URI malformed” error at this line. That is a common problem when using encodeURI on arbitrary data (the md5 package has the same problem), but in this case, as far as I can see, the inputs are valid UTF-8.

I think either patch_make or patch_apply generates invalid text.

I’m also wondering why encodeURI is needed in this library. Wouldn’t a simple escape/unescape of specific reserved characters be enough?

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
michal-kurz commented, Oct 1, 2022

I got somewhere. Take it with a grain of salt though, as I might have easily missed something 😃

The problem appears when emojis that occupy more than one UTF-16 code unit (surrogate pairs) get broken up into separate diffs inside a patch:

'💛'.length    // 2
'💛'.charAt(0) // \ud83d
'💛'.charAt(1) // \udc9b

encodeURI("\ud83d")  // throws URIError: URI malformed
encodeURI("\udc9b")  // throws URIError: URI malformed
encodeURI("\uD83D\udc9b")  // '%F0%9F%92%9B'

This mostly happens with internally generated prefixes/suffixes, for example inside patch_addContext_, where dmp “allocates” a chunk that starts or ends in the middle of an emoji. But it can occur inside an actual diff, too.
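To check whether a given chunk has been cut in the middle of an emoji, one option is to scan it for unpaired surrogates. The following is only a minimal sketch: hasLoneSurrogate is a hypothetical helper written for illustration, not part of diff-match-patch.

```javascript
// Hypothetical helper (not part of diff-match-patch): returns true if `text`
// contains a surrogate code unit that is missing its partner.
function hasLoneSurrogate(text) {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = text.charCodeAt(i + 1); // NaN at the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
      i++; // skip the low half of the valid pair
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate that was not preceded by a high surrogate.
      return true;
    }
  }
  return false;
}

const heart = '💛';
console.log(hasLoneSurrogate(heart));             // false
console.log(hasLoneSurrogate(heart.slice(0, 1))); // true
console.log(hasLoneSurrogate(heart.slice(1)));    // true
```

Any chunk for which this returns true is one that encodeURI will refuse to encode.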

It seems to me that the problem can be solved by replacing encodeURI/decodeURI with escape/unescape (or another preferred escape method) inside patch_fromText and toString. Have you tried this? It seems to work fine for my use case, though I hope it doesn’t break something else. encodeURI feels quite out of place anyway, since the code deals with abstract text, not URIs.
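As a rough illustration of why the swap helps, the sketch below compares the two approaches on half of a surrogate pair. It only exercises the built-in functions and does not touch the library itself. Note that escape/unescape are legacy functions, and that they produce %uXXXX sequences rather than UTF-8 percent-encoding, so patch text serialized this way would presumably not be readable by code that still uses decodeURI.

```javascript
// Half of the '💛' surrogate pair, as produced when a chunk is cut mid-emoji.
const loneHigh = '\ud83d';

// encodeURI cannot represent a lone surrogate as UTF-8, so it throws.
let encodeUriThrew = false;
try {
  encodeURI(loneHigh);
} catch (err) {
  encodeUriThrew = err instanceof URIError;
}

// escape works on individual UTF-16 code units, so it does not care
// whether the surrogate has a partner.
const escaped = escape(loneHigh);       // '%uD83D'
const roundTripped = unescape(escaped); // back to the original code unit

console.log(encodeUriThrew);            // true
console.log(escaped);                   // %uD83D
console.log(roundTripped === loneHigh); // true
```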

Before I tried this, I also tinkered with the source code for quite a while, and I did seemingly manage to prevent such emoji splitting in patch_addContext_ by testing whether encodeURI throws, and increasing the padding and shifting the prefix end/suffix start until it didn’t. But this really is more of a desperate hack than anything else. Such an approach may stop emojis from breaking up, but dmp would still throw as soon as an invalid character appeared for any other reason (other characters can throw too).
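For reference, the boundary-shifting idea can be sketched roughly as follows. This is not the actual change made to patch_addContext_: alignToCodePoint is a hypothetical helper, and it inspects the surrogate code units directly instead of calling encodeURI and catching the error.

```javascript
// Hypothetical helper: if `index` falls between the two halves of a surrogate
// pair in `text`, move it one code unit in `direction` ('back' or 'forward').
function alignToCodePoint(text, index, direction) {
  const prev = text.charCodeAt(index - 1); // NaN when index is 0
  const next = text.charCodeAt(index);     // NaN when index is text.length
  const splitsPair =
    prev >= 0xd800 && prev <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
  if (!splitsPair) return index;
  return direction === 'forward' ? index + 1 : index - 1;
}

const text = 'a💛b';
// Index 2 sits between the high and low surrogate of the emoji.
const prefixEnd = alignToCodePoint(text, 2, 'forward'); // 3
console.log(encodeURI(text.slice(0, prefixEnd)));       // a%F0%9F%92%9B
```

This only keeps surrogate pairs together. Emojis made of several code points joined by zero-width joiners can still be split between code points, although that kind of split does not make encodeURI throw.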

0 reactions
laurent22 commented, Oct 5, 2022

Thanks for looking into it @michal-kurz. It seems unlikely that any change will be merged to the official repository (the PR is from 2019) - do you know if there’s a good fork being maintained somewhere where that kind of fix could be applied?
