Javascript line diff breaks beyond 65K lines

I am trying to use Google's diff-match-patch library from Node.js for line diffs: https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs. I get wrong patches when the combined number of lines in both inputs goes beyond 65,536 (2^16).

Is that a bug (in my code or in diff-match-patch), or am I hitting a known limitation of JavaScript/Node.js? Is there anything I can do to use d-m-p with larger files?

This script reproduces the problem:

var diff_match_patch = require("diff-match-patch")

// function copied from google wiki 
// https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs
function diff_lineMode(text1, text2) {
  var dmp = new diff_match_patch();
  var a = dmp.diff_linesToChars_(text1, text2);
  var lineText1 = a.chars1;
  var lineText2 = a.chars2;
  var lineArray = a.lineArray;
  var diffs = dmp.diff_main(lineText1, lineText2, false);
  dmp.diff_charsToLines_(diffs, lineArray);
  return diffs;
}

// reproduce problem by diffing a string with many lines against "abcb"
for (let size = 65534; size < 65538; size += 1) {
  let text1 = "";
  for (let i = 0; i < size; i++) {
    text1 += i + "\n";
  }

  var patches = diff_lineMode(text1, "abcb")
  console.log("######## Size: " + size + ": patches " + patches.length)
  for (let i = 0; i < patches.length; i++) {
    // patch[0] is action, patch[1] is value
    var action = patches[i][0] < 0 ? "remove" : (patches[i][0] > 0 ? "add" : "keep")
    console.log("patch" + i + ": " + action + "\n" + patches[i][1].substring(0, 10))
  }
}

This gives the following output (substring is used in the code above to shorten it):

######## Size: 65534: patches 2
patch0: remove
0
1
2
3
4

patch1: add
abcb
######## Size: 65535: patches 2
patch0: remove
0
1
2
3
4

patch1: add

######## Size: 65536: patches 2
patch0: keep
0

patch1: remove
1
2
3
4
5

######## Size: 65537: patches 3
patch0: remove
0

patch1: keep
1

patch2: remove
2
3
4
5
6

Using:

$ node --version
v6.3.1
$ cat package.json
{
  "name": "dmp_bug",
  "version": "1.0.0",
  "description": "reproduce issue with diff match patch",
  "main": "dmpbug.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "diff-match-patch": "^1.0.4"
  }
}

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

4 reactions
kevin-lindsay-1 commented, Dec 14, 2020

@NeilFraser can the fromCodePoint and codePointAt solution be added as an option? I use this library via npm, and that sounds like a great option to have.

2 reactions
NeilFraser commented, Dec 4, 2018

The line/word diff mechanism presented here works by encoding each unique line or word as a unique Unicode character. In ES5 only the first 16 bits are accessible (using String.fromCharCode), so at most 65,536 distinct lines or words can be represented.
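
The 16-bit limit can be seen directly in Node.js without the library. The snippet below is only an illustration of the wraparound (the variable names are made up for this example): String.fromCharCode keeps the low 16 bits of its argument, so line index 65536 collides with index 0, which is consistent with the output changing once the total passes 2^16 lines.

```javascript
// String.fromCharCode truncates its argument to 16 bits,
// so values at or above 65536 wrap around to small values.
const wrappedZero = String.fromCharCode(65536); // low 16 bits are 0
const wrappedOne = String.fromCharCode(65537);  // low 16 bits are 1

console.log(wrappedZero === String.fromCharCode(0)); // true
console.log(wrappedOne === String.fromCharCode(1));  // true

// String.fromCodePoint does not truncate; values above 0xFFFF are stored
// as a surrogate pair (two UTF-16 code units) and read back intact.
const wide = String.fromCodePoint(65536);
console.log(wide.length);         // 2
console.log(wide.codePointAt(0)); // 65536
```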

However, from ES6, one can use String.fromCodePoint which should allow for 21 bits: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCodePoint

Try replacing String.fromCharCode with String.fromCodePoint and charCodeAt with codePointAt.
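
A minimal sketch of what that substitution could look like, written as standalone helpers rather than as a patch to the library. The names encodeLines, decodeLines, indexToCodePoint and codePointToIndex are hypothetical and not part of diff-match-patch. The sketch also skips the surrogate range 0xD800-0xDFFF, an extra precaution that the comment above does not mention. Whether diff_main itself copes with the resulting surrogate pairs is not verified here, since it works on UTF-16 code units.

```javascript
// Hypothetical helpers, not part of diff-match-patch.
const SURROGATE_START = 0xd800; // first code point reserved for surrogates
const SURROGATE_COUNT = 0x800;  // size of the surrogate range
const MAX_CODE_POINT = 0x10ffff;

// Map a line index to a code point, jumping over the surrogate range.
function indexToCodePoint(index) {
  const cp = index < SURROGATE_START ? index : index + SURROGATE_COUNT;
  if (cp > MAX_CODE_POINT) throw new RangeError("too many unique lines");
  return cp;
}

// Inverse of indexToCodePoint.
function codePointToIndex(cp) {
  return cp < SURROGATE_START ? cp : cp - SURROGATE_COUNT;
}

// Replace every line of `text` with one code point per unique line.
function encodeLines(text, lineArray, lineHash) {
  let encoded = "";
  let start = 0;
  while (start < text.length) {
    let end = text.indexOf("\n", start);
    end = end === -1 ? text.length : end + 1; // keep the trailing newline
    const line = text.slice(start, end);
    let index = lineHash.get(line);
    if (index === undefined) {
      index = lineArray.length;
      lineArray.push(line);
      lineHash.set(line, index);
    }
    encoded += String.fromCodePoint(indexToCodePoint(index));
    start = end;
  }
  return encoded;
}

// Expand the encoded string back into the original lines.
function decodeLines(encoded, lineArray) {
  let text = "";
  for (const ch of encoded) { // for...of iterates by code point
    text += lineArray[codePointToIndex(ch.codePointAt(0))];
  }
  return text;
}

// Round-trip 70,000 unique lines, more than fit in 16 bits.
const lineArray = [""]; // index 0 left unused (an assumption of this sketch)
const lineHash = new Map();
let text = "";
for (let i = 0; i < 70000; i++) text += i + "\n";
const encoded = encodeLines(text, lineArray, lineHash);
const roundTrip = decodeLines(encoded, lineArray) === text;
console.log(roundTrip);
```

Decoding iterates with for...of rather than by index because characters above 0xFFFF occupy two UTF-16 code units, so charAt or charCodeAt would see each half separately.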
