Javascript line diff breaks beyond 65K lines
I am trying to use the Google diff-match-patch library from Node.js for line diffs: https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs. I get wrong patches when the combined number of lines in both inputs goes beyond 65,536 (2^16).
Is that a bug (in my code or in diff-match-patch), or am I hitting a known limitation of JavaScript/Node.js? Is there anything I can do to use d-m-p with larger files?
This script reproduces the problem:
var diff_match_patch = require("diff-match-patch");

// function copied from google wiki
// https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs
function diff_lineMode(text1, text2) {
  var dmp = new diff_match_patch();
  // replace every unique line with a single character
  var a = dmp.diff_linesToChars_(text1, text2);
  var lineText1 = a.chars1;
  var lineText2 = a.chars2;
  var lineArray = a.lineArray;
  var diffs = dmp.diff_main(lineText1, lineText2, false);
  // expand the characters back into the original lines
  dmp.diff_charsToLines_(diffs, lineArray);
  return diffs;
}

// reproduce problem by diffing string with many lines to "abcb"
for (let size = 65534; size < 65538; size += 1) {
  let text1 = "";
  for (let i = 0; i < size; i++) {
    text1 += i + "\n";
  }
  var patches = diff_lineMode(text1, "abcb");
  console.log("######## Size: " + size + ": patches " + patches.length);
  for (let i = 0; i < patches.length; i++) {
    // patch[0] is action, patch[1] is value
    var action = patches[i][0] < 0 ? "remove" : (patches[i][0] > 0 ? "add" : "keep");
    console.log("patch" + i + ": " + action + "\n" + patches[i][1].substring(0, 10));
  }
}
This gives the following output (the substring call in the code above shortens each value):
######## Size: 65534: patches 2
patch0: remove
0
1
2
3
4
patch1: add
abcb
######## Size: 65535: patches 2
patch0: remove
0
1
2
3
4
patch1: add
######## Size: 65536: patches 2
patch0: keep
0
patch1: remove
1
2
3
4
5
######## Size: 65537: patches 3
patch0: remove
0
patch1: keep
1
patch2: remove
2
3
4
5
6
Using
$ node --version
v6.3.1
$ cat package.json
{
  "name": "dmp_bug",
  "version": "1.0.0",
  "description": "reproduce issue with diff match patch",
  "main": "dmpbug.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "diff-match-patch": "^1.0.4"
  }
}
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
@NeilFraser can the fromCodePoint and codePointAt solution be added as an option? I use this library via npm, and that sounds like a great option to have.
The line/word diff mechanism presented here encodes each unique line or word as a unique Unicode character. In ES5 only 16-bit values (65,536 distinct characters) are accessible (using String.fromCharCode).
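The 16-bit limit can be seen in a small sketch. The helper encodeLines below is hypothetical: it only mimics the one-character-per-unique-line idea and is not the library's own code. It shows that String.fromCharCode keeps only the low 16 bits of its argument, so the 65,537th unique line is assigned the same character as the first one.

```javascript
// Hypothetical helper: map each unique line to one character, the way the
// line-mode trick does. An illustration only, not diff-match-patch's code.
function encodeLines(lines, toChar) {
  const seen = new Map(); // line text -> index of its first occurrence
  let encoded = "";
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.set(line, seen.size);
    }
    encoded += toChar(seen.get(line));
  }
  return encoded;
}

// 65,537 distinct lines: one more than 16 bits can number.
const manyLines = [];
for (let i = 0; i <= 65536; i++) {
  manyLines.push(i + "\n");
}

const encodedText = encodeLines(manyLines, String.fromCharCode);

// fromCharCode truncates to 16 bits, so index 65536 wraps around to index 0.
const wraps = String.fromCharCode(65536) === String.fromCharCode(0);
const distinctChars = new Set(encodedText.split("")).size;

console.log(wraps);            // true
console.log(manyLines.length); // 65537
console.log(distinctChars);    // 65536
```

Two different lines sharing one character is consistent with the reproduction above, where the results change once the input reaches 65,536 lines.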
However, from ES6, one can use String.fromCodePoint, which should allow for 21 bits: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/fromCodePoint
Try replacing String.fromCharCode with String.fromCodePoint and charCodeAt with codePointAt.
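A minimal sketch of that idea, outside the library, might look as follows. All helper names here (indexToChar, charToIndex, linesToCodePoints, codePointsToLines) are hypothetical and not part of diff-match-patch. The sketch also assumes that the surrogate range 0xD800-0xDFFF should be skipped, so that two neighbouring encoded lines can never fuse into a single surrogate pair.

```javascript
// Hypothetical helpers, not part of diff-match-patch: map a line index to a
// code point and back, skipping the surrogate range 0xD800-0xDFFF.
const SURROGATE_START = 0xd800;
const SURROGATE_COUNT = 0x800;

function indexToChar(index) {
  const codePoint = index < SURROGATE_START ? index : index + SURROGATE_COUNT;
  return String.fromCodePoint(codePoint); // throws RangeError above 0x10FFFF
}

function charToIndex(ch) {
  const codePoint = ch.codePointAt(0);
  return codePoint < SURROGATE_START ? codePoint : codePoint - SURROGATE_COUNT;
}

// Encode: one code point per unique line.
function linesToCodePoints(lines) {
  const lineArray = [];
  const seen = new Map(); // line text -> index into lineArray
  let encoded = "";
  for (const line of lines) {
    if (!seen.has(line)) {
      seen.set(line, lineArray.length);
      lineArray.push(line);
    }
    encoded += indexToChar(seen.get(line));
  }
  return { encoded: encoded, lineArray: lineArray };
}

// Decode: for...of walks a string by code points, not UTF-16 code units.
function codePointsToLines(encoded, lineArray) {
  let text = "";
  for (const ch of encoded) {
    text += lineArray[charToIndex(ch)];
  }
  return text;
}

// Round-trip 70,000 distinct lines, more than 16 bits can number.
const input = [];
for (let i = 0; i < 70000; i++) {
  input.push(i + "\n");
}
const result = linesToCodePoints(input);
const roundTrip = codePointsToLines(result.encoded, result.lineArray) === input.join("");

console.log(roundTrip); // true
```

One caveat to verify before relying on this: code points above 0xFFFF occupy two UTF-16 code units, so a diff routine that compares strings by index (code unit) could cut such a pair in half, and the decoding step would then need to handle or avoid those splits.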