size of serialized DOM

I’m seeing a 10x larger character count for the serialization of the initial DOM state (EventType.FullSnapshot) compared with a plain HTML representation of the same thing. Is minimizing this size on the agenda as a design goal?

I’m thinking that it could be reduced as follows:

simple things like renaming attributes to attrs
not storing empty childNodes/attributes lists/objects (making them implicit)
removing type: 2 (type: NodeType.Element) and similar, as that can be inferred from presence of childNodes
only setting isSVG/isStyle boolean attributes if they are unusual (i.e. true)

Are there any strong reasons not to do any of the above?
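
For illustration, the four suggestions can be sketched as one pass over a serialized node. This is a minimal sketch, not rrweb code: compactNode and isEmpty are hypothetical helpers, the sample node shape is assumed from the field names mentioned in this issue, and because empty childNodes lists are dropped the sketch treats the presence of tagName (rather than childNodes) as the marker of an element.

```javascript
// Hypothetical sketch: compactNode is not part of rrweb.
const ELEMENT = 2; // NodeType.Element, as quoted in the issue

// True for {} and [] (the "empty lists/objects" case).
function isEmpty(value) {
  return value !== null && typeof value === "object" && Object.keys(value).length === 0;
}

function compactNode(node) {
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    // Drop `type: 2`; assumption: an element can be recognised by its tagName.
    if (key === "type" && value === ELEMENT && "tagName" in node) continue;
    // Make empty childNodes/attributes implicit.
    if ((key === "childNodes" || key === "attributes") && isEmpty(value)) continue;
    // Only keep isSVG/isStyle when they are true.
    if ((key === "isSVG" || key === "isStyle") && value !== true) continue;

    if (key === "childNodes") {
      out.childNodes = value.map(compactNode);
    } else if (key === "attributes") {
      out.attrs = value; // rename attributes -> attrs
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Made-up sample node, for illustration only.
const sample = {
  type: 2,
  tagName: "div",
  attributes: {},
  id: 1,
  childNodes: [
    { type: 2, tagName: "span", attributes: { class: "a" }, childNodes: [], id: 2, isSVG: false },
  ],
};
const compact = compactNode(sample);
console.log(JSON.stringify(sample).length, JSON.stringify(compact).length);
```

A player reading this format would need the reverse mapping, which is where any cost of making these fields implicit would show up.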

Issue Analytics

State:
Created 4 years ago
Reactions: 1
Comments: 37 (29 by maintainers)

Top GitHub Comments

6 reactions

Yuyz0112 commented, Jan 5, 2020

Sorry for the late reply. After finishing a lot of work last month, I’ve finally got time to start working on rrweb again!

I think this issue is the most important one at the current stage, and I would like to provide a solution in the next major release.

Based on the ideas that I illustrated above, I have written some POC code in this repo.

Currently, I have implemented an analysis framework and several packers:

simple packer. Following @eoghanmurray’s comments, this packer makes the keys shorter and omits some keys which can be inferred from the data structure.
msgpack packer. Uses msgpack-javascript to encode and decode events.
pako packer. Uses pako to deflate and inflate events.
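
As a rough illustration of the deflate/inflate approach, here is a minimal sketch that uses Node's built-in zlib module in place of pako (pako is a JavaScript port of zlib, so the idea is the same, but this is not the POC code). The pack and unpack names and the event objects are made up for the example.

```javascript
// Sketch only: Node's zlib stands in for pako here.
const zlib = require("zlib");

// Serialize events to JSON and deflate the resulting bytes.
function pack(events) {
  return zlib.deflateSync(Buffer.from(JSON.stringify(events), "utf8"));
}

// Inflate the bytes and parse the JSON back into events.
function unpack(packed) {
  return JSON.parse(zlib.inflateSync(packed).toString("utf8"));
}

// Made-up, highly repetitive events (not real rrweb output).
const events = Array.from({ length: 200 }, (_, i) => ({
  type: 3,
  data: { source: 0, id: i, attributes: {}, childNodes: [] },
  timestamp: 1578182400000 + i,
}));

const raw = Buffer.byteLength(JSON.stringify(events));
const packed = pack(events);
console.log({ size: raw, packedSize: packed.length });
```

Repetitive keys are exactly what deflate handles well, which is why a log of many similar events would be expected to shrink the most.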

For now the msgpack packer is not working as intended and I’m still checking my implementation. The other two show some good results when tested on two real-world event logs.

I’m using two real-world event logs to benchmark the packers:

e1: An event log with a big full snapshot.
e2: An event log with many incremental snapshots created by a table-like UI, which means the DOM nodes are similar to one another.

===

simple

e1

"packedSize": 1870789,
"size":       2115468,

e2

"packedSize": 6023940,
"size":       10457884,

pako

e1

"packedSize": 1093306,
"size":       2115468,

e2

"packedSize": 1435585,
"size":       10457884,

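To make the figures above easier to compare, the size reduction for each run can be computed directly from the reported packedSize and size values. This is plain arithmetic on the numbers quoted in this comment; the results array layout and the reductionPercent helper are introduced here only for illustration.

```javascript
// The four benchmark results as reported in the comment.
const results = [
  { packer: "simple", log: "e1", packedSize: 1870789, size: 2115468 },
  { packer: "simple", log: "e2", packedSize: 6023940, size: 10457884 },
  { packer: "pako", log: "e1", packedSize: 1093306, size: 2115468 },
  { packer: "pako", log: "e2", packedSize: 1435585, size: 10457884 },
];

// Percentage by which the packed form is smaller, rounded to one decimal place.
function reductionPercent({ packedSize, size }) {
  return Math.round((1 - packedSize / size) * 1000) / 10;
}

for (const r of results) {
  console.log(`${r.packer} ${r.log}: ${reductionPercent(r)}% smaller`);
}
```

On these numbers the simple packer saves roughly 12% on e1 and 42% on e2, while pako saves roughly 48% on e1 and 86% on e2.
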
2 reactions

eoghanmurray commented, Feb 28, 2020

"the trade-off is end-users will not load the pack plugin bundle, but will still have a relative high transfer data size and your server will become a centralize packing factory."

Just a reminder that my original proposal related to being a bit more careful/efficient in the JSON format itself. Reducing the repetitive aspects of the original JSON would provide advantages in transmission as well as preempt much of the need for zipping either client side or server side.

"But keep the data structure explicit is also very important."

Then why are numeric codes used instead of strings, e.g. 8 instead of 'TouchMove_Departed'? (IMO these would actually be easier to work with if they were fully expanded.)

Here’s a quick analysis of a sample JSON DOM structure showing repetitive keys:

{ type: 560, childNodes: 218, name: 1, publicId: 1, systemId: 1, id: 560, tagName: 217, attributes: 217, textContent: 341, isStyle: 1 }

And here’s the empty nodes e.g. { ... attributes: {}, ... }: {attributes: 79, childNodes: 47}

(Here’s the code I executed at the console to come up with these figures:

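var counts = {};
var empty_counts = {};
// Increment the tally for key k in the given table.
var bump = function(table, k) {
    table[k] = (table[k] || 0) + 1;
};
// Walk a serialized node, counting how often each key occurs and
// how often its value is an empty object or array.
var count_nodes = function(n) {
    for (var k in n) {
        bump(counts, k);
        // Object.keys replaces the console-only keys() helper;
        // typeof null is 'object', so null is excluded first.
        if (n[k] !== null && typeof n[k] == 'object' && Object.keys(n[k]).length == 0) {
            bump(empty_counts, k);
        }
        if (Array.isArray(n[k])) {
            for (var i = 0; i < n[k].length; i++) {
                count_nodes(n[k][i]);
            }
        }
    }
};
// e is the full snapshot event, already in scope at the console.
count_nodes(e.data.node);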

)

So by e.g. abbreviating attributes -> a, textContent -> t, tagName -> n, childNodes -> c you’d effectively be doing a lot of what I imagine gzip is doing ‘for free’, and I don’t think it will be any less legible to someone browsing the structure as you’d usually be able to infer the meaning from the context (the value).

This could be done in a backwards-compatible way so that it’s still possible to play back non-abbreviated content.
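
A minimal sketch of how that backwards compatibility might work, assuming the four abbreviations suggested above (a, t, n, c): expanding is a no-op on keys it does not recognise, so old non-abbreviated recordings pass through unchanged. The abbreviate, expand and renameKeys names are hypothetical and not part of rrweb.

```javascript
// Hypothetical key tables, using the abbreviations proposed in the comment.
const SHORT_TO_LONG = new Map([
  ["a", "attributes"],
  ["t", "textContent"],
  ["n", "tagName"],
  ["c", "childNodes"],
]);
const LONG_TO_SHORT = new Map([...SHORT_TO_LONG].map(([short, long]) => [long, short]));

// Rename node-level keys using the given table. Only the child list is
// recursed into, so attribute names inside `attributes` are never touched.
function renameKeys(node, table) {
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    const newKey = table.get(key) ?? key;
    const isChildren = key === "childNodes" || key === "c";
    out[newKey] = isChildren ? value.map((child) => renameKeys(child, table)) : value;
  }
  return out;
}

const abbreviate = (node) => renameKeys(node, LONG_TO_SHORT);
const expand = (node) => renameKeys(node, SHORT_TO_LONG);

// Made-up node in the existing (long-key) format.
const legacy = {
  type: 2,
  tagName: "p",
  attributes: { class: "x" },
  id: 3,
  childNodes: [{ type: 3, textContent: "hi", id: 4 }],
};

console.log(JSON.stringify(abbreviate(legacy)));
console.log(JSON.stringify(expand(abbreviate(legacy))) === JSON.stringify(legacy));
```

With this shape, a player could call expand on every incoming node regardless of which format produced it.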