size of serialized DOM

The serialization of the initial DOM state (EventType.FullSnapshot) is about 10x the character size of a plain HTML representation of the same DOM. Is minimizing this size on the agenda as a design goal?

I’m thinking that it could be reduced as follows:

  • simple things like renaming attributes to attrs
  • not storing empty childNodes/attributes lists/objects (making them implicit)
  • removing type: 2 (type: NodeType.Element) and similar, as that can be inferred from presence of childNodes
  • only setting the isSVG/isStyle boolean attributes when they have the unusual value (i.e. true)
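
As a rough illustration of the four ideas above, here is a minimal sketch. The function name compactNode and the sample node are hypothetical, and this is not rrweb's serializer. One assumption differs from the list: the element type is inferred from the presence of tagName rather than childNodes, because once empty childNodes are omitted they can no longer signal an element.

```javascript
// Hypothetical sketch, not rrweb's actual serializer.
const ELEMENT = 2; // type: NodeType.Element, as mentioned in the list above

function compactNode(node) {
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    // Drop `type` on elements; assumed to be inferable from `tagName`.
    if (key === 'type' && value === ELEMENT) continue;
    // Keep the boolean flags only when they are true.
    if ((key === 'isSVG' || key === 'isStyle') && value !== true) continue;
    if (key === 'attributes') {
      // Rename to `attrs` and leave it out entirely when empty.
      if (Object.keys(value).length > 0) out.attrs = value;
      continue;
    }
    if (key === 'childNodes') {
      // Leave out empty child lists; compact the children recursively.
      if (value.length > 0) out.childNodes = value.map(compactNode);
      continue;
    }
    out[key] = value;
  }
  return out;
}

// Made-up sample node for illustration.
const sample = {
  type: 2, tagName: 'div', attributes: {}, id: 4,
  childNodes: [
    { type: 2, tagName: 'span', attributes: { class: 'a' }, childNodes: [], id: 5 },
  ],
};
const compact = compactNode(sample);
console.log(JSON.stringify(sample).length, JSON.stringify(compact).length);
```

A reader of the compact form would need the matching defaults (empty attributes, empty childNodes, false flags) to restore the original shape.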

Are there any strong reasons not to do any of the above?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 37 (29 by maintainers)

Top GitHub Comments

6 reactions
Yuyz0112 commented, Jan 5, 2020

Sorry for the late reply. After finishing a lot of work last month, I finally have time to start working on rrweb again!

I think this issue is the most important one at the current stage, and I would like to provide a solution in the next major release.

With the ideas that I illustrated above, I have written some proof-of-concept code in this repo.

Currently, I have implemented an analysis framework and several packers:

  1. simple packer. Following @eoghanmurray’s comments, this packer makes the keys shorter and omits some keys which can be inferred by the data structure.
  2. msgpack packer. Use msgpack-javascript to encode and decode events.
  3. pako packer. Use pako to deflate and inflate events.
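
To make the deflate/inflate idea concrete, here is a minimal sketch of a pack/unpack pair. It swaps in Node's built-in zlib module in place of pako so that it runs without extra dependencies. The function names pack and unpack and the sample events are hypothetical and are not taken from the POC repo.

```javascript
const zlib = require('zlib');

// Hypothetical packer: JSON-encode the events, then deflate the bytes.
function pack(events) {
  return zlib.deflateSync(Buffer.from(JSON.stringify(events), 'utf8'));
}

// Inverse of pack: inflate the bytes, then parse the JSON.
function unpack(packed) {
  return JSON.parse(zlib.inflateSync(packed).toString('utf8'));
}

// Made-up, highly repetitive events, loosely shaped like incremental snapshots.
const events = Array.from({ length: 50 }, (_, i) => ({
  type: 3,
  timestamp: 1000 + i,
  data: { source: 0, adds: [], removes: [], attributes: [] },
}));

const packed = pack(events);
console.log({ size: JSON.stringify(events).length, packedSize: packed.length });
```

The packed value is binary, so it would need to be sent as bytes (or base64-encoded, at some size cost) rather than embedded directly in JSON.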

The msgpack packer is not working as intended yet, and I’m still checking my implementation. The other two show good results.

I’m using two real-world event logs to benchmark the packers:

  • e1: An event log with a big full snapshot.
  • e2: An event log with many incremental snapshots created by a table-like UI, which means the DOM nodes are similar to one another.

Benchmark results:

packer   log   packedSize   size
simple   e1    1870789      2115468
simple   e2    6023940      10457884
pako     e1    1093306      2115468
pako     e2    1435585      10457884
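
To compare the packers more easily, the numbers above can be turned into ratios. This is only a small sketch: the packedSize and size values are copied from the results above, while the variable names and the output format are my own.

```javascript
// packedSize and size values copied from the benchmark results above.
const results = [
  { packer: 'simple', log: 'e1', packedSize: 1870789, size: 2115468 },
  { packer: 'simple', log: 'e2', packedSize: 6023940, size: 10457884 },
  { packer: 'pako', log: 'e1', packedSize: 1093306, size: 2115468 },
  { packer: 'pako', log: 'e2', packedSize: 1435585, size: 10457884 },
];

// ratio = packed size as a fraction of the original size.
const rows = results.map((r) => ({ ...r, ratio: r.packedSize / r.size }));

for (const row of rows) {
  const percent = (row.ratio * 100).toFixed(1);
  console.log(`${row.packer} ${row.log}: ${percent}% of original size`);
}
```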

2 reactions
eoghanmurray commented, Feb 28, 2020

"The trade-off is end-users will not load the pack plugin bundle, but will still have a relatively high transfer data size and your server will become a centralized packing factory."

Just a reminder that my original proposal was about being a bit more careful/efficient in the JSON format itself. Reducing the repetitive aspects of the original JSON would provide advantages in transmission as well as preempt much of the need for zipping either client side or server side.

"But keeping the data structure explicit is also very important."

Then why are numeric codes used instead of strings, e.g. 8 instead of 'TouchMove_Departed'? (IMO these would actually be easier to work with if they were fully expanded.)

Here’s a quick analysis of a sample JSON DOM structure showing repetitive keys:

{ type: 560, childNodes: 218, name: 1, publicId: 1, systemId: 1, id: 560, tagName: 217, attributes: 217, textContent: 341, isStyle: 1 }

And here are the counts of empty values, e.g. { ... attributes: {}, ... }: { attributes: 79, childNodes: 47 }

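(Here’s the code I executed at the console to come up with these figures:

// Count how often each key appears in the serialized node tree, and how
// often its value is an empty object or array.
var counts = {};
var empty_counts = {};
var count_nodes = function (n) {
    for (var k in n) {
        counts[k] = (counts[k] || 0) + 1;
        // Object.keys replaces the console-only keys() helper; the null
        // check is needed because typeof null is 'object'.
        if (n[k] !== null && typeof n[k] === 'object' && Object.keys(n[k]).length === 0) {
            empty_counts[k] = (empty_counts[k] || 0) + 1;
        }
        // Recurse into arrays such as childNodes.
        if (Array.isArray(n[k])) {
            for (var i = 0; i < n[k].length; i++) {
                count_nodes(n[k][i]);
            }
        }
    }
};
// e is the FullSnapshot event being analyzed.
count_nodes(e.data.node);

)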

So by e.g. abbreviating attributes -> a, textContent -> t, tagName -> n, childNodes -> c you’d effectively be doing a lot of what I imagine gzip is doing ‘for free’, and I don’t think it will be any less legible to someone browsing the structure as you’d usually be able to infer the meaning from the context (the value).
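
A minimal sketch of that abbreviation step, using the four short keys suggested above (a, t, n, c). The function name abbreviate and the sample node are hypothetical. Only node-level keys are renamed, and the attributes object itself is left untouched.

```javascript
// Short keys as suggested above; everything else is illustrative.
const SHORT = new Map([
  ['attributes', 'a'],
  ['textContent', 't'],
  ['tagName', 'n'],
  ['childNodes', 'c'],
]);

function abbreviate(node) {
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    const shortKey = SHORT.has(key) ? SHORT.get(key) : key;
    // Recurse into child nodes; all other values are copied as they are.
    out[shortKey] = key === 'childNodes' ? value.map(abbreviate) : value;
  }
  return out;
}

// Made-up sample node for illustration.
const node = {
  type: 2, tagName: 'p', attributes: { class: 'x' }, id: 1,
  childNodes: [{ type: 3, textContent: 'hello', id: 2 }],
};
const short = abbreviate(node);
console.log(JSON.stringify(node).length, JSON.stringify(short).length);
```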

This could be done in a backwards-compatible way so that it’s still possible to play back non-abbreviated content.
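
One way the backwards compatibility could work is for the reader to expand short keys when it sees them and pass long keys through unchanged, as in the sketch below. The function name expand and the sample data are hypothetical. It assumes that no full-length node key is literally "a", "t", "n" or "c".

```javascript
// Reverse mapping for the short keys a, t, n, c (assumed key set).
const LONG = new Map([
  ['a', 'attributes'],
  ['t', 'textContent'],
  ['n', 'tagName'],
  ['c', 'childNodes'],
]);

// Accepts both abbreviated and non-abbreviated nodes and always returns
// the long form, so older recordings keep working.
function expand(node) {
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    const longKey = LONG.has(key) ? LONG.get(key) : key;
    // Attribute objects are not walked, so an HTML attribute that happens
    // to be named "a" or "t" is not renamed.
    out[longKey] = longKey === 'childNodes' ? value.map(expand) : value;
  }
  return out;
}

// Made-up samples: the same node in old (long) and new (short) form.
const oldFormat = { type: 2, tagName: 'p', attributes: {}, childNodes: [], id: 1 };
const newFormat = { type: 2, n: 'p', a: {}, c: [], id: 1 };
console.log(JSON.stringify(expand(oldFormat)) === JSON.stringify(expand(newFormat)));
```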
