question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Would it help to if internal arrays were sparse?

See original GitHub issue

https://github.com/lemire/FastBitSet.js/blob/master/FastBitSet.js#L120 initialises the enlarged array slots to 0. I wonder if it would be better to have a hole there instead?

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:13 (10 by maintainers)

github_iconTop GitHub Comments

1reaction
lemirecommented, May 21, 2020

@dimaqq

re: typed arrays: I have a suspicion that v8 optimises typed arrays that are mostly zeros.

I think that an typed array will use far less memory, at least in some instances, but it is not due to optimizations related to having ‘mostly zeroes’.

Given that they were designed to be passed to underlying C/C++ functions, it would violate my expectations that they relied on anything by a flat array. Of course, if you never touch the memory, at all, you may not hit RAM at all, the pages may never be mapped. That is, there is a big difference between “untouched” and “mostly zeros”.

We can test it out…

var x = new Uint32Array(1000000000);

This works, and I find that the memory usage is almost nil.

for(let i = 0; i < 1000000000; i++) {
  x[i] = 0;
}

And voilà… node reports 4GB of memory usage.

Let us see how much memory an array takes…

var x = new Array();
for(let i = 0; i < 100000000; i++) {
 x[i] = 0;
}

I find that about 16 bytes per entry are used.

Do you confirm?

re: roaring, I’m trying it now, the dependency tree is a bit scary but I see it’s not all runtime, so let’s see 😃

The dependencies in JavaScript are a bit insane but I don’t think that the roaring package started it.

1reaction
lemirecommented, May 21, 2020

Somehow the js engine saved me from myself in other bitset strategies

The TypedFastBitSet is backed by new Uint32Array(size) so there is no “hole”. It may use less memory than FastBitSet but not because of the holes.

js engine is free to implement that as either a flat array with sentinels, or as associative map and free to switch the underlying representation back and forth as holes are added and removes, as array is grown or shrunk.

The good thing with a BitSet, backed by an array, is that has predictably good performance… and also have a predictable memory usage.

To illustrate, let me run the current benchmark… First, with the code in master…

$ node test.js

starting inplace union  benchmark
FastBitSet (inplace) x 140 ops/sec ±0.81% (82 runs sampled)

starting inplace intersection  benchmark
FastBitSet (inplace) x 14,059 ops/sec ±0.64% (90 runs sampled)

Next, let us remove line 120 and rerun the benchmark…

$ node test.js
starting inplace union  benchmark
FastBitSet (inplace) x 38.62 ops/sec ±2.70% (70 runs sampled)

starting inplace intersection  benchmark
FastBitSet (inplace) x 3,183 ops/sec ±0.26% (100 runs sampled)

So you go from 14,000 operations per second to 3,000 operations per second. It is several times slower. Not good.

fastbitset OOM’ed on node

Have you considered roaring? It should use less memory than just about anything else, and it is going to be really, really fast… Let me try again with roaring

$ node test.js
starting inplace union  benchmark
roaring x 753 ops/sec ±1.02% (95 runs sampled)

starting inplace intersection  benchmark
roaring x 71,793 ops/sec ±0.11% (99 runs sampled)

See how we are several times faster than FastBitSet?

Read more comments on GitHub >

github_iconTop Results From Across the Web

Why are we allowed to create sparse arrays in JavaScript?
Arrays that are sufficiently sparse are typically implemented in a slower, more memory-efficient way than dense arrays are`. "more memory- ...
Read more >
efficiency of sparse array creation - Google Groups
Creating sparse arrays seems exceptionally slow. I can set up the non-zero data of the array relatively quickly. For example, the following code...
Read more >
A Gentle Introduction to Sparse Matrices for Machine Learning
The sparse matrix is represented using three one-dimensional arrays for the non-zero values, the extents of the rows, and the column indexes.
Read more >
Working with Sparse Arrays - Wolfram Language Documentation
Sparse representations of matrices are useful because they do not store every element. If one particular value appears very frequently it can be...
Read more >
How does MATLAB internally store sparse arrays / Calling ...
MATLAB stores sparse matrices in the CSC 0-based format, except MATLAB does not store the pointerB and pointerE info the same way. MATLAB...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found