`dat share` is slow with lots of files


@feross and i are creating a demo of Simple English Wikipedia hosted on dat

dat share-ing the dump (1 big file) is fast

setup

  • dat version: 13.9.2
  • os version: ubuntu 16.04.3
  • kernel version: 4.4.0-1047-aws
  • hardware: m5.large instance, using an 800GB EBS volume, standard SSD variety

results

$ ls -lha
total 1.2G
-rw-rw-r-- 1 ubuntu ubuntu 1.2G Jan  4  2017 wikipedia_en_simple_all_2017-01.zim
[...]

we are calling dat share on one file, total 1.2GB:

$ dat share
[... runs at about ~150MB/s, finishes in < 10 seconds ...]

dat share-ing the articles (small files) runs very slowly

$ du -sh *
252K	-
1.7G	A
1.3G	I
36K	M
$ time ls -lha A I/m | wc -l
291382
real	0m3.740s

(notice that stat-ing all 300k files only takes <4 seconds, so that’s not the bottleneck…)

we are calling dat share on ~300k files, totalling 3GB:


[… runs at about ~150KB/s, hasn’t finished yet …]

tl;dr: dat share throughput is ~1000x lower with small files

these files are about 10KB on average, and dat share is processing just a couple of them per second
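
a quick sanity check of the numbers above, as a small python sketch. it assumes the `G` in the `du -sh` output means GiB, uses the `ls | wc -l` line count as a rough stand-in for the file count (it slightly over-counts, and only covers A and I/m), and takes the two reported throughputs at face value:

```python
# back-of-the-envelope check of the figures quoted above
GIB = 1024 ** 3

total_bytes = (1.7 + 1.3) * GIB  # `du -sh` sizes of A and I (assumed to be GiB)
approx_file_count = 291_382      # `ls -lha A I/m | wc -l` output, used as a rough file count

avg_file_kib = total_bytes / approx_file_count / 1024

big_file_rate = 150e6    # ~150MB/s, sharing the single 1.2G file
small_file_rate = 150e3  # ~150KB/s, sharing the ~300k small files
slowdown = big_file_rate / small_file_rate

print(f"average file size: ~{avg_file_kib:.1f} KiB")
print(f"throughput ratio:  ~{slowdown:.0f}x")
```

this lands close to the "about 10KB on average" and "1000x" figures, within the precision of the inputs.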

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 3
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

dcposch commented, Jan 15, 2018 (4 reactions)

adding n files takes O(n^2) time, O(n^2) space

this looks bad

after further investigation, i found out that deep in the dat internals, the thing that actually records a newly added file to the hypercore log is append-tree._put().

this function runs in O(n) time, appending a log entry of size O(n), if the folder contains n files

this means that adding n files to a folder, one after another, takes O(n^2) time and produces a hypercore feed with O(n) entries but taking up O(n^2) total space
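
the quadratic growth can be illustrated with a toy model in python. this is only a sketch of the argument, not append-tree's actual code: `entry_size`, `base_bytes` and `bytes_per_file` are made-up names, and the default values are loose assumptions picked to roughly resemble the entry sizes in the log further down.

```python
def entry_size(files_already_in_folder, base_bytes=57, bytes_per_file=1):
    """Hypothetical size of one log entry: a fixed part plus a part
    that grows linearly with the number of files already in the folder."""
    return base_bytes + bytes_per_file * files_already_in_folder


def total_feed_bytes(n_files, **model):
    """Total bytes appended when n_files are added to one folder, one by one."""
    return sum(entry_size(i, **model) for i in range(n_files))


if __name__ == "__main__":
    # doubling the file count roughly quadruples the total size of the log
    for n in (1_000, 2_000, 4_000, 8_000):
        print(f"{n:>6} files -> {total_feed_bytes(n):>12,} bytes of log entries")
```

the feed still has only n entries, but because entry i carries O(i) bytes, the sum over all entries grows like n^2/2. the same reasoning applies to time if each append costs time proportional to its size.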

cc @mafintosh

log

running dat share with added debug logs in append-tree to demonstrate the issue:

the folder contains ~2500 files totalling 30MB

$ rm -rf .dat && DEBUG=append-tree node ~/code/dat/bin/cli.js share --watch=false
dat v13.9.2
Created new dat in /Users/dc/sample2/.dat
dat://b1ea4e0c8b72573bac844f5d8d50c31645a29434891aa2559a4381ae153090f8
Sharing dat: (stats disabled)
Checking for file updates...
Ctrl+C to Exit
  append-tree appending node to hypercore, 58 bytes +0ms
  append-tree appending node to hypercore, 65 bytes +10ms
  append-tree appending node to hypercore, 60 bytes +2ms
  append-tree appending node to hypercore, 71 bytes +3ms
  append-tree appending node to hypercore, 71 bytes +2ms
  append-tree appending node to hypercore, 81 bytes +3ms
  append-tree appending node to hypercore, 71 bytes +6ms
  append-tree appending node to hypercore, 72 bytes +3ms
...
  append-tree appending node to hypercore, 439 bytes +6ms
  append-tree appending node to hypercore, 425 bytes +5ms
  append-tree appending node to hypercore, 419 bytes +5ms
dat v13.9.2
Created new dat in /Users/dc/sample2/.dat
dat://b1ea4e0c8b72573bac844f5d8d50c31645a29434891aa2559a4381ae153090f8
Sharing dat: (stats disabled)



Metadata created for 357 of 1110 files (2.1 MB/s)
(Calculating file count...)
ADD: A/Albanian.html (2.0 KB)


Ctrl+C to Exit
  append-tree appending node to hypercore, 433 bytes +8ms
  append-tree appending node to hypercore, 439 bytes +4ms
  append-tree appending node to hypercore, 433 bytes +6ms
  append-tree appending node to hypercore, 432 bytes +4ms
  append-tree appending node to hypercore, 428 bytes +4ms
...
  append-tree appending node to hypercore, 1352 bytes +11ms
  append-tree appending node to hypercore, 1368 bytes +10ms
  append-tree appending node to hypercore, 1355 bytes +9ms
  append-tree appending node to hypercore, 1353 bytes +9ms
  append-tree appending node to hypercore, 1360 bytes +8ms
  append-tree appending node to hypercore, 1356 bytes +11ms
  append-tree appending node to hypercore, 1360 bytes +11ms
  append-tree appending node to hypercore, 1359 bytes +9ms
  append-tree appending node to hypercore, 1362 bytes +8ms
  append-tree appending node to hypercore, 1359 bytes +9ms
...
  append-tree appending node to hypercore, 2122 bytes +16ms
  append-tree appending node to hypercore, 2118 bytes +16ms
  append-tree appending node to hypercore, 2119 bytes +16ms
  append-tree appending node to hypercore, 2124 bytes +16ms
  append-tree appending node to hypercore, 2133 bytes +16ms
  append-tree appending node to hypercore, 2125 bytes +15ms
  append-tree appending node to hypercore, 2137 bytes +17ms
  append-tree appending node to hypercore, 2123 bytes +16ms
dat v13.9.2
Created new dat in /Users/dc/sample2/.dat
dat://b1ea4e0c8b72573bac844f5d8d50c31645a29434891aa2559a4381ae153090f8
Sharing dat: (stats disabled)



Creating metadata for 2067 files (799 KB/s)
[=========================================-] 100%
ADD: A/Alzonne.html (1.9 KB)


Ctrl+C to Exit
  append-tree appending node to hypercore, 2148 bytes +21ms
  append-tree appending node to hypercore, 2140 bytes +16ms
  append-tree appending node to hypercore, 2128 bytes +15ms
  append-tree appending node to hypercore, 57 bytes +9ms
  append-tree appending node to hypercore, 56 bytes +2ms
DCs-MacBook:sample2 dc$
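
to pull the trend out of a log like this one, a small python sketch along these lines could work. it assumes every relevant line looks exactly like the debug lines shown above (those come from debug statements added locally, so stock append-tree may print nothing of the sort), and it only understands `+Nms` deltas. the function names are made up for this example.

```python
import re

# matches e.g. "append-tree appending node to hypercore, 58 bytes +0ms"
LINE_RE = re.compile(r"append-tree appending node to hypercore, (\d+) bytes \+(\d+)ms")


def parse_append_log(text):
    """Return one (entry_bytes, delta_ms) tuple per matching log line."""
    return [(int(size), int(ms)) for size, ms in LINE_RE.findall(text)]


def mean_bytes_and_ms(samples):
    """Average entry size and average time since the previous append."""
    n = len(samples)
    return (sum(b for b, _ in samples) / n, sum(ms for _, ms in samples) / n)


# a few lines copied from the start and from near the end of the log above
EARLY = """\
  append-tree appending node to hypercore, 58 bytes +0ms
  append-tree appending node to hypercore, 65 bytes +10ms
  append-tree appending node to hypercore, 60 bytes +2ms
"""
LATE = """\
  append-tree appending node to hypercore, 2122 bytes +16ms
  append-tree appending node to hypercore, 2118 bytes +16ms
  append-tree appending node to hypercore, 2119 bytes +16ms
"""

if __name__ == "__main__":
    for label, chunk in (("early", EARLY), ("late", LATE)):
        size, ms = mean_bytes_and_ms(parse_append_log(chunk))
        print(f"{label}: ~{size:.0f} bytes per entry, ~{ms:.0f} ms per append")
```

both the entry size and the gap between appends go up as more files are added, which is what the O(n) per-append cost predicts.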

okdistribute commented, Aug 13, 2019 (3 reactions)

Current timeline is to release the next dat cli with this new update by the end of the year. Until then, you can use hyperdrive-daemon which has most of the basic functionality. Cc @andrewosh

Original issue: `dat share` is slow with lots of files · Issue #915 · dat-ecosystem ...