`dat share` is slow with lots of files
@feross and i are creating a demo of Simple English Wikipedia hosted on dat
`dat share`-ing the dump (1 big file) is fast
setup
- dat version: 13.9.2
- os version: ubuntu 16.04.3
- kernel version: 4.4.0-1047-aws
- hardware: m5.large instance, using an 800GB EBS volume, standard SSD variety
results
$ ls -lha
total 1.2G
-rw-rw-r-- 1 ubuntu ubuntu 1.2G Jan 4 2017 wikipedia_en_simple_all_2017-01.zim
[...]
we are calling `dat share` on one file, total 1.2GB:
$ dat share
[... runs at about ~150MB/s, finishes in < 10 seconds ...]
`dat share`-ing the articles (small files) runs very slowly
$ du -sh *
252K -
1.7G A
1.3G I
36K M
$ time ls -lha A I/m | wc -l
291382
real 0m3.740s
(notice that `stat`-ing all 300k files only takes <4 seconds, so that’s not the bottleneck…)
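For anyone who wants to repeat the `stat` check without going through `ls`, here is a minimal sketch. The helper name `stat_all` is made up for illustration and is not part of dat; it only walks a directory tree, stats each file, and reports how long that took.

```python
import os
import time


def stat_all(root):
    """Walk `root`, lstat every file, and return (file_count, total_bytes, seconds)."""
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat does not follow symlinks, so a dangling link cannot raise here
            info = os.lstat(os.path.join(dirpath, name))
            file_count += 1
            total_bytes += info.st_size
    elapsed = time.perf_counter() - start
    return file_count, total_bytes, elapsed
```

Calling `stat_all("A")` from the directory shown above would give a file count and an elapsed time to compare against the `ls` figure.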
we are calling `dat share` on ~300k files, totalling 3GB:
[… runs at about ~150KB/s, hasn’t finished yet …]
tl;dr: `dat share` throughput is 1000x less with small files
these files are about 10KB on average, and `dat share` is processing just a couple of them per second
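The arithmetic behind those figures, as a quick sketch. It uses only the rounded numbers quoted above, treats GB/MB/KB as decimal units, and assumes the small-file rate stays constant for the whole run, which the report does not establish.

```python
# Rounded figures quoted in the report (decimal units assumed).
BIG_FILE_RATE = 150e6    # bytes/s observed for the single 1.2GB file
SMALL_FILE_RATE = 150e3  # bytes/s observed for the ~300k small files
TOTAL_BYTES = 3e9        # ~3GB of articles
FILE_COUNT = 300_000     # ~300k files

slowdown = BIG_FILE_RATE / SMALL_FILE_RATE
avg_file_bytes = TOTAL_BYTES / FILE_COUNT
# Assumption: the observed rate holds steady until the share finishes.
est_hours = TOTAL_BYTES / SMALL_FILE_RATE / 3600

print(f"slowdown: {slowdown:.0f}x")
print(f"average file size: {avg_file_bytes / 1e3:.0f} KB")
print(f"estimated time at the observed rate: {est_hours:.1f} hours")
```

At a steady 150KB/s the 3GB of articles would take roughly five and a half hours, compared with under 10 seconds for the 1.2GB dump.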
Top GitHub Comments
adding n files takes O(n^2) time, O(n^2) space
this looks bad
after further investigation, i found out that deep in the `dat` internals, the thing that actually records a newly added file to the hypercore log is `append-tree`'s `._put()`: this function runs in O(n) time, appending a log entry of size O(n), if the folder contains n files
this means that adding `n` files to a folder, one after another, takes O(n^2) time and produces a hypercore feed with O(n) entries but taking up O(n^2) total space…
cc @mafintosh
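A toy model of that growth, as a sketch only. This is not `append-tree`'s code: `simulate_adds` is a hypothetical helper that assumes, per the description above, that adding the k-th file to a folder appends one log entry whose size is proportional to k.

```python
def simulate_adds(n):
    """Model adding n files one at a time to a single folder.

    Assumption (from the analysis above, not from append-tree's source):
    the entry written for the k-th file has size proportional to k.
    Returns (number_of_entries, total_size_units).
    """
    entries = 0
    total_units = 0
    for k in range(1, n + 1):
        entry_size = k  # one unit per file already in the folder, plus the new one
        total_units += entry_size
        entries += 1
    return entries, total_units


for n in (2_500, 5_000, 300_000):
    entries, total_units = simulate_adds(n)
    print(f"n={n}: {entries} entries, {total_units} size units")
```

In this model the total is n(n+1)/2, so doubling the number of files roughly quadruples the space written while the number of entries only doubles.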
log
running `dat share` with added debug logs in `append-tree` to demonstrate the issue:
the folder contains ~2500 files totalling 30MB
Current timeline is to release the next dat cli with this new update by the end of the year. Until then, you can use hyperdrive-daemon which has most of the basic functionality. Cc @andrewosh