`dvc push` / `import-url --to-remote` to an ssh remote is very, very slow.
Bug Report
I think I have done enough research to report this bug. I first tried to find information online and in support threads, with no luck. I then talked with Gao on Discord, and he decided I should post this as a bug here.
Description
`dvc push` and `dvc add` / `import-url --to-remote` are really, really slow. Like, a few kilobytes in 5 minutes slow. SCP and SFTP to that same server are fast.
Reproduce
touch test.txt
echo "AAAA" > test.txt
dvc add test.txt
dvc push
or
dvc import-url --to-remote https://example.com/some.jpg
The transfer takes a very, very long time. However, once in a while, at no predictable interval, it works fast.
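To put numbers on the comparison with SCP, the two transfers can be timed side by side. The following is only a minimal sketch: `time_command` and `compare` are hypothetical helper names, and the scp target in the commented example reuses the placeholder login, host, and path from the config in this report.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


def compare(commands):
    """Time each labelled command and return {label: (elapsed_seconds, exit_code)}."""
    return {label: time_command(cmd) for label, cmd in commands.items()}


# Example (not executed here); host, login, and path are placeholders:
#
#   results = compare({
#       "dvc push": ["dvc", "push"],
#       "scp": ["scp", "test.txt", "login@ip:/path/to/remote/"],
#   })
#   for label, (seconds, code) in results.items():
#       print(f"{label}: {seconds:.2f}s (exit code {code})")
```

Since `dvc push` skips files that are already on the remote, each timed run should push a file with new content, otherwise the measurement mostly reflects the status check.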
Expected
The same operation works blazingly fast when the remote is on a local drive.
Environment information
Network topology: macOS -> VPN -> (over ssh) -> ML server (local folder, shared directory)
However, the problem was also reproduced in various other environments:
- without VPN
- with direct online access (no NAT) to the ML server
- with different eth interface on the ML machine
- on a different machine - PC - with different OS (Windows 10)
- on another machine - Linux and different remote ssh location
- from local machine to Oracle cloud
It was always very, very slow.
Output of config files:
.dvc/config remote.remote-ssh.url=ssh://ip/path/to/remote
.dvc/config core.remote=remote-ssh
.dvc/config.local remote.remote-ssh.user=login
.dvc/config.local remote.remote-ssh.password=password
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.30.0 (pip)
---------------------------------
Platform: Python 3.9.6 on macOS-12.6-arm64-arm-64bit
Subprojects:
dvc_data = 0.17.1
dvc_objects = 0.7.0
dvc_render = 0.0.12
dvc_task = 0.1.3
dvclive = 0.11.0
scmrepo = 0.1.1
Supports:
http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
ssh (sshfs = 2022.6.0)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: ssh
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Additional Information (if any):
Here are the cProfile dumps from `dvc push`:
slow dvc push
fast dvc push
If you want, I can send the files directly to the devs.
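For anyone who wants to compare the two dumps themselves, a rough sketch using the standard library `pstats` module is shown here. It assumes the dumps are in the binary format written by cProfile; `top_cumulative` is a hypothetical helper name and the file names in the commented example are placeholders.

```python
import pstats


def top_cumulative(dump_path, limit=10):
    """Return up to `limit` rows of (cumulative_seconds, call_count, location),
    sorted by cumulative time, from a cProfile dump file."""
    stats = pstats.Stats(dump_path)
    rows = [
        (ct, nc, f"{filename}:{lineno}({name})")
        for (filename, lineno, name), (cc, nc, tt, ct, callers) in stats.stats.items()
    ]
    rows.sort(reverse=True)  # largest cumulative time first
    return rows[:limit]


# Example (not executed here); the file names are placeholders:
#
#   for label in ("slow.prof", "fast.prof"):
#       print(label)
#       for seconds, calls, location in top_cumulative(label):
#           print(f"  {seconds:8.2f}s  {calls:6d} calls  {location}")
```

Functions that rank high in the slow dump but not in the fast one are the first candidates to look at.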
As you can see, there’s clearly a performance issue. We can rule out file locking by cloud storage such as OneDrive or Dropbox, by IDEs such as PyCharm, or by backup apps such as Time Machine: the bug was reproduced outside of such folders.
We can also rule out a threading problem: `dvc push -j 1` is slow too.
Top GitHub Comments
Ok, great news! This issue fixed the problem. Now only the initial push via dvc[ssh] (the one that actually creates the repo folders) is slow. Everything else works fast.
Many thanks to @efiop, @pared and the whole DVC team!
Ok, so now it goes flawlessly, I think. I need more tests, but it generally works. I’m pretty sure the problem is that a lock on the files caused a discrepancy between the cache and the remote. If I delete the folder on the remote and push the same files again, I can reproduce the problem. Could this be the only cause?! What is the philosophy behind the cache and the remote? If I push the same files to two different remotes just by changing the config file, am I being a bad boy? This is odd, since I’m pretty sure the ssh problem occurred even with a clean dvc install. But after all this messing around, I might be wrong… I’ll check that too. Anyway, I’m sending the correct and wrong callgrind files.
callgrind-correct.zip callgrind-wrong.zip