`dvc pull`: Config field `jobs` in `.dvc/config` not taken into account as expected
Bug Report
`dvc pull`: Config field `jobs` in `.dvc/config` not taken into account as expected
Description
`dvc pull` is still using `cpu_count() * 4` download jobs even if `jobs=4` is defined under the remote in use in `.dvc/config`.
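To make the expectation concrete, here is a minimal sketch of the precedence I had in mind: an explicit `--jobs` value first, then the remote's `jobs` setting, and only then the `cpu_count() * 4` default. This is not DVC's actual code; `effective_jobs` and `default_jobs` are hypothetical names used only for illustration.

```python
import os
from typing import Optional


def default_jobs() -> int:
    # Fallback described in this report: cpu_count() * 4.
    # os.cpu_count() may return None, so fall back to 1 in that case.
    return (os.cpu_count() or 1) * 4


def effective_jobs(cli_jobs: Optional[int], remote_jobs: Optional[int]) -> int:
    """Hypothetical resolution order: CLI flag, then remote config, then default."""
    if cli_jobs is not None:
        return cli_jobs
    if remote_jobs is not None:
        return remote_jobs
    return default_jobs()
```

With this order, `effective_jobs(None, 4)` would give 4, which is what I expected from `jobs = 4` in the config when no `--jobs` flag is passed.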
Furthermore, `jobs=4` still seems to have some effect: without it, `dvc pull` often fails with an error. With the `jobs=4` config that problem no longer occurs, so the setting is apparently applied somewhere.
Maybe this is just not clear from the documentation?
Reproduce
- Run `dvc init`
- Edit the `.dvc/config` file and add a remote, for instance:
[core]
remote = artifactory
['remote "artifactory"']
url = https://my.artifactory.com/...
auth = basic
method = PUT
jobs = 4
- Run `dvc pull`
- See that more than 4 download jobs are running in parallel
Expected
It was expected that the `jobs=4` config in the `.dvc/config` file does the same as running `dvc pull --jobs 4`, which is not the case.
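As a stopgap, the value can be read from the config and passed explicitly on the command line. The sketch below is only an illustration under some assumptions: it uses Python's standard `configparser` as a stand-in for DVC's own config loader, `pull_command` is a hypothetical helper, and the config text is the one from the reproduction steps.

```python
import configparser
from typing import List

# Same config as in the reproduction steps (URL left elided as in the report).
CONFIG_TEXT = """\
[core]
remote = artifactory
['remote "artifactory"']
url = https://my.artifactory.com/...
auth = basic
method = PUT
jobs = 4
"""


def pull_command(config_text: str) -> List[str]:
    """Build a `dvc pull` command that passes the remote's `jobs` explicitly."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    remote = parser["core"]["remote"]
    # configparser keeps the quoting of the section header verbatim.
    section = f"'remote \"{remote}\"'"
    jobs = parser[section].get("jobs")
    cmd = ["dvc", "pull"]
    if jobs is not None:
        cmd += ["--jobs", jobs]
    return cmd
```

For the config above this builds the equivalent of `dvc pull --jobs 4`; the resulting list could be handed to `subprocess.run`.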
Environment information
DVC 2.10.2 on Windows.
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.10.2 (exe)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
gdrive (pydrive2 = 1.10.1),
gs (gcsfs = 2022.3.0),
hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
ssh (sshfs = 2022.3.1),
oss (ossfs = 2021.8.0),
webdav (webdav4 = 0.9.7),
webdavs (webdav4 = 0.9.7)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: NTFS on C:\
Repo: dvc, git
Issue Analytics
- Created a year ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
Ping @daavoo
@daavoo any info on this one?