`dvc pull`: Config field `jobs` in `.dvc/config` not taken into account as expected

Bug Report

Description

dvc pull still uses cpu_count() * 4 download jobs even when jobs=4 is defined in .dvc/config for the remote in use.

Even so, the jobs=4 setting does seem to have some effect: without it, dvc pull often fails with:

[image: screenshot of the error, not included]

With jobs=4 in the config, that failure no longer occurs, so the setting is apparently applied somewhere.

Perhaps the documentation is not clear about what this setting actually controls?

Reproduce

  1. dvc init
  2. Edit .dvc/config file and add a remote - for instance:
[core]
    remote = artifactory
['remote "artifactory"']
    url = https://my.artifactory.com/...
    auth = basic
    method = PUT
    jobs = 4
  3. Run dvc pull
  4. See that more than 4 download jobs are running in parallel
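
Before blaming dvc pull, it can help to rule out a typo in the config file. The following is a minimal sketch that reads the jobs value for the default remote using Python's standard configparser. It is only a sanity check on the file contents: the helper name configured_jobs is hypothetical, the sample text is the config from step 2, and nothing here claims to be how DVC itself loads its configuration.

```python
import configparser

# Sample text copied from the reproduction steps above.
SAMPLE_CONFIG = """\
[core]
    remote = artifactory
['remote "artifactory"']
    url = https://my.artifactory.com/...
    auth = basic
    method = PUT
    jobs = 4
"""


def configured_jobs(config_text: str):
    """Return the `jobs` value of the default remote, or None if it is unset.

    Hypothetical helper for checking the file contents only.
    """
    # Disable interpolation so '%' characters in URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    remote_name = parser.get("core", "remote")
    # The section header is written with quotes, so the quotes are part
    # of the section name as configparser sees it.
    section = f"'remote \"{remote_name}\"'"

    raw = parser.get(section, "jobs", fallback=None)
    return int(raw) if raw is not None else None


if __name__ == "__main__":
    print(configured_jobs(SAMPLE_CONFIG))
```

To check a real repository, pass the text of .dvc/config to configured_jobs instead of SAMPLE_CONFIG.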

Expected

The jobs=4 setting in the .dvc/config file was expected to behave the same as running dvc pull --jobs 4, but it does not.
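
The expected behaviour can be written down as a simple precedence rule: an explicit --jobs value wins, then the remote's jobs setting, then the cpu_count() * 4 default mentioned in the description. The sketch below illustrates that expectation only. The function names default_jobs and effective_jobs are hypothetical and are not taken from the DVC code base.

```python
import os
from typing import Optional


def default_jobs() -> int:
    # Assumption: mirrors the "cpu_count() * 4" default described in the report.
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return (os.cpu_count() or 1) * 4


def effective_jobs(cli_jobs: Optional[int], config_jobs: Optional[int]) -> int:
    """Resolve the number of parallel jobs the way the reporter expected."""
    if cli_jobs is not None:
        # Value given on the command line, e.g. `dvc pull --jobs 4`.
        return cli_jobs
    if config_jobs is not None:
        # Value of `jobs` under the remote section in .dvc/config.
        return config_jobs
    return default_jobs()


if __name__ == "__main__":
    # With jobs = 4 in the config and no --jobs flag, the expectation is 4.
    print(effective_jobs(cli_jobs=None, config_jobs=4))
```

The reported behaviour corresponds to the second branch being skipped, so that a pull without --jobs falls through to the default.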

Environment information

DVC 2.10.2 on Windows.

Output of dvc doctor:

$ dvc doctor

DVC version: 2.10.2 (exe)
---------------------------------
Platform: Python 3.8.10 on Windows-10-10.0.19042-SP0
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        gdrive (pydrive2 = 1.10.1),
        gs (gcsfs = 2022.3.0),
        hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21),
        ssh (sshfs = 2022.3.1),
        oss (ossfs = 2021.8.0),
        webdav (webdav4 = 0.9.7),
        webdavs (webdav4 = 0.9.7)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: NTFS on C:\
Repo: dvc, git

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

2 reactions
dberenbaum commented, Aug 30, 2022

Ping @daavoo

1 reaction
stefan-hartmann-lgs commented, Aug 24, 2022

@daavoo any infos on this one?
