doctor: should dvc doctor show remote-required dependency versions

Lately we often have to ask users a follow-up question about the version of the remote dependency they are using. We could annotate the dvc doctor output with whatever version information we can get (optional). Something like this:

 $ dvc doctor
DVC version: 2.4.2+f1bf10.mod 
---------------------------------
Platform: Python 3.8.5+ on Linux-5.4.0-77-generic-x86_64-with-glibc2.29
...
Remotes: gs (gcsfs == 2021.06.01, ... == 1.0.2)
...

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
isidentical commented, Jul 1, 2021

One thing that is a bit problematic is the case where all remotes are installed. Normally, instead of listing everything, DVC only prints All remotes:

 $ dvc doctor
DVC version: 2.4.3+ce45ec.mod 
---------------------------------
Platform: Python 3.8.5+ on Linux-5.4.0-77-generic-x86_64-with-glibc2.29
Supports: All remotes
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/sda7
Repo: dvc, git

This is probably going to be problematic for us in the future, though: when we introduce the plugin system, the definition of ‘All remotes’ will probably change. There are 3 options:

  • Keep it as is, and only annotate DVC installations with limited remotes (for dvc[all], nothing is going to change)
  • Make it add version information only when you run dvc doctor -v or dvc doctor -vv (??, seems like a bad choice)
  • Show it by default (too complicated?)

DVC version: 2.4.3+ce45ec.mod
---------------------------------
Platform: Python 3.8.5+ on Linux-5.4.0-77-generic-x86_64-with-glibc2.29
Supports: azure(adlfs = 0.7.7, knack = 0.7.2), gdrive(pydrive2 = 1.8.1), gs(gcsfs = 2021.6.1+0.g17e7a09.dirty), hdfs(pyarrow = 2.0.0), webhdfs(hdfs = 2.5.8), http, https, s3(s3fs = 2021.6.1+5.gfc4489c, boto3 = 1.17.49), ssh(paramiko = 2.7.2), oss(oss2 = 2.6.1), webdav(webdav4 = 0.8.1), webdavs(webdav4 = 0.8.1)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/sda7
Repo: dvc, git

or perhaps something like this (much better IMHO):

 $ dvc doctor
DVC version: 2.4.3+ce45ec.mod 
---------------------------------
Platform: Python 3.8.5+ on Linux-5.4.0-77-generic-x86_64-with-glibc2.29
Supports:
        azure(adlfs = 0.7.7, knack = 0.7.2),
        gdrive(pydrive2 = 1.8.1),
        gs(gcsfs = 2021.6.1+0.g17e7a09.dirty),
        hdfs(pyarrow = 2.0.0),
        webhdfs(hdfs = 2.5.8),
        http,
        https,
        s3(s3fs = 2021.6.1+5.gfc4489c, boto3 = 1.17.49),
        ssh(paramiko = 2.7.2),
        oss(oss2 = 2.6.1),
        webdav(webdav4 = 0.8.1),
        webdavs(webdav4 = 0.8.1)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/sda7
Repo: dvc, git

0 reactions
isidentical commented, Jul 1, 2021

I’m fine with removing all remotes. Though I find dvc version to be already complicated and slow, and we are in a transition period for the remote updates, maybe we won’t need it in the future?

I don’t think it is going to affect performance that much. An importlib.metadata.version() call takes roughly 1 msec, so in total it will add at most 50-60 msec (including the import of importlib.metadata) to retrieve all the versions we need. That is way lower than importing each module, so I don’t think it is even noticeable.
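As an illustration of the lookup described above, here is a minimal sketch that reads installed versions from package metadata without importing the packages themselves. The helper name `get_versions` is hypothetical and is not part of DVC; only `importlib.metadata.version` and `PackageNotFoundError` come from the standard library (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def get_versions(packages):
    """Map each distribution name to its installed version, or None.

    Hypothetical helper: it only reads distribution metadata, so the
    packages themselves are never imported.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # The names below are examples taken from the thread.
    for name, ver in get_versions(["gcsfs", "s3fs", "boto3"]).items():
        print(f"{name} = {ver}")
```

Returning None for a missing distribution, rather than raising, lets the caller decide whether to hide the remote or report it as unavailable.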

we are in a transition period for the remote updates, maybe we won’t need it in the future?

I am not sure what the remote update stuff is, but I doubt it will completely eliminate the need for checking remote dependency versions. At least not until we stabilize everything on the fsspec side.

Anyway, to simplify the output a bit, maybe we can even get rid of remote names and just print libraries, like pytest does.

Though, it will probably overflow, and it would not be easy to locate a single dependency bound to a single remote (e.g. pyarrow).

Similarly, http remotes are always bundled, so we don’t really need them.

We could potentially create a list of always-installed remotes, or just skip the ones that have no REQUIRES (e.g. http/https in this case).
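A rough sketch of how skipping remotes with no requirements could combine with the multi-line layout proposed earlier in the thread. Everything here is an assumption for illustration: the `REMOTE_REQUIRES` table, the `format_supports` function and its `skip_bundled` flag are hypothetical names, not DVC's actual data structures or API. The version lookup is passed in as a callable so the formatting can be tried without any remote dependency installed.

```python
# Hypothetical table: remote name -> distributions it requires.
# Illustrative only; this is not how DVC stores the information.
REMOTE_REQUIRES = {
    "gs": ["gcsfs"],
    "http": [],
    "https": [],
    "s3": ["s3fs", "boto3"],
}


def format_supports(requires, get_version, skip_bundled=True):
    """Build a multi-line 'Supports:' section.

    get_version(name) must return a version string, or None when the
    distribution is not installed.
    """
    entries = []
    for remote, dists in requires.items():
        if not dists:
            # Remotes with no requirements (e.g. http/https) are always
            # available, so they can be skipped or listed bare.
            if not skip_bundled:
                entries.append(remote)
            continue
        versions = [(dist, get_version(dist)) for dist in dists]
        if any(ver is None for _, ver in versions):
            # A required distribution is missing: leave the remote out.
            continue
        deps = ", ".join(f"{dist} = {ver}" for dist, ver in versions)
        entries.append(f"{remote}({deps})")
    return "Supports:\n" + ",\n".join(f"\t{entry}" for entry in entries)


if __name__ == "__main__":
    # Fake version data so the example runs anywhere.
    fake = {"gcsfs": "2021.6.1", "s3fs": "2021.6.1", "boto3": "1.17.49"}
    print(format_supports(REMOTE_REQUIRES, fake.get))
```

In a real implementation the callable would presumably wrap `importlib.metadata.version`; a plain `dict.get` stands in for it here.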

Also, consider naming the section plugins, as that is how we are going to treat them in the future.

Makes sense, or supported remote types: / supported file systems: (a bit too internal as terminology?), but yeah, I think they are all better than a bare Supports:.
