
`data:status`: errors when data not in cache


Bug Report

Description

After a fresh clone of a repository, every tracked path that is reported appears in three categories at once:

  1. Not in cache
  2. Committed modified
  3. Uncommitted deleted

Some data can also be missing from the output entirely. For example, in the vscode-dvc demo project `training_metrics` is not reported at all; it is produced by DVCLive and listed under the `plots` key in `dvc.yaml`.

This breaks one of the workflows in the VS Code extension.
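One way to spot paths that are missing entirely, such as `training_metrics`, is to compare what `dvc.yaml` declares against what `dvc data status --show-json` reports. The sketch below is hypothetical tooling, not part of DVC or the VS Code extension. It assumes `dvc.yaml` has already been parsed into a dict (for example with a YAML library), looks only at each stage's `outs`, `metrics` and `plots` lists, and treats every entry as either a plain path string or a single-key mapping from path to options. `declared_outputs`, `reported_paths` and the sample inputs are illustrative, not taken from the demo project.

```python
from __future__ import annotations

from typing import Any


def declared_outputs(dvc_yaml: dict[str, Any]) -> set[str]:
    """Collect paths listed under outs/metrics/plots of every stage.

    Hypothetical helper: entries are assumed to be either plain strings
    ("model.pt") or single-key mappings ({"training_metrics": {...}}).
    Top-level (non-stage) plot definitions are not handled.
    """
    paths: set[str] = set()
    for stage in dvc_yaml.get("stages", {}).values():
        for key in ("outs", "metrics", "plots"):
            for entry in stage.get(key) or []:
                if isinstance(entry, str):
                    paths.add(entry)
                elif isinstance(entry, dict):
                    paths.update(entry.keys())
    return paths


def reported_paths(status: dict[str, Any]) -> set[str]:
    """Flatten `dvc data status --show-json` output into a set of paths.

    Values are assumed to be either lists of paths ("not_in_cache") or
    mappings of sub-category to lists ("committed": {"modified": [...]}).
    """
    paths: set[str] = set()
    for value in status.values():
        if isinstance(value, list):
            paths.update(value)
        elif isinstance(value, dict):
            for sub_paths in value.values():
                paths.update(sub_paths)
    return paths


# Hypothetical inputs shaped loosely like the demo project.
dvc_yaml = {
    "stages": {
        "train": {
            "outs": ["model.pt"],
            "plots": [
                "misclassified.jpg",
                "predictions.json",
                {"training_metrics": {"cache": False}},
            ],
        }
    }
}
status = {"not_in_cache": ["model.pt", "misclassified.jpg", "predictions.json"]}

# Paths declared in dvc.yaml but absent from every status category.
print(sorted(declared_outputs(dvc_yaml) - reported_paths(status)))
```

With these sample inputs the set difference contains only `training_metrics`, which is the kind of gap described above.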

Reproduce

  1. git clone https://github.com/iterative/vscode-dvc
  2. cd vscode-dvc/demo
  3. python3 -m virtualenv .env
  4. source .env/bin/activate
  5. pip install -r requirements.txt
  6. dvc data status --show-json --with-dirs --granular --untracked --unchanged
{
  "not_in_cache": [
    "model.pt",
    "misclassified.jpg",
    "predictions.json"
  ],
  "committed": {
    "modified": [
      "model.pt",
      "misclassified.jpg",
      "predictions.json"
    ]
  },
  "uncommitted": {
    "deleted": [
      "model.pt",
      "misclassified.jpg",
      "predictions.json"
    ]
  }
}

Note: DVC may need to be updated to 2.15.0 in the requirements.txt file.
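To see the overlap at a glance, the JSON output above can be inverted so that each path lists every category it appears under. This is a minimal sketch, not part of DVC or the extension: `run_data_status`, `categories_by_path` and `conflicting_paths` are hypothetical names, and the code assumes the JSON shape shown above (flat lists such as `not_in_cache`, plus groups such as `committed` that map sub-categories to lists). `run_data_status` shells out to the same command as step 6 and needs `dvc` on `PATH`; the example at the bottom uses the pasted output instead of running it.

```python
from __future__ import annotations

import json
import subprocess
from collections import defaultdict
from typing import Any


def run_data_status(repo_dir: str) -> dict[str, Any]:
    """Run the command from the reproduction steps and parse its JSON."""
    cmd = ["dvc", "data", "status", "--show-json", "--with-dirs",
           "--granular", "--untracked", "--unchanged"]
    result = subprocess.run(cmd, cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return json.loads(result.stdout)


def categories_by_path(status: dict[str, Any]) -> dict[str, list[str]]:
    """Map each path to every category it is reported under.

    Flat lists are labelled by their key ("not_in_cache"); nested
    groups are labelled "group.subkey" ("committed.modified").
    """
    found: dict[str, list[str]] = defaultdict(list)
    for key, value in status.items():
        if isinstance(value, list):
            for path in value:
                found[path].append(key)
        elif isinstance(value, dict):
            for sub, paths in value.items():
                for path in paths:
                    found[path].append(f"{key}.{sub}")
    return dict(found)


def conflicting_paths(status: dict[str, Any]) -> dict[str, list[str]]:
    """Keep only the paths reported under more than one category."""
    return {p: cats for p, cats in categories_by_path(status).items()
            if len(cats) > 1}


# The output from step 6, pasted in rather than re-run.
sample = {
    "not_in_cache": ["model.pt", "misclassified.jpg", "predictions.json"],
    "committed": {"modified": ["model.pt", "misclassified.jpg", "predictions.json"]},
    "uncommitted": {"deleted": ["model.pt", "misclassified.jpg", "predictions.json"]},
}
for path, cats in conflicting_paths(sample).items():
    print(path, cats)
```

For the sample output, each of the three paths maps to `not_in_cache`, `committed.modified` and `uncommitted.deleted`, which is exactly the overlap this report describes.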

Expected

Paths are returned under the `not_in_cache` key only, and every tracked path is returned, including ones that are currently missing (such as `training_metrics`).
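The expected behaviour can be phrased as a check against the JSON output. This is a hypothetical sketch, not a test from the DVC or vscode-dvc repositories. It assumes the caller supplies the list of tracked paths (the `tracked` list below is illustrative) and that the output has the shape shown in the reproduction steps. It reports tracked paths missing from `not_in_cache` and tracked paths that also show up under any other category.

```python
from __future__ import annotations

from typing import Any, Iterable


def unexpected_reports(status: dict[str, Any], tracked: Iterable[str]) -> list[str]:
    """List deviations from the expected result for a fresh clone.

    Expected: every tracked path appears under "not_in_cache" and under
    no other category.
    """
    tracked_set = set(tracked)
    problems: list[str] = []
    not_in_cache = set(status.get("not_in_cache", []))
    for path in sorted(tracked_set - not_in_cache):
        problems.append(f"{path}: missing from not_in_cache")
    for key, value in status.items():
        if key == "not_in_cache":
            continue
        # Normalise flat lists and nested groups to (label, paths) pairs.
        if isinstance(value, dict):
            groups = [(f"{key}.{sub}", paths) for sub, paths in value.items()]
        else:
            groups = [(key, value)]
        for label, paths in groups:
            for path in sorted(tracked_set & set(paths)):
                problems.append(f"{path}: also reported under {label}")
    return problems


sample = {
    "not_in_cache": ["model.pt", "misclassified.jpg", "predictions.json"],
    "committed": {"modified": ["model.pt", "misclassified.jpg", "predictions.json"]},
    "uncommitted": {"deleted": ["model.pt", "misclassified.jpg", "predictions.json"]},
}
# Hypothetical list of paths tracked by the demo project.
tracked = ["model.pt", "misclassified.jpg", "predictions.json", "training_metrics"]
for problem in unexpected_reports(sample, tracked):
    print(problem)
```

An output that matches the expectation yields an empty list; the output from this report yields one "missing" entry for `training_metrics` plus six "also reported" entries.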

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.15.0 (pip)
---------------------------------
Platform: Python 3.10.5 on macOS-12.2.1-arm64-arm-64bit
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git

Additional Information (if any):

Also verified on dvc-2.15.1.dev13+g9ff18502.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

2 reactions
skshetry commented, Aug 8, 2022

For committed changes where there’s no cache, we can probably just compare the hashes to tell that they are unchanged and report them only as not in cache, so we can avoid modified here.
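A minimal sketch of the idea in the comment above, not DVC's actual implementation: if the hash recorded for an output in the last commit equals the hash currently recorded in `dvc.lock` or the `.dvc` file, the output is unchanged whether or not its data is cached, so a missing cache only needs to surface as "not in cache". `committed_state` and the `in_cache` callback are hypothetical names, and the sketch does not attempt granular per-file results for directories.

```python
from __future__ import annotations

from typing import Callable, Optional


def committed_state(
    head_hash: Optional[str],
    index_hash: Optional[str],
    in_cache: Callable[[str], bool],
) -> str:
    """Classify one output's committed state from recorded hashes alone.

    head_hash:  hash recorded for the output in the last git commit
                (None if the output is new).
    index_hash: hash currently recorded in dvc.lock / the .dvc file
                (None if the output was removed).
    in_cache:   callback reporting whether data for a hash is cached.
    """
    if head_hash is None and index_hash is None:
        raise ValueError("output is not tracked in either version")
    if head_hash is None:
        return "added"
    if index_hash is None:
        return "deleted"
    if head_hash == index_hash:
        # Equal hashes already prove nothing changed; the cache is only
        # consulted to decide whether to flag the data as unavailable.
        return "unchanged" if in_cache(index_hash) else "not_in_cache"
    return "modified"
```

With an empty cache, as on a fresh clone, an output whose hash is the same in both places comes back as `not_in_cache` rather than `modified`.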

I’ll try to look into more scenarios where we can avoid the modified/deleted stuff. I have been using this definition of not in cache for now:

not in cache: An output exists in the workspace, and the corresponding file hash in the dvc.lock or .dvc file is up to date, but there is no corresponding cache file or directory.
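The definition above can be written as a predicate for a single file output. This is a hypothetical sketch, not DVC code, under two loudly stated assumptions: the hash is a plain MD5 of the file bytes (which may not match how DVC hashes every file), and the cache uses a DVC 2.x style layout where the first two hex characters of the hash name a subdirectory. Directory outputs are out of scope. `file_md5`, `cache_path` and `is_not_in_cache` are illustrative names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of the raw file bytes, read in chunks (simplifying assumption)."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Assumed layout: <cache_dir>/<first 2 hex chars>/<remaining chars>."""
    return cache_dir / md5[:2] / md5[2:]


def is_not_in_cache(workspace_file: Path, recorded_md5: str, cache_dir: Path) -> bool:
    """Apply the definition: exists, hash up to date, but no cache entry."""
    if not workspace_file.is_file():
        return False  # the output must exist in the workspace
    if file_md5(workspace_file) != recorded_md5:
        return False  # the recorded hash must be up to date
    return not cache_path(cache_dir, recorded_md5).exists()
```

Each condition in the definition maps to one check, so a file that is absent or modified is never labelled "not in cache"; those cases belong to the deleted/modified categories instead.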

1 reaction
skshetry commented, Aug 23, 2022

@mattseddon, I am working on it, hopefully by the end of this week. 🙂


