
Better support SCM functionality


From #318, the basic use cases are as follows:

Proposal

Notion.

SCM view

  1. View all directories and files that have changed when compared to HEAD or cache (Visibility / Situational awareness)
  2. Checkout / commit any files or directories that have changed (Actions) - exact UX TBD.
  3. Checkout / commit / push or pull the entire repository (Actions)

Statuses that we currently provide in the extension

| Status | SCM View | Decorations Provided ** | Sourced from | Notes |
|---|---|---|---|---|
| added | Y | Y | diff + list | |
| deleted | Y | Y | diff + list | |
| modified | Y | Y | diff + list + status | |
| notInCache | Y | Y | diff + list | |
| renamed | Y | Y | diff + list | |
| stageModified | Y | Y | diff + list + status | For a detailed explanation of modified vs stageModified, see https://github.com/iterative/vscode-dvc/issues/318#issuecomment-845800460 |
| untracked | Y | Y | git | These files are untracked with respect to both git and DVC. We show them because the user may want to dvc add them. |
| tracked | Y | Y | list | We decorate tracked files because they are generally “git ignored”, which gives them a “greyed out” decoration. |

** Where possible we match the git extension’s decorations because we are trying to make the extension feel as native as possible. Our SCM integration is designed to show the user the state of the workspace with respect to the most recent commit.

Current approach (parallel CLI Commands)

| name | command | reason |
|---|---|---|
| list | dvc list . --dvc-only -R --show-json | Provides a list of all tracked files, which we use for both decorations and the SCM view. Every file shown in the SCM view must be tracked by DVC; otherwise diff leaves us with untracked but modified (duplicate) items in the tree. |
| diff | dvc diff --show-json | We map the output of diff directly onto the list output to set all added, deleted, renamed and notInCache statuses. We also combine it with the status output to distinguish modified from “stage modified”. |
| status | dvc status --show-json | Only used to distinguish modified from “stage modified”. |
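
The way the three outputs are combined can be sketched roughly as below. This is a minimal, hypothetical sketch, not the extension's code: mapStatuses and DiffPaths are illustrative names, the inputs are simplified to plain path arrays rather than the real --show-json payloads, and the untracked status (which comes from git) is left out. The modified vs stageModified split encodes an assumed reading based on the git analogy, with the cache playing the role of git's index: a path that diff reports as changed against HEAD and that status still reports as changed against the cache is “modified”, while one that diff reports but status does not (already committed to the cache) is “stage modified”.

```typescript
// Hypothetical sketch: inputs are plain path arrays, not DVC's real JSON shapes.
type Status =
  | 'added'
  | 'deleted'
  | 'modified'
  | 'notInCache'
  | 'renamed'
  | 'stageModified'
  | 'tracked';

interface DiffPaths {
  added: string[];
  deleted: string[];
  modified: string[];
  notInCache: string[];
  renamed: string[];
}

function mapStatuses(
  tracked: string[], // paths reported by `dvc list . --dvc-only -R`
  diff: DiffPaths, // paths `dvc diff` reports as changed against HEAD
  changedVsCache: string[] // paths `dvc status` reports as changed against the cache
): Map<string, Status> {
  const trackedSet = new Set(tracked);
  const changedVsCacheSet = new Set(changedVsCache);
  const statuses = new Map<string, Status>();

  // Every path tracked by DVC starts out as plain "tracked".
  for (const path of tracked) {
    statuses.set(path, 'tracked');
  }

  // Keep diff records only for DVC-tracked paths; everything else is dropped.
  const assign = (paths: string[], status: Status): void => {
    for (const path of paths) {
      if (trackedSet.has(path)) {
        statuses.set(path, status);
      }
    }
  };

  assign(diff.added, 'added');
  assign(diff.deleted, 'deleted');
  assign(diff.notInCache, 'notInCache');
  assign(diff.renamed, 'renamed');

  // Assumed split: still differs from the cache -> "modified";
  // differs from HEAD only (already in the cache) -> "stageModified".
  for (const path of diff.modified) {
    if (trackedSet.has(path)) {
      statuses.set(
        path,
        changedVsCacheSet.has(path) ? 'modified' : 'stageModified'
      );
    }
  }

  return statuses;
}
```

Note that the filtering step only happens after every diff record has crossed the CLI boundary, which is why large outputs full of untracked “added” files cost time even though none of them survive.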

We currently try to run all three of the above commands in parallel. If any of them fails, we retry all three until they have all completed without error. We do this to minimize the risk of stale information ending up in the extension.
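
The retry strategy just described (start all three commands together and discard the whole batch if any one of them fails) could look roughly like this. It is a sketch under assumptions: the Runner type and runAllUntilSuccess are hypothetical names, the runner is injected so the logic can be exercised without a real repository (in the extension it would spawn the dvc executable and resolve with its stdout), and the maxAttempts cap is an addition of this sketch, since the text only says the commands are retried until they all succeed.

```typescript
// Hypothetical runner: resolves with a command's stdout, rejects on failure.
type Runner = (args: readonly string[]) => Promise<string>;

// Arguments for the three commands listed in the table above.
const COMMANDS = {
  list: ['list', '.', '--dvc-only', '-R', '--show-json'],
  diff: ['diff', '--show-json'],
  status: ['status', '--show-json'],
} as const;

interface Outputs {
  list: string;
  diff: string;
  status: string;
}

async function runAllUntilSuccess(
  run: Runner,
  maxAttempts = 5 // assumed cap; not part of the described approach
): Promise<Outputs> {
  let lastError: unknown = new Error('no attempts made');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Start all three together; if any one rejects, the whole batch is
      // discarded so that every output comes from the same attempt.
      const [list, diff, status] = await Promise.all([
        run(COMMANDS.list),
        run(COMMANDS.diff),
        run(COMMANDS.status),
      ]);
      return { list, diff, status };
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

One failure therefore costs three new CLI invocations, and because all three always start at the same moment, any contention over DVC's repository lock is repeated on every attempt.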

Issues with the current approach

  1. It’s still slow (#608).
  2. We are unsure of what the actual UI / UX should be (#609).
  3. A lot of data that the extension never uses is sent from the CLI (example: in get-started-experiments, after running the first experiment, the output of diff contains ~80k “added” files; none of these files are tracked by DVC, so we filter all of those records out).
  4. We have issues running multiple commands in parallel (https://github.com/iterative/vscode-dvc/issues/767#issuecomment-910862443). This is particularly important because it means we currently cannot run the extension against get-started-experiments.

Options for mitigation

| # | option | pros | cons |
|---|---|---|---|
| 1 | Run commands sequentially | Locks should no longer be an issue | Even slower |
| 2 | Only rerun failed commands | Also mitigates the lock issue | More complicated logic; possibility of stale data |
| 3 | Make all 3 commands lockless | Allows us to continue running all commands in parallel | Requires work from the CLI team, is only an interim solution, and complicates DVC internals |
| 4 | Combine the commands into a single command that the integration can run | Limits the amount of data transferred between the CLI and the extension; should be faster; removes the grouped retry logic | More effort required; unclear benefit to general users |
| 5 | Replace CLI calls with an event-driven architecture | Eliminates the need to call the CLI; could serve multiple clients | Requires even more work; not a short- or even medium-term solution |
| 6 | Make commands “lightweight” (add --dvc-only) | Would limit the amount of data being passed and could speed things up | Unclear benefit to general users; still requires effort; could still run into lock issues |
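
To make the trade-off in option 2 concrete, here is a hypothetical sketch: runRetryingFailedOnly, Runner and Outcome are illustrative names, the runner stands in for whatever actually spawns dvc, and maxAttempts is an assumed cap. Successful outputs from earlier attempts are kept while only the failures are rerun, which is where the stale-data risk in the cons column comes from.

```typescript
// Hypothetical runner: resolves with a command's stdout or rejects on failure
// (for example when another process holds DVC's repository lock).
type Runner = (args: readonly string[]) => Promise<string>;

type Outcome =
  | { name: string; ok: true; value: string }
  | { name: string; ok: false };

async function runRetryingFailedOnly(
  run: Runner,
  commands: Record<string, readonly string[]>,
  maxAttempts = 5 // assumed cap for the sketch
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  let pending = Object.keys(commands);

  for (let attempt = 1; attempt <= maxAttempts && pending.length > 0; attempt++) {
    // Run only the commands that have not succeeded yet, in parallel.
    const outcomes: Outcome[] = await Promise.all(
      pending.map((name) =>
        run(commands[name]).then(
          (value): Outcome => ({ name, ok: true, value }),
          (): Outcome => ({ name, ok: false })
        )
      )
    );

    // Keep successes (possibly from an earlier attempt than the others,
    // which is how stale data can creep in) and retry only the failures.
    pending = [];
    for (const outcome of outcomes) {
      if (outcome.ok) {
        results[outcome.name] = outcome.value;
      } else {
        pending.push(outcome.name);
      }
    }
  }

  if (pending.length > 0) {
    throw new Error(`still failing after ${maxAttempts} attempts: ${pending.join(', ')}`);
  }
  return results;
}
```

If list and status succeed on the first attempt and diff only succeeds on the second, the combined result mixes snapshots taken at different times, and nothing in this logic can detect that.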

My preference would be to start work on option 4, as it would also help us move towards option 5.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

3 reactions
mattseddon commented, Sep 13, 2021

Having discussed this with @efiop and @dberenbaum in the 21/09/14 cross-team meeting, we came to the following conclusions about next steps:

  1. We need a single command.
  2. The best candidate is status (to match up with git).
  3. It should have an option to display paths only tracked by DVC (--dvc-only).
  4. It should have an option (or separate command) to show stage information (dvc stage status).
  5. It will have all the same statuses as diff, plus a further status that separates files modified against the cache from files modified against HEAD.
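
Purely as an illustration of why a single command simplifies the integration: if it returned only DVC-tracked paths already grouped by status, the extension would no longer need to join three outputs, and consuming the result would reduce to inverting the grouping. The payload shape below is hypothetical (the real output format is still undecided), and stageModified is only a placeholder for the new status, whose name is an open question; statusByPath and ConsolidatedStatus are illustrative names.

```typescript
// Placeholder status names; "stageModified" stands in for the yet-unnamed new status.
type StatusName =
  | 'added'
  | 'deleted'
  | 'modified'
  | 'notInCache'
  | 'renamed'
  | 'stageModified';

// Hypothetical payload: only DVC-tracked paths, grouped by status.
type ConsolidatedStatus = Partial<Record<StatusName, string[]>>;

// Invert the grouping into a path -> status lookup for decorations.
function statusByPath(output: ConsolidatedStatus): Map<string, StatusName> {
  const byPath = new Map<string, StatusName>();
  for (const status of Object.keys(output) as StatusName[]) {
    for (const path of output[status] ?? []) {
      byPath.set(path, status);
    }
  }
  return byPath;
}
```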

Points to discuss:

  1. Should the current dvc status be migrated to a separate command (e.g. dvc stage status), or should a flag be added to condense the output to just what we need?
  2. What will the new status be named?
  3. How will the new status be displayed in the output (to distinguish it from modified)? The way git handles this is shown in an earlier comment on the issue (https://github.com/iterative/vscode-dvc/issues/772#issuecomment-917760251).

I will now start a document in Notion here. We can continue the discussion there.

1 reaction
dberenbaum commented, Sep 14, 2021

Thanks, @mattseddon! Looks great.

We actually have a proposal template in Notion. I didn’t want to ask you to do that much work, but your doc already basically follows the template, so I transferred your text into a proposal at https://www.notion.so/iterative/Consolidate-repo-status-ed3cd60f706f4fcaba1d3f3cac1498e9.

I’d like to flesh it out with some DVC requirements since this work should also be about improving the user experience within DVC. Take a look and let me know if/when it’s okay to start adding on to the proposal.
