split scm from dvc

We want to split scm out of dvc into a separate package of its own. Ideally, it would have a proper and stable API, though I’d prefer that we split it as-is.

I want to split the scm from dvc, using filter-repo to keep history intact.

Though I still see a few issues that we need to fix:

  • DVC uses logging.info for UI purposes, and dvc.scm still contains such calls. We may need to get rid of the logging.info calls and make scm work in terms of proper exceptions and return values. (Important)
  • Migrate the exceptions, as they subclass DvcException and are part of our UI framework.
  • scm uses Tqdm progress bars for the clone operation. We need to update it to use callbacks. (Medium priority)
  • Migrate the tests and the fixtures they use. We need to either copy them or figure out a way to share fixtures.
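
The callback idea from the Tqdm item could look roughly like the sketch below. It is only an illustration: `clone`, `ProgressCallback`, and the made-up object count are hypothetical names and values, not the actual dvc.scm API. The point is that the library reports progress through a plain callable, and the caller decides whether to render a Tqdm bar.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback signature: (objects_done, objects_total) -> None.
ProgressCallback = Callable[[int, int], None]


def clone(url: str, to_path: str, progress: Optional[ProgressCallback] = None) -> str:
    """Pretend to clone `url` into `to_path`, reporting progress via a callback.

    No network or Git access happens here; the loop stands in for a
    backend's transfer loop so that the control flow is visible.
    """
    total = 4  # made-up number of objects, for illustration only
    for done in range(1, total + 1):
        if progress is not None:
            progress(done, total)
    return to_path


# Caller side: collect the updates in a list. A UI layer could instead
# forward each update to a Tqdm bar that it owns.
updates: List[Tuple[int, int]] = []
result = clone(
    "https://example.com/repo.git",
    "/tmp/repo",
    progress=lambda done, total: updates.append((done, total)),
)
```

With this shape the scm package would not need to import Tqdm at all; the progress bar stays in the application that owns the UI.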

As I want to split those from DVC with history intact, we should fix the first two items in DVC itself and then split it out.
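
One possible shape for the logging and exception changes is sketched here. Every name in it (`SCMError`, `RevError`, `resolve_rev`, `describe`) is a hypothetical stand-in chosen for illustration, not the real dvc.scm code. It shows the scm side raising its own exception type with no DvcException parent and returning values instead of logging, while the caller decides what to show the user.

```python
from typing import Dict


class SCMError(Exception):
    """Hypothetical base exception owned by the scm package (no DvcException parent)."""


class RevError(SCMError):
    """Raised when a revision cannot be resolved."""


def resolve_rev(known_revs: Dict[str, str], rev: str) -> str:
    """Return the SHA for `rev`, raising instead of logging on failure."""
    try:
        return known_revs[rev]
    except KeyError as exc:
        raise RevError(f"unknown revision '{rev}'") from exc


def describe(known_revs: Dict[str, str], rev: str) -> str:
    """Caller side (the application): turn results and errors into UI text."""
    try:
        return f"{rev} -> {resolve_rev(known_revs, rev)}"
    except SCMError as exc:
        return f"ERROR: {exc}"
```

Here `describe({"main": "abc123"}, "main")` yields a normal message and an unknown revision yields an error string, with no logging call inside the scm layer.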

filter-repo how-to

I used the following snippet to split scm before.

cd "$(mktemp -d)"
git clone git@github.com:iterative/dvc.git .
git filter-repo --path dvc/scm --path dvc/tree/git.py --path dvc/fs/git.py --tag-rename '':'scmrepo-' --path-rename dvc:scmrepo

The following structure is generated:

tree .
.
├── scmrepo
│   ├── fs
│   │   └── git.py
│   └── scm
│       ├── __init__.py
│       ├── base.py
│       └── git
│           ├── __init__.py
│           ├── backend
│           │   ├── __init__.py
│           │   ├── base.py
│           │   ├── dulwich.py
│           │   ├── gitpython.py
│           │   └── pygit2.py
│           ├── objects.py
│           └── stash.py
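
For anyone who wants to script the split, the filter-repo invocation above can be assembled programmatically. The helper below is a hypothetical convenience (`filter_repo_args` is not part of DVC or git-filter-repo); it uses only the three flags from the snippet and does not run anything. An empty old prefix in `--tag-rename` means every tag gets the new prefix.

```python
import shlex
from typing import List, Sequence


def filter_repo_args(
    paths: Sequence[str], tag_prefix: str, old_root: str, new_root: str
) -> List[str]:
    """Assemble the `git filter-repo` argument list used in the snippet above."""
    args = ["git", "filter-repo"]
    for path in paths:
        # Keep only these paths, together with their history.
        args += ["--path", path]
    # Empty old prefix: prepend `tag_prefix` to every tag name.
    args += ["--tag-rename", f":{tag_prefix}"]
    # Rename the leading `old_root` of kept paths to `new_root`.
    args += ["--path-rename", f"{old_root}:{new_root}"]
    return args


cmd = filter_repo_args(
    ["dvc/scm", "dvc/tree/git.py", "dvc/fs/git.py"],
    tag_prefix="scmrepo-",
    old_root="dvc",
    new_root="scmrepo",
)
printable = shlex.join(cmd)  # shell-safe string form, e.g. for logging
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` inside a fresh clone, since filter-repo rewrites history in place.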

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
efiop commented, Jan 4, 2022

Great job, thank you, @skshetry!

2 reactions
skshetry commented, Jan 4, 2022

Calling it done, closing …
