License confirmation for `dvc get` and `dvc import`

Downloadable datasets come under various licenses. dvc get and dvc import could check whether a tracked directory contains a LICENSE file, print that license, and ask the user for confirmation before downloading. This would let us comply with the attribution and copyright requirements of licenses such as MIT or Apache.

For a Git repository directory in the form

.
├── README.md
├── fashion-mnist
│   ├── LICENSE
│   ├── raw
│   │   ├── t10k-images-idx3-ubyte.gz
│   │   ├── t10k-labels-idx1-ubyte.gz
│   │   ├── train-images-idx3-ubyte.gz
│   │   └── train-labels-idx1-ubyte.gz
│   └── raw.dvc

we use dvc get https://github.com/iterative/dataset-registry fashion-mnist/raw to get the dataset.

At this point, instead of downloading directly, DVC could check whether there is a LICENSE file in the fashion-mnist/ directory and present it to the user for confirmation. The same applies to dvc import.

I think this should be the default behavior, and an option like --skip-license-confirmation is also needed for scripts.
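
A minimal Python sketch of the check proposed above. It is not DVC code and uses no DVC API: the function names, the list of accepted license file names, and the skip_confirmation parameter (standing in for the suggested --skip-license-confirmation option) are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed set of file names that count as a license file.
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md")


def find_license(directory: Path) -> Optional[Path]:
    """Return the first license file found directly inside `directory`, if any."""
    for name in LICENSE_NAMES:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def confirm_license(
    directory: Path,
    skip_confirmation: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Decide whether a download may proceed.

    Returns True when no license file exists, when confirmation is skipped
    (the scripted case), or when the user answers "y" or "yes".
    """
    license_path = find_license(directory)
    if license_path is None or skip_confirmation:
        return True
    # Show the license text before asking for confirmation.
    print(license_path.read_text(encoding="utf-8"))
    answer = ask("Accept this license and continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

A caller would run confirm_license(Path("fashion-mnist")) before starting the download and abort if it returns False. Passing the prompt function as the ask parameter keeps the check usable in scripts and tests without reading from standard input.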

This would provide a basis for offering public datasets with different license restrictions in a single dataset registry.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
dberenbaum commented, May 11, 2021

@iesahin I think @pmrowla is suggesting that this could be built on top of DVC but be considered a separate product.

It seems like this issue is more of a feature request for a public dataset registry, with license confirmation being one of the requirements of that feature request. Would you agree @iesahin? Am I missing anything?

1 reaction
pmrowla commented, May 10, 2021

DVC doesn’t host or distribute anything though, it’s just tooling. I guess the line is blurred a bit when it comes to Studio, but it still seems to me like anything on the licensing/attribution side of things would be a Studio issue, and not a core DVC issue (similar to the difference between github/gitlab and git).
