Should `kedro-datasets` be a namespace package?


@noklam @MerelTheisenQB Bit late to the party, but… did you all consider/discuss keeping the import path as is? `kedro-datasets` would just expose the `kedro.extras.datasets` namespace package, and `kedro` would ideally exclude that namespace (even though it will probably still work, which is pretty neat from a compatibility perspective). This would mean that the change would largely be transparent to the user.

_Originally posted by @deepyaman in https://github.com/kedro-org/kedro-plugins/pull/38#discussion_r907477334_
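
For readers unfamiliar with the mechanism being proposed, here is a minimal sketch of how implicit namespace packages (PEP 420) let two separately installed distributions contribute modules under one import path. The package name `demo_ns`, the directory names, and the module `core_utils` are all hypothetical stand-ins; this is not Kedro's actual layout, and how Kedro would wire this up is not stated in the thread.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_distributions(root):
    """Create two fake 'distributions' that both contribute to `demo_ns.extras`."""
    core = root / "dist_core"          # hypothetical stand-in for the core package
    datasets = root / "dist_datasets"  # hypothetical stand-in for the datasets package
    # Neither `demo_ns/` nor `demo_ns/extras/` gets an __init__.py, so Python
    # treats them as implicit namespace packages and merges both locations.
    (core / "demo_ns" / "extras").mkdir(parents=True)
    (core / "demo_ns" / "extras" / "core_utils.py").write_text("NAME = 'core utils'\n")
    (datasets / "demo_ns" / "extras" / "datasets").mkdir(parents=True)
    (datasets / "demo_ns" / "extras" / "datasets" / "__init__.py").write_text(
        "class CSVDataSet:\n    pass\n"
    )
    return [core, datasets]


def demo_namespace_import():
    """Import from both distributions through the single `demo_ns` namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [str(p) for p in build_demo_distributions(Path(tmp))]
        sys.path[:0] = paths
        importlib.invalidate_caches()
        try:
            core_utils = importlib.import_module("demo_ns.extras.core_utils")
            datasets = importlib.import_module("demo_ns.extras.datasets")
            return core_utils.NAME, datasets.CSVDataSet.__name__
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for p in paths:
                sys.path.remove(p)
            for name in [m for m in sys.modules if m == "demo_ns" or m.startswith("demo_ns.")]:
                del sys.modules[name]


print(demo_namespace_import())
```

The user-facing import path stays the same regardless of which distribution ships a given module, which is the "transparent to the user" property described above.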

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

AntonyMilneQB commented, Jun 29, 2022 (4 reactions)

@noklam as I understand it, the import path as it stands is `kedro_datasets.datasets.pandas.CSVDataSet`. Is there any reason for that rather than `kedro_datasets.pandas.CSVDataSet`?

merelcht commented, Jul 6, 2022 (2 reactions)

Notes from Technical Design session:

Discussion

Pros:

  • Namespacing the package would make the experience of using kedro and datasets better for users, because they would only need `from kedro import ...`, rather than both `from kedro import ...` and `from kedro_datasets import ...`
  • We can make the move to the new repo non-breaking this way by keeping the import path as `kedro.extras.datasets`

Cons:

  • This could make our CI/CD setup more complicated
  • If it should be possible to use kedro-datasets separately from Kedro, it might be a bad idea to package it under the “kedro” namespace. However, this is a very small use case, and because `AbstractDataSet` is and will remain a part of core Kedro, it is unlikely to come up.

Other concerns and questions raised:

  • Can users still use the standalone datacatalog if we namespace kedro-datasets? Yes, because in order to use the datacatalog they need Kedro anyway, so this won’t change anything about that workflow.
  • Should we rename `kedro.extras.datasets` to `kedro.datasets` and `DataSet` to `Dataset`? Yes, but not straight away. Both of these are breaking changes and should only be made when Kedro 0.19.0 is released. Alternatively, we could alias the datasets, which could be a fun exercise. It’s still a question of whether it’s worth the effort.
  • It might be confusing for users that they can import kedro-datasets through `from kedro import ...`, but need to install a dataset with `pip install kedro-datasets[xxx]` and not `pip install kedro[xxx]`. We should add a redirect to make this possible.
  • How do we map the kedro version to the kedro-datasets version, e.g. when a user does `pip install kedro[someDataSet]`, which version do we install? kedro-datasets should be a dependency inside kedro with a strict bound. So when a user is using kedro 0.18.x, that will be mapped to kedro-datasets version x.x.x. If a user then wants a newer dataset version, they can install it manually (and accept the risk that this might break something that they need to fix themselves).
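
The aliasing idea raised in the list above (keeping the old `DataSet` spelling importable while steering users to `Dataset`) could be sketched with a module-level `__getattr__` (PEP 562). This is only an illustration under assumptions: the module name `demo_datasets`, the class `CSVDataset`, and the helper `_module_getattr` are hypothetical, and the thread does not say how, or whether, Kedro implemented aliasing.

```python
import types
import warnings


class CSVDataset:
    """Hypothetical stand-in for a dataset class under the new spelling."""


# Old spelling -> new class. Only names listed here are aliased.
_DEPRECATED_ALIASES = {"CSVDataSet": CSVDataset}


def _module_getattr(name):
    """Resolve deprecated names lazily and warn on each access (PEP 562)."""
    if name in _DEPRECATED_ALIASES:
        new_cls = _DEPRECATED_ALIASES[name]
        warnings.warn(
            f"{name!r} is deprecated; use {new_cls.__name__!r} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_cls
    raise AttributeError(f"module 'demo_datasets' has no attribute {name!r}")


# Build a throwaway module so the sketch is self-contained. In a real package,
# `__getattr__` would simply be defined at the top level of the module file.
demo_datasets = types.ModuleType("demo_datasets")
demo_datasets.CSVDataset = CSVDataset
demo_datasets.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_name_cls = demo_datasets.CSVDataSet  # old spelling still resolves

print(old_name_cls is CSVDataset, caught[0].category.__name__)
```

Because the alias is resolved only on attribute access, code that uses the new name never triggers the warning, while old imports keep working until the breaking release.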

Conclusion

The team reached a consensus to namespace the kedro-datasets package under the kedro namespace.

Implementation


