Should `kedro-datasets` be a namespace package?
@noklam @MerelTheisenQB Bit late to the party, but… did you all consider/discuss keeping the import path as is? `kedro-datasets` would just expose the `kedro.extras.datasets` namespace package, and `kedro` would ideally exclude that namespace (even though it will probably still work, which is pretty neat from a compatibility perspective). This would mean that the change would largely be transparent to the user.
_Originally posted by @deepyaman in https://github.com/kedro-org/kedro-plugins/pull/38#discussion_r907477334_
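To make the idea concrete: a native (PEP 420) namespace package is a top-level directory with no `__init__.py`, so several installed distributions can each contribute subpackages under one import name. The sketch below simulates that with two temporary directories standing in for two distributions. The names `demo_ns`, `demo_ns.io`, `demo_ns.datasets`, `make_portion` and the placeholder classes are hypothetical stand-ins for `kedro`, `kedro.io` and the datasets package; this is not the real Kedro layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, subpackage: str, source: str) -> None:
    """Write <root>/demo_ns/<subpackage>/__init__.py (hypothetical helper).

    There is deliberately no <root>/demo_ns/__init__.py: the missing file is
    what makes demo_ns a native (PEP 420) namespace package.
    """
    package_dir = root / "demo_ns" / subpackage
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(source)


with tempfile.TemporaryDirectory() as core_root, tempfile.TemporaryDirectory() as datasets_root:
    # Stand-in for the core distribution: it owns the base class.
    make_portion(Path(core_root), "io", "class AbstractDataSet:\n    pass\n")
    # Stand-in for a separately installed datasets distribution.
    make_portion(
        Path(datasets_root),
        "datasets",
        "from demo_ns.io import AbstractDataSet\n\n"
        "class CSVDataSet(AbstractDataSet):\n"
        "    pass\n",
    )
    # Two path entries play the role of two separately installed portions.
    sys.path[:0] = [core_root, datasets_root]
    importlib.invalidate_caches()
    try:
        io_mod = importlib.import_module("demo_ns.io")
        datasets_mod = importlib.import_module("demo_ns.datasets")
        # One import name, backed by both directories.
        portions = [str(p) for p in sys.modules["demo_ns"].__path__]
    finally:
        sys.path.remove(core_root)
        sys.path.remove(datasets_root)

print(len(portions))  # 2
print(issubclass(datasets_mod.CSVDataSet, io_mod.AbstractDataSet))  # True
```

One caveat: PEP 420 only merges directories that lack an `__init__.py`. If any `sys.path` entry contains a regular `demo_ns/__init__.py`, the import system returns that regular package and ignores the namespace portions elsewhere on the path.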
@noklam as I understand it, the import path as it stands is `kedro_datasets.datasets.pandas.CSVDataSet`. Is there any reason for that rather than `kedro_datasets.pandas.CSVDataSet`?

Notes from Technical Design session:
Discussion

Pros:
- Users import everything from `kedro`, rather than importing from both `kedro` and `kedro_datasets`.
- … `kedro.extras.datasets`
Cons:
- If someone wants to use `kedro-datasets` separately from Kedro, it might be a bad idea to package it under the `kedro` namespace. However, this is only a very small use case, and `AbstractDataSet` is and will remain part of core Kedro, so it is very unlikely in practice.

Other concerns and questions raised:
- Should we rename `kedro.extras.datasets` to `kedro.datasets` and `DataSet` to `Dataset`? Yes, but not straight away. Both are breaking changes and should only be made when Kedro `0.19.0` is released. Alternatively, we could alias the datasets, which could be a fun exercise; whether it's worth the effort is still an open question.
- … `kedro-datasets` through imports from `kedro`, but need to install a dataset with `pip install kedro-datasets[xxx]` rather than `pip install kedro[xxx]`. We should add a redirect so that `pip install kedro[xxx]` works.
- How do we map the `kedro` version to the `kedro-datasets` version? For example, when a user runs `pip install kedro[someDataSet]`, which version do we install? `kedro-datasets` should be a dependency of `kedro` with a strict bound, so `kedro` 0.18.x maps to `kedro-datasets` version x.x.x. If a user then wants a newer dataset version, they can install it manually (and accept the risk that this might break something they need to fix themselves).

Conclusion
The team reached a consensus to namespace the `kedro-datasets` package under the `kedro` namespace.

Implementation
- `setup.py`: https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages
- The renaming from `kedro.extras.datasets` to `kedro.datasets` and from `DataSet` to `Dataset` will be done when Kedro `0.19.0` is released.
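The aliasing idea from the design session could look roughly like the sketch below, which keeps an old `DataSet` spelling importable while steering users to the new `Dataset` spelling. It is a minimal sketch assuming a module-level `__getattr__` hook (PEP 562, Python 3.7+) carries the alias; the module name `demo_datasets_pandas`, the `_RENAMED` mapping and the placeholder `CSVDataset` class are hypothetical, not actual Kedro or `kedro-datasets` code.

```python
import sys
import types
import warnings

# Hypothetical module standing in for a dataset module; not a real package.
datasets_module = types.ModuleType("demo_datasets_pandas")


class CSVDataset:
    """Placeholder dataset class under its new ``Dataset`` spelling."""


CSVDataset.__module__ = datasets_module.__name__
datasets_module.CSVDataset = CSVDataset

# Old spelling -> new spelling (hypothetical mapping for illustration).
_RENAMED = {"CSVDataSet": "CSVDataset"}


def _deprecated_getattr(name: str):
    """PEP 562 hook: only runs when normal attribute lookup on the module fails."""
    if name in _RENAMED:
        new_name = _RENAMED[name]
        warnings.warn(
            f"{name!r} is deprecated, use {new_name!r} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(datasets_module, new_name)
    raise AttributeError(f"module {datasets_module.__name__!r} has no attribute {name!r}")


datasets_module.__getattr__ = _deprecated_getattr
sys.modules[datasets_module.__name__] = datasets_module  # importable by name

# Usage: the old name still resolves, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_datasets_pandas import CSVDataSet

print(CSVDataSet is CSVDataset)  # True
print(caught[0].category.__name__)  # DeprecationWarning
```

A plain alias (`CSVDataSet = CSVDataset`) would also work; the hook is only worth it if the old name should warn on use without appearing in the module namespace.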