Modularize default argument handling for datasets
Description
Near-identical code to handle default arguments is replicated in almost every dataset implementation. Worse still, functionality across said datasets is the same, but implementation is inconsistent.
Context
When I want to implement a new dataset, I look at existing datasets as a baseline for implementing my own. However, there are inconsistencies between these datasets, from the more minor (`save_args` handled after `load_args` for some datasets), to the slightly more significant (special casing where there are no default arguments on some datasets but not others) and worse (one case where arguments are evaluated for truthiness instead of `is not None`) (see https://github.com/quantumblacklabs/kedro/blob/0.14.1/kedro/contrib/io/azure/csv_blob.py#L109-L113 as an example representing several of the above). I don’t know which one to follow to maintain consistency across the codebase.
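
To illustrate the kind of inconsistency described above, here is a hypothetical sketch. These functions are not copied from kedro; the function names and the `DEFAULTS` value are made up for illustration. Both styles produce the same result for `None`, an empty dictionary, and a populated dictionary, so the problem is mostly one of consistency rather than behaviour.

```python
from typing import Any, Dict, Optional

# Hypothetical default save arguments, for illustration only.
DEFAULTS: Dict[str, Any] = {"index": False}


def merge_truthiness(save_args: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Style 1: tests truthiness, so {} and None take the same branch.
    return {**DEFAULTS, **save_args} if save_args else dict(DEFAULTS)


def merge_is_not_none(save_args: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Style 2: explicit None check, then update a copy of the defaults.
    merged = dict(DEFAULTS)
    if save_args is not None:
        merged.update(save_args)
    return merged


# Same result either way; only the spelling differs between the two styles.
for args in (None, {}, {"sep": ";"}):
    assert merge_truthiness(args) == merge_is_not_none(args)
```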
Possible Implementation
By having `DEFAULT_LOAD_ARGS`/`DEFAULT_SAVE_ARGS` attributes, users can also see the defaults programmatically (with the caveat that this is a drawback if you consider the few cases where such arguments don’t apply, like no save on `SqlQueryDataSet` or in general on `LambdaDataSet`/`MemoryDataSet`).
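
A minimal sketch of how this could look, assuming the defaults are merged once in the base class constructor. The `AbstractDataSet` below is a simplified stand-in rather than kedro's actual class, and `ExampleCSVDataSet` is a hypothetical dataset.

```python
import copy
from typing import Any, Dict, Optional


class AbstractDataSet:
    """Simplified stand-in for kedro's AbstractDataSet (assumption)."""

    DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
    DEFAULT_SAVE_ARGS: Dict[str, Any] = {}

    def __init__(
        self,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Copy the class-level defaults so instances never mutate them.
        self._load_args = copy.deepcopy(self.DEFAULT_LOAD_ARGS)
        if load_args is not None:
            self._load_args.update(load_args)
        self._save_args = copy.deepcopy(self.DEFAULT_SAVE_ARGS)
        if save_args is not None:
            self._save_args.update(save_args)


class ExampleCSVDataSet(AbstractDataSet):
    """Hypothetical dataset that only declares its defaults."""

    DEFAULT_SAVE_ARGS = {"index": False}


dataset = ExampleCSVDataSet(save_args={"sep": ";"})
print(ExampleCSVDataSet.DEFAULT_SAVE_ARGS)  # defaults are visible without an instance
print(dataset._save_args)  # defaults merged with the user's overrides
```

With this shape, a new dataset only declares what differs from the base class, and the merge order and `None` handling live in one place.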
Possible Alternatives
- Create an intermediate abstract dataset class (or mixin?) so as to not modify `AbstractDataSet` and thereby only apply to those with `load_args`/`save_args`
- Move default argument handling into a utility function and call it from each individual `__init__` method (not preferred)
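
Both alternatives can be sketched together. The names `merge_default_args`, `DefaultArgumentsMixin`, `_init_args` and `ExampleDataSet` are hypothetical and chosen only for illustration; in a real dataset the mixin would be combined with `AbstractDataSet`.

```python
import copy
from typing import Any, Dict, Optional


def merge_default_args(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical utility: return a copy of defaults updated with overrides."""
    merged = copy.deepcopy(defaults)
    if overrides is not None:
        merged.update(overrides)
    return merged


class DefaultArgumentsMixin:
    """Hypothetical mixin for datasets that accept load_args/save_args."""

    DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
    DEFAULT_SAVE_ARGS: Dict[str, Any] = {}

    def _init_args(
        self,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._load_args = merge_default_args(self.DEFAULT_LOAD_ARGS, load_args)
        self._save_args = merge_default_args(self.DEFAULT_SAVE_ARGS, save_args)


class ExampleDataSet(DefaultArgumentsMixin):
    # A real dataset would also inherit from AbstractDataSet.
    DEFAULT_LOAD_ARGS = {"sep": ","}

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        self._init_args(load_args, save_args)
```

Datasets without `load_args`/`save_args` would simply not use the mixin, which addresses the caveat about classes where such arguments do not apply.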
Checklist
Include labels so that we can categorise your issue:
- Add a “Component” label to the issue
- Add a “Priority” label to the issue
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:9 (9 by maintainers)
Top GitHub Comments
Hi @deepyaman, thanks a lot for pointing this out. It’s something I’ve thought of proposing to fix quite a few times but was never a priority.
Another solution to consider would be to have `default_save_args` and `default_load_args` as class variables (as this is what they really are): (edit: This is actually what you did as well, sorry, somehow I missed that 🤦)
So that:
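
A minimal sketch of what class-level defaults without a parent constructor could look like. It assumes a simplified stand-in for `AbstractDataSet`, and both example dataset classes are hypothetical rather than kedro's actual code.

```python
from typing import Any, Dict, Optional


class AbstractDataSet:
    """Simplified stand-in (assumption); note that it defines no __init__."""

    default_load_args: Dict[str, Any] = {}
    default_save_args: Dict[str, Any] = {}


class ExampleCSVDataSet(AbstractDataSet):
    """Hypothetical dataset that overrides one of the class-level defaults."""

    default_save_args = {"index": False}

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._filepath = filepath
        # Start from a copy of the class-level defaults, then apply overrides.
        self._load_args = dict(self.default_load_args)
        if load_args is not None:
            self._load_args.update(load_args)
        self._save_args = dict(self.default_save_args)
        if save_args is not None:
            self._save_args.update(save_args)


class ExampleMemoryDataSet(AbstractDataSet):
    """Hypothetical dataset with no defaults: there is no parent __init__ to call."""

    def __init__(self, data: Any = None) -> None:
        self._data = data
```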
This would avoid the `__init__` on the parent class, remove the `pylint: disable=super-init-not-called` or simplify the code for classes that don’t make use of defaults.
I also like your proposition of making the `default_*_args` a public attribute; it’s kind of hidden in the constructor atm, so less “magic” for our users.
Btw, I’m not sure if what I suggested is the right way. Maybe wait and see what others also say? @idanov @tolomea
Thank you so much for this @deepyaman! We’ll await feedback from @idanov on this and will get back to you.