Add example YAML definitions to common dataset Python docstrings
Description
Users often ask for an easy way to look up what the relevant YAML dataset configuration for use in the DataCatalog would look like. We include a series of examples in this section of the documentation, but they are not especially easy to (1) link to, as they aren't under headings, or (2) find via search engine, for the same reason.
This would be useful for all datasets, but the highest priority is those that drive the most traffic to our documentation website:
kedro.extras.datasets.pandas.CSVDataSet
kedro.extras.datasets.spark.SparkDataSet
kedro.io.PartitionedDataSet
kedro.extras.datasets.pandas.ParquetDataSet
kedro.extras.datasets.pickle.PickleDataSet
kedro.extras.datasets.pandas.ExcelDataSet
kedro.extras.datasets.pandas.SQLQueryDataSet
kedro.extras.datasets.pandas.GBQTableDataSet
kedro.extras.datasets.spark.SparkHiveDataSet
Possible Implementation
def __init__(
    self,
    filepath: str,
    backend: str = "pickle",
    load_args: Dict[str, Any] = None,
    save_args: Dict[str, Any] = None,
    version: Version = None,
    credentials: Dict[str, Any] = None,
    fs_args: Dict[str, Any] = None,
) -> None:
    """Creates a new instance of ``PickleDataSet`` pointing to a concrete Pickle
    file on a specific filesystem. ``PickleDataSet`` supports four backends to
    serialize/deserialize objects: `pickle`, `joblib`, `dill`, and `compress_pickle`.

    Example YAML data catalog entry:

    >>> airplanes:
    >>>   type: pickle.PickleDataSet
    >>>   filepath: data/06_models/airplanes.pkl
    >>>   backend: pickle

    Args:
        filepath: Filepath in POSIX format to a Pickle file prefixed with a protocol like
            `s3://`. If prefix is not provided, `file` protocol (local filesystem) will be used.
            The prefix should be any protocol supported by ``fsspec``.
            Note: `http(s)` doesn't support versioning.
        backend: Backend to use, must be one of ['pickle', 'joblib', 'dill', 'compress_pickle'].
            Defaults to 'pickle'.
    """
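To make the link between the YAML entry and the documented `Args` concrete, here is a minimal sketch of how the keys of a catalog entry could line up with constructor keyword arguments. It does not use Kedro itself: `StandInPickleDataSet`, `REGISTRY`, and `build_dataset` are hypothetical names invented for illustration, and the real `DataCatalog` resolves `type` in its own way.

```python
from typing import Any, Dict, Optional


class StandInPickleDataSet:
    """Hypothetical stand-in that mirrors part of the signature quoted above."""

    def __init__(
        self,
        filepath: str,
        backend: str = "pickle",
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.filepath = filepath
        self.backend = backend
        self.load_args = load_args or {}
        self.save_args = save_args or {}


# The docstring's YAML entry, written as the dict a YAML parser would produce.
catalog_entry = {
    "airplanes": {
        "type": "pickle.PickleDataSet",
        "filepath": "data/06_models/airplanes.pkl",
        "backend": "pickle",
    }
}

# Hypothetical registry mapping the ``type`` string to a class (an assumption).
REGISTRY = {"pickle.PickleDataSet": StandInPickleDataSet}


def build_dataset(config: Dict[str, Any]) -> Any:
    """Instantiate a dataset from one catalog entry (illustrative only)."""
    kwargs = dict(config)  # copy so the caller's dict is not mutated
    dataset_cls = REGISTRY[kwargs.pop("type")]
    return dataset_cls(**kwargs)  # remaining keys become keyword arguments


airplanes = build_dataset(catalog_entry["airplanes"])
print(airplanes.filepath, airplanes.backend)
```

Under these assumptions, every key other than `type` corresponds to an `__init__` argument, which is why an example entry next to the `Args` section is easy to cross-check.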
To accomplish this, do the following:
- Follow the process in our Contribution Guide
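Since the same kind of example is needed for every dataset listed above, a first draft could in principle be generated from each constructor signature and then edited by hand. The sketch below is one possible approach, not something prescribed by this issue: `draft_catalog_entry` and `StandInCSVDataSet` are hypothetical, and only the standard-library `inspect` module is used.

```python
import inspect
from typing import Any, Dict, Optional


class StandInCSVDataSet:
    """Hypothetical stand-in class; not the real Kedro dataset."""

    def __init__(
        self,
        filepath: str,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.filepath = filepath


def draft_catalog_entry(name: str, type_name: str, dataset_cls: type) -> str:
    """Draft a YAML-like catalog entry from a class's ``__init__`` signature."""
    lines = [f"{name}:", f"  type: {type_name}"]
    params = inspect.signature(dataset_cls.__init__).parameters
    for param in params.values():
        if param.name == "self":
            continue
        if param.default is inspect.Parameter.empty:
            # Required arguments get a placeholder to be filled in by hand.
            lines.append(f"  {param.name}: <required>")
        else:
            # Optional arguments are emitted as comments showing the default.
            lines.append(f"  # {param.name}: {param.default!r}  (optional)")
    return "\n".join(lines)


draft = draft_catalog_entry("cars", "pandas.CSVDataSet", StandInCSVDataSet)
print(draft)
```

The output is only a starting point: placeholders still need realistic values before the entry goes into a docstring.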
Issue Analytics
- Created 2 years ago
- Comments: 5 (5 by maintainers)
@avan-sh I’ve also hacked something together that might help you draft these, hopefully useful!:
Which produces:
Took a stab at this for CSV dataset, taking some inspiration from the old discussion. The new docs would look as below, hyperlinked to.
Would love to hear any suggestions for changes before I add it to other datasets.