[DOC] Completing estimator class docstrings

Every estimator class should have a complete docstring. This should be worked on one-by-one, and feel free to complete only individual rubrics if it’s unclear what to fill in for the others.

A good estimator docstring should include rubrics:

  • one-liner description (top), start capitalized, end with .
  • description paragraph - what is the algorithm?
  • Components block - only if there are estimator components. The list of components should be identical to the constructor arguments that are estimators (inheriting from BaseClassifier, BaseForecaster, etc).
  • Parameters block - individual parameters listed as param_name: type, explanation. The explanation should include the value/structure convention if the expectation is more specific than just the type, e.g., n: int, integer between 0 and 42. The list of Parameters should be identical to the constructor arguments that are not estimators.
  • Attributes block - these are the most important attributes of object instances which are not parameters or components. It should include attributes that correspond to the “fitted model”.
  • Notes - details, formulae, academic references
  • Example - self-contained example on sktime internal toy data that runs
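
The requirement that the Parameters block match the constructor arguments can be checked mechanically. The following is a minimal sketch, not part of sktime: `documented_names`, `constructor_args`, and `ToyEstimator` are hypothetical names introduced only for illustration, and the parsing assumes numpy-style section headings underlined with dashes.

```python
import inspect
import re


def documented_names(docstring, section):
    """Return the entry names listed under a numpy-style section heading."""
    lines = inspect.cleandoc(docstring).splitlines()
    names = []
    in_section = False
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # A heading is a non-empty line followed by a line made only of dashes.
        is_heading = bool(line.strip()) and set(nxt.strip()) == {"-"}
        if is_heading:
            in_section = line.strip() == section
            continue
        # Entries are unindented "name : type" lines; descriptions are indented.
        match = re.match(r"^(\w+)\s*:", line)
        if in_section and match:
            names.append(match.group(1))
    return names


def constructor_args(cls):
    """Return the names of the constructor arguments, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


class ToyEstimator:
    """Toy estimator used only to illustrate the check (hypothetical).

    Parameters
    ----------
    threshold : float, default=0.92
        Fraction of the best score a member must reach to be retained.
    n_jobs : int, default=1
        Number of parallel jobs.

    Attributes
    ----------
    n_classes : int
        Number of classes seen during fit.
    """

    def __init__(self, threshold=0.92, n_jobs=1):
        self.threshold = threshold
        self.n_jobs = n_jobs


# Compare what the docstring documents with what the constructor accepts.
print(documented_names(ToyEstimator.__doc__, "Parameters"))
print(constructor_args(ToyEstimator))
```

The same helper could be pointed at a "Components" heading to compare against the constructor arguments that are estimators, though deciding which arguments are estimators would need a check against the relevant base classes.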

For formatting, we use the numpy style, though note that the rubrics are slightly different (because we are dealing with algorithms/estimators). Also look at the extension templates for the algorithm scitype for a “fill-in template” that algorithm implementers are using (or should be using).
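
The Example rubric asks for an example "that runs", which can be verified with the standard library's doctest module. The sketch below uses a hypothetical `ToyMeanClassifier` rather than an sktime estimator so that it runs without sktime installed; `run_docstring_examples` is likewise an illustrative helper, not an sktime function. It enables doctest's `ELLIPSIS` option, which is what lets an expected output written with `...` in the parentheses match the actual repr.

```python
import doctest


class ToyMeanClassifier:
    """Toy estimator that predicts the mean of the training values.

    Example
    -------
    >>> clf = ToyMeanClassifier()
    >>> clf.fit([1, 2, 3])
    ToyMeanClassifier(...)
    >>> clf.predict()
    2.0
    """

    def fit(self, y):
        # Store the fitted state and return self, as estimators usually do.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self):
        return self.mean_

    def __repr__(self):
        return "ToyMeanClassifier()"


def run_docstring_examples(obj):
    """Run the doctest examples in obj's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The object is made available to its own examples under its own name.
    test = parser.get_doctest(
        obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_docstring_examples(ToyMeanClassifier)
print(f"{attempted} examples run, {failed} failed")
```

A docstring whose example imports toy data from the package itself would be run the same way, provided the package is installed in the environment that runs the doctest.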

Here’s an example of a good class docstring:

class BOSSEnsemble(BaseClassifier):
    """Ensemble of bag of Symbolic Fourier Approximation Symbols (BOSS).

    Implementation of BOSS Ensemble from Schäfer (2015). [1]_

    Overview: Input "n" series of length "m" and BOSS performs a grid search over
    a set of parameter values, evaluating each with a LOOCV. It then retains
    all ensemble members within 92% of the best by default for use in the ensemble.
    There are three primary parameters:
        - alpha: alphabet size
        - w: window length
        - l: word length.

    For any combination, a single BOSS slides a window length "w" along the
    series. The w length window is shortened to an "l" length word through
    taking a Fourier transform and keeping the first l/2 complex coefficients.
    These "l" coefficients are then discretized into alpha possible values,
    to form a word length "l". A histogram of words for each
    series is formed and stored.

    Fit involves finding "n" histograms.

    Predict uses 1 nearest neighbor with a bespoke BOSS distance function.

    Parameters
    ----------
    threshold : float, default=0.92
        Threshold used to determine which classifiers to retain. All classifiers
        within percentage `threshold` of the best one are retained.
    max_ensemble_size : int or None, default=500
        Maximum number of classifiers to retain. Will limit number of retained
        classifiers even if more than `max_ensemble_size` are within threshold.
    max_win_len_prop : int or float, default=1
        Maximum window length as a proportion of the series length.
    min_window : int, default=10
        Minimum window size.
    n_jobs : int, default=1
        The number of jobs to run in parallel for both `fit` and `predict`.
        ``-1`` means using all processors.
    random_state : int or None, default=None
        Seed for random, integer.

    Attributes
    ----------
    n_classes : int
        Number of classes. Extracted from the data.
    n_instances : int
        Number of instances. Extracted from the data.
    n_estimators : int
        The final number of classifiers used. Will be <= `max_ensemble_size` if
        `max_ensemble_size` has been specified.
    series_length : int
        Length of all series (assumed equal).
    classifiers : list
        List of DecisionTree classifiers.

    See Also
    --------
    IndividualBOSS, ContractableBOSS

    Notes
    -----
    For the Java version, see
    `TSML <https://github.com/uea-machine-learning/tsml/blob/master/src/
    main/java/tsml/classifiers/dictionary_based/BOSS.java>`_.

    References
    ----------
    .. [1] Patrick Schäfer, "The BOSS is concerned with time series classification
       in the presence of noise", Data Mining and Knowledge Discovery, 29(6): 2015
       https://link.springer.com/article/10.1007/s10618-014-0377-7

    Example
    -------
    >>> from sktime.classification.dictionary_based import BOSSEnsemble
    >>> from sktime.datasets import load_italy_power_demand
    >>> X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
    >>> X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
    >>> clf = BOSSEnsemble()
    >>> clf.fit(X_train, y_train)
    BOSSEnsemble(...)
    >>> y_pred = clf.predict(X_test)
    """

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 7

Top GitHub Comments

RNKuhns commented, Jul 20, 2021 (2 reactions)

@mloning and @fkiraly, I’ve got an update to the docstring above that fixes some formatting issues that won’t render well in Sphinx (e.g. some of the formatting of parameters and attribute sections). Note that this moves the reference to the paper to the references section as specified in NumPy docstring format. Moved the reference to the Java version to the See Also section, but still need to figure out how to make the link work correctly there.

This also cleans up some typos, capitalization issues, and other minor things.

class BOSSEnsemble(BaseClassifier):
    """Ensemble of bag of Symbolic Fourier Approximation Symbols (BOSS).

    Implementation of BOSS Ensemble from Schäfer (2015). [1]_

    Overview: Input "n" series of length "m" and BOSS performs a grid search over
    a set of parameter values, evaluating each with a LOOCV. It then retains
    all ensemble members within 92% of the best by default for use in the ensemble.
    There are three primary parameters:
        - alpha: alphabet size
        - w: window length
        - l: word length.

    For any combination, a single BOSS slides a window length "w" along the
    series. The w length window is shortened to an "l" length word through
    taking a Fourier transform and keeping the first l/2 complex coefficients.
    These "l" coefficients are then discretized into alpha possible values,
    to form a word length "l". A histogram of words for each
    series is formed and stored.

    Fit involves finding "n" histograms.

    Predict uses 1 nearest neighbor with a bespoke BOSS distance function.

    Parameters
    ----------
    threshold : float, default=0.92
        Threshold used to determine which classifiers to retain. All classifiers
        within percentage `threshold` of the best one are retained.
    max_ensemble_size : int or None, default=500
        Maximum number of classifiers to retain. Will limit number of retained
        classifiers even if more than `max_ensemble_size` are within threshold.
    max_win_len_prop : int or float, default=1
        Maximum window length as a proportion of the series length.
    min_window : int, default=10
        Minimum window size.
    n_jobs : int, default=1
        The number of jobs to run in parallel for both `fit` and `predict`.
        ``-1`` means using all processors.
    random_state : int or None, default=None
        Seed for random, integer.

    Attributes
    ----------
    n_classes : int
        Number of classes. Extracted from the data.
    n_instances : int
        Number of instances. Extracted from the data.
    n_estimators : int
        The final number of classifiers used. Will be <= `max_ensemble_size` if
        `max_ensemble_size` has been specified.
    series_length : int
        Length of all series (assumed equal).
    classifiers : list
        List of DecisionTree classifiers.

    See Also
    --------
    :py:class:`IndividualBOSS`, :py:class:`ContractableBOSS`

    For the Java version, see
    `TSML <https://github.com/uea-machine-learning/tsml/blob/master/src/
    main/java/tsml/classifiers/dictionary_based/BOSS.java>`_.

    References
    ----------
    .. [1] Patrick Schäfer, "The BOSS is concerned with time series classification
       in the presence of noise", Data Mining and Knowledge Discovery, 29(6): 2015
       https://link.springer.com/article/10.1007/s10618-014-0377-7

    Example
    -------
    >>> from sktime.classification.dictionary_based import BOSSEnsemble
    >>> from sktime.datasets import load_italy_power_demand
    >>> X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
    >>> X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
    >>> clf = BOSSEnsemble()
    >>> clf.fit(X_train, y_train)
    BOSSEnsemble(...)
    >>> y_pred = clf.predict(X_test)
    """

RNKuhns commented, Jul 18, 2021 (1 reaction)

I was able to connect with the SciPy documentation sprint and figure out how to specify the link. I’ve updated both posts to include the correct link usage in “See Also”.
