check_is_fitted has false positives on custom subclasses with private attributes


Description

check_is_fitted has false positives on custom subclasses with private attributes.

I believe check_is_fitted should not look at variables with a leading underscore because 1) that is Python’s convention for private attributes, and 2) the scikit-learn API specifies fitted attributes as those with a trailing underscore (and there is no specification regarding leading underscores, correct?).

Backtracking from PR #14545 where the new check logic was introduced, I noticed that the check for leading underscore was added to cover these two modules:

  • sklearn/neighbors/base.py
  • sklearn/neighbors/lof.py

But perhaps these modules are actually not following the API specification?

Steps/Code to Reproduce

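from sklearn.decomposition import PCA
from sklearn.utils.validation import check_is_fitted

class MyPCA(PCA):
    def __init__(self, n_components=None):  # remaining arguments omitted for brevity
        super().__init__(n_components=n_components)
        self._my_private_attr = 42  # private attribute set before fitting

mypca = MyPCA()
check_is_fitted(mypca)  # does not raise NotFittedError on sklearn 0.22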

Expected Results

check_is_fitted raises NotFittedError even on custom subclasses that have private attributes following the Python convention of a leading underscore.

Actual Results

NotFittedError is not raised.
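
The false positive can be illustrated without scikit-learn. The sketch below is only an approximation of the check as described in this issue, not code copied from the library; the function names and the `Toy` class are hypothetical. It contrasts a check that accepts leading-underscore attributes with one that looks only at trailing underscores.

```python
def looks_fitted_with_private(estimator):
    """Count trailing- OR leading-underscore attributes as evidence of fitting.

    Approximates the behaviour reported in this issue (an assumption).
    """
    return any(
        (name.endswith("_") or name.startswith("_")) and not name.startswith("__")
        for name in vars(estimator)
    )


def looks_fitted_trailing_only(estimator):
    """Count only trailing-underscore attributes, as the issue proposes."""
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


class Toy:
    """Hypothetical estimator with a private attribute set in __init__."""

    def __init__(self):
        self.n_components = 2
        self._my_private_attr = 42  # private, exists before fitting

    def fit(self):
        self.components_ = [1.0, 0.0]  # learnt attribute, trailing underscore
        return self


unfitted = Toy()
fitted = Toy().fit()
```

With the first check, `unfitted` is reported as fitted purely because of `_my_private_attr`; with the second, only `fitted` passes.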

Versions

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.7.3 (default, Aug  8 2019, 19:40:58)  [GCC 5.4.0 20160609]
executable: /media/ale/data/education+research/code/baikal/venv/bin/python
   machine: Linux-4.4.0-170-generic-x86_64-with-debian-stretch-sid

Python dependencies:
       pip: 19.3.1
setuptools: 40.8.0
   sklearn: 0.22
     numpy: 1.17.4
     scipy: 1.3.3
    Cython: None
    pandas: None
matplotlib: None
    joblib: 0.14.0

Built with OpenMP: True

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 37 (28 by maintainers)

Top GitHub Comments

2 reactions
jnothman commented, Dec 22, 2019

Fwiw I am okay with removing the attr.startswith("_") condition. For the sake of the future I think removing it is the better option. But because the intention is for the estimator to use the function internally, it doesn’t matter too much.

2 reactions
thomasjpfan commented, Dec 13, 2019

Our convention with underscores is fairly implicit. I would prefer an explicit list naming every attribute that will be learnt during training, which would allow private attributes to be defined in __init__.

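class MyPCA:
    fitted_attributes = ['n_components_']  # explicit list of attributes learnt in fit

    def __init__(self):
        self._private_attr = 42  # private, set before fitting

    def fit(self, X, y=None):
        self.n_components_ = 4
        self._learned_attr = 10  # private, but not used to decide fitted state
        return self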

check_is_fitted would then check only fitted_attributes. (Yes, this more or less goes back to the former version of check_is_fitted.)
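
As a rough illustration of this proposal, a check driven by an explicit list might look like the sketch below. Everything here is hypothetical: `check_is_fitted_explicit` is not a scikit-learn function, the `fitted_attributes` class attribute is only the convention suggested in this comment, and `NotFittedError` is a local stand-in defined so the example runs without scikit-learn.

```python
class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError (illustration only)."""


def check_is_fitted_explicit(estimator):
    """Raise NotFittedError unless every declared fitted attribute exists.

    Hypothetical helper: reads the class-level `fitted_attributes` list and
    ignores every other attribute, private or not.
    """
    declared = getattr(type(estimator), "fitted_attributes", ())
    missing = [name for name in declared if not hasattr(estimator, name)]
    if missing:
        raise NotFittedError(
            f"{type(estimator).__name__} is not fitted; missing: {missing}"
        )


class MyPCA:
    fitted_attributes = ["n_components_"]

    def __init__(self):
        self._private_attr = 42  # does not affect the check

    def fit(self, X=None, y=None):
        self.n_components_ = 4
        self._learned_attr = 10
        return self


def raises_not_fitted(estimator):
    """Return True if the explicit check rejects the estimator."""
    try:
        check_is_fitted_explicit(estimator)
    except NotFittedError:
        return True
    return False
```

Under this scheme the private attribute set in `__init__` no longer influences the result. One design caveat of the sketch: an estimator that declares no `fitted_attributes` would always pass, so a real implementation would need a policy for that case.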

I have seen this issue come up in skorch as well, i.e. this change in check_is_fitted is kind of breaking third party estimators.
