check_is_fitted has false positives on custom subclasses with private attributes
Description
check_is_fitted has false positives on custom subclasses with private attributes.
I believe check_is_fitted should not look at variables with a leading underscore because 1) that is Python’s convention for private attributes, and 2) the scikit-learn API specifies fitted attributes as those with a trailing underscore (and there is no specification regarding leading underscores, correct?).
Backtracking from PR #14545 where the new check logic was introduced, I noticed that the check for leading underscore was added to cover these two modules:
sklearn/neighbors/base.py
sklearn/neighbors/lof.py
But perhaps these modules are actually not following the API specification?
Steps/Code to Reproduce
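from sklearn.decomposition import PCA
from sklearn.utils.validation import check_is_fitted

class MyPCA(PCA):
    def __init__(self, n_components=None):  # other arguments omitted for brevity
        super().__init__(n_components=n_components)
        self._my_private_attr = 42

mypca = MyPCA()
check_is_fitted(mypca)  # does not raise NotFittedError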
Expected Results
check_is_fitted raises NotFittedError even on custom subclasses that have private attributes following the Python convention of a leading underscore.
Actual Results
NotFittedError is not raised.
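The false positive can be traced with a small standalone sketch. It assumes the check gathers instance attributes that have either a leading or a trailing underscore, as the attr.startswith("_") condition discussed in this thread suggests. It is not scikit-learn's actual source: the helper names, the stand-in NotFittedError, and the toy classes below are all hypothetical.

```python
class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError (hypothetical)."""


def attrs_leading_or_trailing(estimator):
    # Assumed behaviour: leading OR trailing underscore counts as "fitted",
    # with dunder names excluded.
    return [
        name for name in vars(estimator)
        if (name.endswith("_") or name.startswith("_"))
        and not name.startswith("__")
    ]


def attrs_trailing_only(estimator):
    # Proposed behaviour: only a trailing underscore marks a fitted attribute.
    return [
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("__")
    ]


def check_fitted(estimator, collect):
    # Raise if the chosen collector finds no fitted-looking attributes.
    if not collect(estimator):
        raise NotFittedError(f"{type(estimator).__name__} is not fitted yet.")


class ToyBase:
    def __init__(self):
        self.n_components = None

    def fit(self):
        self.components_ = [1.0]  # learnt attribute, trailing underscore
        return self


class ToySub(ToyBase):
    def __init__(self):
        super().__init__()
        self._my_private_attr = 42  # private attribute set before fitting


unfitted = ToySub()
# Passes silently: the private attribute is mistaken for a fitted one.
check_fitted(unfitted, attrs_leading_or_trailing)
```

With attrs_trailing_only as the collector, the same unfitted instance raises NotFittedError, and the check only passes once fit has set components_.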
Versions
>>> import sklearn; sklearn.show_versions()
System:
python: 3.7.3 (default, Aug 8 2019, 19:40:58) [GCC 5.4.0 20160609]
executable: /media/ale/data/education+research/code/baikal/venv/bin/python
machine: Linux-4.4.0-170-generic-x86_64-with-debian-stretch-sid
Python dependencies:
pip: 19.3.1
setuptools: 40.8.0
sklearn: 0.22
numpy: 1.17.4
scipy: 1.3.3
Cython: None
pandas: None
matplotlib: None
joblib: 0.14.0
Built with OpenMP: True
Issue Analytics
- Created 4 years ago
- Comments: 37 (28 by maintainers)
FWIW, I am okay with removing the attr.startswith("_") condition. For the sake of the future, I think removing it is the better option. But because the intention is for the estimator to use the function internally, it doesn’t matter too much.
Our convention with underscores is fairly implicit. I would prefer an explicit list that names every attribute learnt during training, which would allow private attributes to be defined in __init__. check_is_fitted will then only check for fitted_attributes. (Yes, this kind of goes back to the former version of check_is_fitted.)
I have seen this issue come up in skorch as well, i.e. this change in check_is_fitted is kind of breaking third-party estimators.
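The explicit-list idea from the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the class-level fitted_attributes tuple, the check_is_fitted_explicit helper, the local NotFittedError, and ToyScaler are hypothetical names, not part of scikit-learn's API.

```python
class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError (hypothetical)."""


def check_is_fitted_explicit(estimator):
    # Hypothetical helper: consult only the names the estimator declares,
    # so underscore conventions on other attributes play no role.
    declared = getattr(type(estimator), "fitted_attributes", ())
    if not declared:
        raise TypeError(
            f"{type(estimator).__name__} does not declare fitted_attributes."
        )
    missing = [name for name in declared if not hasattr(estimator, name)]
    if missing:
        raise NotFittedError(
            f"{type(estimator).__name__} is not fitted; missing {missing}."
        )


class ToyScaler:
    # Explicit declaration of what fit() is expected to learn.
    fitted_attributes = ("mean_",)

    def __init__(self):
        self._cache = {}  # private attribute, ignored by the check

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self


scaler = ToyScaler()
try:
    check_is_fitted_explicit(scaler)
except NotFittedError as exc:
    print(exc)  # raised despite the private _cache attribute

check_is_fitted_explicit(scaler.fit([1.0, 2.0, 3.0]))  # passes after fit
```

The trade-off in this sketch is that every estimator, including third-party ones, has to maintain the declaration by hand, whereas the underscore convention needs no extra bookkeeping.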