Wrong estimator tags for LogisticRegression

Describe the bug

The estimator tags for sklearn.linear_model.LogisticRegression are wrong because the _get_tags method uses the following resolution order for this estimator:

from sklearn.linear_model import LogisticRegression
import inspect
list(reversed(inspect.getmro(LogisticRegression)))
[<class 'object'>, <class 'sklearn.linear_model._base.SparseCoefMixin'>, <class 'sklearn.base.ClassifierMixin'>, <class 'sklearn.linear_model._base.LinearClassifierMixin'>, <class 'sklearn.base.BaseEstimator'>, <class 'sklearn.linear_model._logistic.LogisticRegression'>]

Because sklearn.base.BaseEstimator comes after sklearn.base.ClassifierMixin in this order, its default tags overwrite the ones defined in sklearn.base.ClassifierMixin (e.g., requires_y).
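The effect of the base-class order can be illustrated without scikit-learn. The sketch below is a stand-in, not the library's code: `Base`, `ClassifierLike`, `collect_tags`, and the two subclasses are hypothetical names, and it assumes tags are merged by walking the reversed MRO so that classes visited later overwrite earlier ones.

```python
import inspect

# Hypothetical default tags, standing in for the defaults a base class provides.
DEFAULT_TAGS = {"requires_y": False, "requires_fit": True}


class Base:
    """Stand-in for a base estimator that returns every default tag."""

    def _more_tags(self):
        return dict(DEFAULT_TAGS)


class ClassifierLike:
    """Stand-in for a mixin that overrides a single tag."""

    def _more_tags(self):
        return {"requires_y": True}


def collect_tags(estimator):
    """Merge tags while walking the reversed MRO; later classes win."""
    tags = {}
    for klass in reversed(inspect.getmro(type(estimator))):
        # Only call a _more_tags defined directly on this class.
        more_tags = vars(klass).get("_more_tags")
        if more_tags is not None:
            tags.update(more_tags(estimator))
    return tags


class BaseFirst(Base, ClassifierLike):
    """Base listed first: its defaults are applied last and win."""


class MixinFirst(ClassifierLike, Base):
    """Mixin listed first: its override is applied last and wins."""


print(collect_tags(BaseFirst())["requires_y"])
print(collect_tags(MixinFirst())["requires_y"])
```

Under these assumptions, `BaseFirst` reproduces the reported behaviour (the mixin's `requires_y` is lost), while `MixinFirst` keeps it, which suggests that listing the mixins before the base class is one way to avoid the problem.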

Steps/Code to Reproduce

from sklearn.linear_model import LogisticRegression
LogisticRegression()._get_tags()
{'requires_y': False, 'non_deterministic': False, 'requires_positive_X': False, 'requires_positive_y': False, 'X_types': ['2darray'], 'poor_score': False, 'no_validation': False, 'multioutput': False, 'allow_nan': False, 'stateless': False, 'multilabel': False, '_skip_test': False, '_xfail_checks': False, 'multioutput_only': False, 'binary_only': False, 'requires_fit': True}

Expected Results

{'requires_y': True, 'non_deterministic': False, 'requires_positive_X': False, 'requires_positive_y': False, 'X_types': ['2darray'], 'poor_score': False, 'no_validation': False, 'multioutput': False, 'allow_nan': False, 'stateless': False, 'multilabel': False, '_skip_test': False, '_xfail_checks': False, 'multioutput_only': False, 'binary_only': False, 'requires_fit': True}

Actual Results

{'requires_y': False, 'non_deterministic': False, 'requires_positive_X': False, 'requires_positive_y': False, 'X_types': ['2darray'], 'poor_score': False, 'no_validation': False, 'multioutput': False, 'allow_nan': False, 'stateless': False, 'multilabel': False, '_skip_test': False, '_xfail_checks': False, 'multioutput_only': False, 'binary_only': False, 'requires_fit': True}

Versions

import sklearn; sklearn.show_versions()

System:
    python: 3.6.10 (default, Jun  9 2020, 18:45:00)  [GCC 8.3.0]
executable: /usr/local/bin/python
   machine: Linux-4.19.76-linuxkit-x86_64-with-debian-10.4

Python dependencies:
          pip: 20.1.1
   setuptools: 47.1.1
      sklearn: 0.24.dev0
        numpy: 1.18.4
        scipy: 1.4.1
       Cython: 0.29.18
       pandas: 1.0.3
   matplotlib: 3.2.1
       joblib: 0.15.1
threadpoolctl: 2.0.0

Built with OpenMP: True

Issue Analytics

Created 3 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction

rth commented, Jul 5, 2020

better to do that in a separate PR.

@alfaro96 Including them in the PR that you just opened is also fine.

0 reactions

alfaro96 commented, Jul 5, 2020

A few more located with:

$ grep -rnw "sklearn" -e ".*(BaseEstimator,.*"
sklearn/metrics/_plot/tests/test_plot_precision_recall.py:66:    class MyClassifier(BaseEstimator, ClassifierMixin):
sklearn/ensemble/tests/test_stacking.py:265:class NoWeightRegressor(BaseEstimator, RegressorMixin):
sklearn/ensemble/tests/test_stacking.py:274:class NoWeightClassifier(BaseEstimator, ClassifierMixin):
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py:395:    class MinMaxImputer(BaseEstimator, TransformerMixin):
sklearn/multioutput.py:63:class _MultiOutputEstimator(BaseEstimator, MetaEstimatorMixin,
sklearn/model_selection/tests/test_search.py:1824:    class TestEstimator(BaseEstimator, ClassifierMixin):

This list excludes the false positives (e.g., import statements) and the occurrences that had already been found.
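As a possible alternative to grep that skips import lines by construction, the search can be sketched with Python's `ast` module. This is an illustration only and was not part of the original comment: `base_estimator_first` and the `sample` source are hypothetical, and the check assumes the pattern of interest is a class with several bases that lists `BaseEstimator` first.

```python
import ast


def base_estimator_first(source, base_name="BaseEstimator"):
    """Return (line, class name) pairs for classes listing base_name first among several bases."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and len(node.bases) > 1:
            first = node.bases[0]
            # Accept both `BaseEstimator` and a dotted form such as `base.BaseEstimator`.
            if isinstance(first, ast.Name):
                name = first.id
            else:
                name = getattr(first, "attr", None)
            if name == base_name:
                hits.append((node.lineno, node.name))
    return sorted(hits)


# Hypothetical source text; it is only parsed, never imported or executed.
sample = '''
from sklearn.base import BaseEstimator, ClassifierMixin

class NoWeightClassifier(BaseEstimator, ClassifierMixin):
    pass

class MixinFirst(ClassifierMixin, BaseEstimator):
    pass
'''

print(base_estimator_first(sample))
```

To scan a source tree, the same function could be applied to the text of each `.py` file, for example by iterating over `pathlib.Path("sklearn").rglob("*.py")`.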
