Wrong estimator tags for LogisticRegression

Describe the bug

The estimator tags for sklearn.linear_model.LogisticRegression are wrong because the _get_tags method uses the following resolution order for this estimator:

from sklearn.linear_model import LogisticRegression
import inspect
list(reversed(inspect.getmro(LogisticRegression)))
[<class 'object'>, <class 'sklearn.linear_model._base.SparseCoefMixin'>, <class 'sklearn.base.ClassifierMixin'>, <class 'sklearn.linear_model._base.LinearClassifierMixin'>, <class 'sklearn.base.BaseEstimator'>, <class 'sklearn.linear_model._logistic.LogisticRegression'>]

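Tags are collected in this order, and each class overwrites the tags set by the classes before it. As a result, the defaults from sklearn.base.BaseEstimator replace the estimator tags defined in sklearn.base.ClassifierMixin (e.g., requires_y).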
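
The overwrite can be reproduced without scikit-learn. The following is a minimal sketch, assuming that tags are gathered by walking the reversed MRO and calling a per-class _more_tags hook; the class names (SketchBaseEstimator, SketchClassifierMixin, BaseFirst, MixinFirst) and the two-entry DEFAULT_TAGS dict are hypothetical stand-ins, not the library's own code.

```python
import inspect

# Hypothetical, shortened set of default tags (assumption for illustration).
DEFAULT_TAGS = {"requires_y": False, "multioutput": False}


class SketchBaseEstimator:
    """Stand-in for a base estimator that supplies the default tags."""

    def _more_tags(self):
        return dict(DEFAULT_TAGS)

    def _get_tags(self):
        collected = {}
        # Walk from the most generic class to the most specific one;
        # each class that exposes _more_tags overwrites earlier values.
        for klass in reversed(inspect.getmro(type(self))):
            if hasattr(klass, "_more_tags"):
                collected.update(klass._more_tags(self))
        return collected


class SketchClassifierMixin:
    """Stand-in for a mixin that only overrides requires_y."""

    def _more_tags(self):
        return {"requires_y": True}


class BaseFirst(SketchBaseEstimator, SketchClassifierMixin):
    """Base estimator listed first: its defaults are applied after the mixin."""


class MixinFirst(SketchClassifierMixin, SketchBaseEstimator):
    """Mixin listed first: its tags are applied after the defaults."""


base_first_tags = BaseFirst()._get_tags()
mixin_first_tags = MixinFirst()._get_tags()
print("BaseFirst :", base_first_tags)
print("MixinFirst:", mixin_first_tags)
```

In this sketch only the order of the base classes differs between BaseFirst and MixinFirst, yet requires_y ends up False for the first and True for the second. That suggests listing mixins before the base estimator is one way to get the expected tags, though the sketch does not show how the issue was actually resolved upstream.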

Steps/Code to Reproduce

from sklearn.linear_model import LogisticRegression
LogisticRegression()._get_tags()
{'requires_y': False, 'non_deterministic': False, 'requires_positive_X': False, 'requires_positive_y': False, 'X_types': ['2darray'], 'poor_score': False, 'no_validation': False, 'multioutput': False, 'allow_nan': False, 'stateless': False, 'multilabel': False, '_skip_test': False, '_xfail_checks': False, 'multioutput_only': False, 'binary_only': False, 'requires_fit': True}

Expected Results

{'requires_y': True, 'non_deterministic': False, 'requires_positive_X': False, 'requires_positive_y': False, 'X_types': ['2darray'], 'poor_score': False, 'no_validation': False, 'multioutput': False, 'allow_nan': False, 'stateless': False, 'multilabel': False, '_skip_test': False, '_xfail_checks': False, 'multioutput_only': False, 'binary_only': False, 'requires_fit': True}

Actual Results

{'requires_y': False, 'non_deterministic': False, 'requires_positive_X': False, 'requires_positive_y': False, 'X_types': ['2darray'], 'poor_score': False, 'no_validation': False, 'multioutput': False, 'allow_nan': False, 'stateless': False, 'multilabel': False, '_skip_test': False, '_xfail_checks': False, 'multioutput_only': False, 'binary_only': False, 'requires_fit': True}

Versions

import sklearn; sklearn.show_versions()

System:
    python: 3.6.10 (default, Jun  9 2020, 18:45:00)  [GCC 8.3.0]
executable: /usr/local/bin/python
   machine: Linux-4.19.76-linuxkit-x86_64-with-debian-10.4

Python dependencies:
          pip: 20.1.1
   setuptools: 47.1.1
      sklearn: 0.24.dev0
        numpy: 1.18.4
        scipy: 1.4.1
       Cython: 0.29.18
       pandas: 1.0.3
   matplotlib: 3.2.1
       joblib: 0.15.1
threadpoolctl: 2.0.0

Built with OpenMP: True

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

rth commented, Jul 5, 2020 (1 reaction)

better to do that in a separate PR.

@alfaro96 Including them in the PR that you just opened is also fine.

alfaro96 commented, Jul 5, 2020 (0 reactions)

A few more located with:

$ grep -rnw "sklearn" -e ".*(BaseEstimator,.*"
sklearn/metrics/_plot/tests/test_plot_precision_recall.py:66:    class MyClassifier(BaseEstimator, ClassifierMixin):
sklearn/ensemble/tests/test_stacking.py:265:class NoWeightRegressor(BaseEstimator, RegressorMixin):
sklearn/ensemble/tests/test_stacking.py:274:class NoWeightClassifier(BaseEstimator, ClassifierMixin):
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py:395:    class MinMaxImputer(BaseEstimator, TransformerMixin):
sklearn/multioutput.py:63:class _MultiOutputEstimator(BaseEstimator, MetaEstimatorMixin,
sklearn/model_selection/tests/test_search.py:1824:    class TestEstimator(BaseEstimator, ClassifierMixin):

After removing the false positives (e.g., import) and the already found ones.
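
A syntax-aware search avoids the import false positives that the grep pattern picks up. The following is a rough sketch, assuming the goal is to list classes whose first base is BaseEstimator and that have at least one more base; the helper name classes_with_base_first and the sample source are hypothetical, and it uses only the standard-library ast module.

```python
import ast


def classes_with_base_first(source, base_name="BaseEstimator"):
    """Return (line, class name) pairs where base_name is the first of several bases."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and len(node.bases) > 1:
            first = node.bases[0]
            # Handle both `BaseEstimator` and `sklearn.base.BaseEstimator`.
            if isinstance(first, ast.Name):
                name = first.id
            else:
                name = getattr(first, "attr", None)
            if name == base_name:
                hits.append((node.lineno, node.name))
    return sorted(hits)


# Hypothetical sample input; the import line is ignored because it is not a class.
sample = '''
from sklearn.base import (BaseEstimator, ClassifierMixin)

class NoWeightRegressor(BaseEstimator, RegressorMixin):
    pass

class Fine(ClassifierMixin, BaseEstimator):
    pass

class OnlyBase(BaseEstimator):
    pass
'''

hits = classes_with_base_first(sample)
print(hits)
```

To scan a source tree, the same helper could be applied to the text of each .py file, for example by iterating over pathlib.Path("sklearn").rglob("*.py").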
