`_check_feature_names` raises UserWarning when accessing bagged estimators

Describe the bug

Calling `predict` directly on one of the estimators stored inside a BaggingRegressor raises the following UserWarning, emitted by `_check_feature_names`:

UserWarning: X has feature names, but DecisionTreeRegressor was fitted without feature names

This happens even though both the BaggingRegressor and its base estimator (in this case DecisionTreeRegressor) can take feature names into account when they are fitted on a DataFrame.

Steps/Code to Reproduce

from sklearn.ensemble import BaggingRegressor
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "feature_name": [-12.32, 1.43, 30.01, 22.17],
        "target": [72, 55, 32, 43],
    }
)
X = df[["feature_name"]]
y = df["target"]

bagged_trees = BaggingRegressor()
bagged_trees.fit(X, y)
bagged_trees_predictions = bagged_trees.predict(X)  # raises no warning
bagged_trees.estimators_[0].predict(X)  # raises UserWarning
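Until this is fixed, one possible workaround is to call each sub-estimator the way the ensemble itself does: with a plain array, restricted to the column indices stored in `estimators_features_`. With the defaults (`max_features=1.0`, `bootstrap_features=False`) every estimator sees every column, but indexing keeps the sketch correct when features are subsampled. `random_state=0` is added here only for reproducibility; it is not part of the original report.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor

df = pd.DataFrame(
    {
        "feature_name": [-12.32, 1.43, 30.01, 22.17],
        "target": [72, 55, 32, 43],
    }
)
X = df[["feature_name"]]
y = df["target"]

bagged_trees = BaggingRegressor(random_state=0).fit(X, y)
X_array = X.to_numpy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Pass each sub-estimator an unnamed array holding only the columns it was trained on.
    per_estimator = [
        est.predict(X_array[:, features])
        for est, features in zip(bagged_trees.estimators_, bagged_trees.estimators_features_)
    ]

# BaggingRegressor.predict averages the sub-estimator predictions,
# so the manual mean should agree with it.
manual_average = np.mean(per_estimator, axis=0)
print(np.allclose(manual_average, bagged_trees.predict(X)))
```

Working on arrays sidesteps the name check entirely, at the cost of losing the column-name safety net for the individual trees.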

Expected Results

No warning should be raised.

Actual Results

/home/arturoamor/miniforge3/envs/scikit-learn-course/lib/python3.9/site-packages/sklearn/base.py:438: UserWarning: X has feature names, but DecisionTreeRegressor was fitted without feature names
  warnings.warn(
array([72., 55., 32., 32.])

Versions

System:
    python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) [GCC 9.3.0]
executable: /home/arturoamor/miniforge3/envs/scikit-learn-course/bin/python
   machine: Linux-5.13.0-1017-oem-x86_64-with-glibc2.31

Python dependencies:
          pip: 21.1.3
   setuptools: 49.6.0.post20210108
      sklearn: 1.0.1
        numpy: 1.21.0
        scipy: 1.7.0
       Cython: None
       pandas: 1.3.0
   matplotlib: 3.4.2
       joblib: 1.0.1
threadpoolctl: 2.1.0

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

2 reactions
jeremiedbb commented, Nov 18, 2022

And now to 1.3 😃

1 reaction
glemaitre commented, Nov 23, 2021

When I mentioned the problem to @ogrisel IRL, he said that this is still a bit annoying. This might not be a blocker for 1.0.2, but we could try to fix it.

We have a couple of things to keep in mind here: RandomForest (and the other ensemble methods) does not want to validate the data for non-finite values in each underlying estimator, because that would be too costly. Data validation therefore makes sense in the ensemble estimator. However, we need to find a mechanism to work around the issue. One possibility would be to attach metadata such as n_features_in_, feature_names_in_, etc. to each of the trees. We may need to be careful with the bootstrap sampling so that the result makes sense.
