`_check_feature_names` raises UserWarning when accessing bagged estimators
Describe the bug

Calling `predict` on the individual estimators inside a fitted `BaggingRegressor` raises

```
UserWarning: X has feature names, but DecisionTreeRegressor was fitted without feature names
```

coming from `_check_feature_names`, even though the `BaggingRegressor` itself (and its base estimator, here `DecisionTreeRegressor`) takes the feature names into account while fitting.
Steps/Code to Reproduce
```python
from sklearn.ensemble import BaggingRegressor
import pandas as pd

df = pd.DataFrame(
    {
        "feature_name": [-12.32, 1.43, 30.01, 22.17],
        "target": [72, 55, 32, 43],
    }
)
X = df[["feature_name"]]
y = df["target"]

bagged_trees = BaggingRegressor()
bagged_trees.fit(X, y)

bagged_trees_predictions = bagged_trees.predict(X)  # raises no warning
bagged_trees.estimators_[0].predict(X)  # raises UserWarning
```
Expected Results
No warning should be thrown
Actual Results

```
/home/arturoamor/miniforge3/envs/scikit-learn-course/lib/python3.9/site-packages/sklearn/base.py:438: UserWarning: X has feature names, but DecisionTreeRegressor was fitted without feature names
  warnings.warn(

array([72., 55., 32., 32.])
```
Versions

```
System:
    python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) [GCC 9.3.0]
executable: /home/arturoamor/miniforge3/envs/scikit-learn-course/bin/python
   machine: Linux-5.13.0-1017-oem-x86_64-with-glibc2.31

Python dependencies:
          pip: 21.1.3
   setuptools: 49.6.0.post20210108
      sklearn: 1.0.1
        numpy: 1.21.0
        scipy: 1.7.0
       Cython: None
       pandas: 1.3.0
   matplotlib: 3.4.2
       joblib: 1.0.1
threadpoolctl: 2.1.0
```
Issue Analytics

- State:
- Created: 2 years ago
- Comments: 10 (10 by maintainers)
Top GitHub Comments
And now to 1.3 😃
I mentioned the problem to @ogrisel IRL, and he noted that this is still a bit annoying. It might not be a blocker for 1.0.2, though, but we could try to fix it.
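In the meantime, one way to sidestep the warning on the reproducer is to pass a plain ndarray to the inner estimator, since the sub-estimators were fitted on validated ndarray data without feature names. This is a minimal sketch, not a fix from the thread; `random_state=0` is added here only for determinism.

```python
import warnings

import pandas as pd
from sklearn.ensemble import BaggingRegressor

df = pd.DataFrame(
    {"feature_name": [-12.32, 1.43, 30.01, 22.17], "target": [72, 55, 32, 43]}
)
X, y = df[["feature_name"]], df["target"]

bagged_trees = BaggingRegressor(random_state=0).fit(X, y)

# The sub-estimators were fitted on a plain ndarray internally, so passing
# X.to_numpy() instead of the DataFrame avoids the feature-name
# consistency check in _check_feature_names entirely.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)  # fail loudly if a warning fires
    preds = bagged_trees.estimators_[0].predict(X.to_numpy())
```

The trade-off is that you give up the name-consistency safety net, so the column order of the ndarray must match the order used at fit time.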
We have a couple of things to keep in mind here: `RandomForest` (or other ensemble methods) deliberately does not re-validate the data (e.g. for non-finite values) in each underlying estimator, because doing so per sub-estimator would be too costly. The data validation thus makes sense in the ensemble estimator indeed. However, we need to find a mechanism to work around the issue. One possibility would be to attach the metadata (`n_features_in_`, `feature_names_in_`, etc.) to each of the trees. We would also need to be careful with the bootstrap sampling so that the attached metadata still makes sense.
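The attach-the-metadata idea can be sketched by hand on the reproducer. Assuming `_check_feature_names` compares the incoming DataFrame columns against a `feature_names_in_` attribute (the convention in scikit-learn >= 1.0), copying that attribute from the ensemble onto each fitted tree silences the warning; `random_state=0` is added only for determinism.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor

df = pd.DataFrame(
    {"feature_name": [-12.32, 1.43, 30.01, 22.17], "target": [72, 55, 32, 43]}
)
X, y = df[["feature_name"]], df["target"]

bagged = BaggingRegressor(random_state=0).fit(X, y)

# Copy the fitted feature names onto each tree so that
# _check_feature_names sees matching names at predict time.
# Naive: with max_features < 1.0 each tree sees only a subset of the
# columns, so the names would have to be sliced per estimator.
for tree in bagged.estimators_:
    tree.feature_names_in_ = np.asarray(X.columns, dtype=object)

with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)  # fail loudly if a warning fires
    preds = bagged.estimators_[0].predict(X)
```

This illustrates why the comment above flags the sampling: a real fix inside `BaggingRegressor` would have to account for per-estimator feature subsets, not just copy the ensemble-level names.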