AutoMLSearch fails with Ordinal logical type input from Featuretools
AutoMLSearch fails if the input contains Ordinal data from Featuretools, such as that generated by the Year, Month, etc. primitives.
Code Sample, a copy-pastable example to reproduce your bug.
import featuretools as ft
from evalml import AutoMLSearch
import pandas as pd
df = pd.read_csv("delhi_200.csv")
es = ft.EntitySet()
es.add_dataframe(dataframe_name="df", dataframe=df, index="id", make_index=True, time_index="date")
es["df"].ww
trans_primitives = ["day"]
features = ft.dfs(entityset=es,
                  target_dataframe_name="df",
                  max_depth=1,
                  features_only=True,
                  trans_primitives=trans_primitives)
features.append(ft.Feature(es["df"].ww["date"]))
fm = ft.calculate_feature_matrix(entityset=es, features=features)
y = fm.ww.pop("meantemp")
X = fm
problem_configuration = {"gap": 0, "max_delay": 7, "forecast_horizon": 7, "time_index": "date"}
automl = AutoMLSearch(
    X,
    y,
    problem_type="time series regression",
    problem_configuration=problem_configuration,
)
automl.search()
Random Forest Regressor w/ Replace Nullable Types Transformer + Imputer + Time Series Featurizer + DateTime Featurizer + One Hot Encoder + Drop NaN Rows Transformer fold 0: Encountered an error.
Random Forest Regressor w/ Replace Nullable Types Transformer + Imputer + Time Series Featurizer + DateTime Featurizer + One Hot Encoder + Drop NaN Rows Transformer fold 0: All scores will be replaced with nan.
Fold 0: Exception during automl search: Input contains NaN
...
AutoMLSearchException: All pipelines in the current AutoML batch produced a score of np.nan on the primary objective <evalml.objectives.standard_metrics.MedianAE object at 0x2898447c0>.
Top GitHub Comments
@thehomebrewnerd I believe it happens via _encode_X_while_preserving_index, which will turn all the categories into numbers (here). But the fact that _get_categorical_columns ignores Ordinal and other logical types with category standard tags means that those columns wouldn’t get ordinally encoded, and if the data wasn’t already numeric in nature, we will have problems with any non-numeric ordinal or category feature. It should be a quick fix, but I would want to talk to other folks on the modeling team before making this change.

@tamargrey Still trying to digest this information just a bit, but how can we do a categorical-to-double conversion reliably, since categories don’t have to be numeric?
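
One way to picture the conversion asked about above: rather than casting the category values themselves to numbers, each value can be mapped to its position in the declared ordering, which works whether or not the categories are numeric. The sketch below is a standalone illustration only. ordinal_encode is a hypothetical helper, not an EvalML or Woodwork function, and sending missing or unknown values to NaN is an assumption about how they should be handled. pandas exposes the same idea through the integer codes of an ordered Categorical.

```python
import math
from typing import Hashable, List, Optional, Sequence


def ordinal_encode(values: Sequence[Optional[Hashable]],
                   order: Sequence[Hashable]) -> List[float]:
    """Map each value to its position in `order`, returned as a float.

    Hypothetical helper for illustration. Missing (None) or unknown
    values become NaN instead of raising, so a later imputation step
    can decide what to do with them.
    """
    position = {category: float(i) for i, category in enumerate(order)}
    return [position.get(value, math.nan) for value in values]


# Non-numeric ordinal data: the declared order carries the ranking.
sizes = ["small", "large", None, "medium", "small"]
encoded = ordinal_encode(sizes, order=["small", "medium", "large"])
print(encoded)
```

Because the encoding depends only on the declared order, it gives the same result for numeric categories (such as day-of-month values) and for string categories.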