AutoMLSearch fails with Ordinal logical type input from Featuretools

AutoMLSearch fails if the input contains Ordinal data from Featuretools, such as the features generated by the Year, Month, and similar primitives.
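A minimal sketch of what such a column can look like on the pandas side, assuming the Ordinal feature arrives as an ordered categorical. The frame below is a hypothetical stand-in for the feature matrix, not data from delhi_200.csv, and the column name "DAY(date)" is only illustrative. It shows that a categorical column is not treated as numeric even when every category is an integer:

```python
import pandas as pd

# Hypothetical stand-in for a feature matrix: "DAY(date)" mimics a day-of-month
# feature stored as an ordered categorical; "meantemp" is a plain float column.
fm = pd.DataFrame(
    {
        "DAY(date)": pd.Categorical(
            [1, 2, 3, 1], categories=list(range(1, 32)), ordered=True
        ),
        "meantemp": [10.0, 7.4, 7.2, 8.7],
    }
)

# Columns with the pandas "category" dtype, regardless of what the categories hold.
categorical_cols = fm.select_dtypes(include="category").columns.tolist()

# A categorical dtype is not a numeric dtype, even with integer categories.
is_numeric = pd.api.types.is_numeric_dtype(fm["DAY(date)"])

print(categorical_cols, is_numeric)
```

A check like this may help narrow down which columns in X need attention before they reach AutoMLSearch.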

Code Sample, a copy-pastable example to reproduce your bug.

import featuretools as ft
from evalml import AutoMLSearch
import pandas as pd

df = pd.read_csv("delhi_200.csv")

# Build an EntitySet with a generated "id" index and "date" as the time index.
es = ft.EntitySet()
es.add_dataframe(dataframe_name="df", dataframe=df, index="id", make_index=True, time_index="date")
es["df"].ww  # displays the Woodwork schema when run in a notebook

# Generate feature definitions only; the "day" primitive produces the Ordinal feature.
trans_primitives = ["day"]
features = ft.dfs(entityset=es,
                  target_dataframe_name="df",
                  max_depth=1,
                  features_only=True,
                  trans_primitives=trans_primitives)
# Also keep the raw "date" column, which is used as the time index below.
features.append(ft.Feature(es["df"].ww["date"]))
fm = ft.calculate_feature_matrix(entityset=es, features=features)

# Split the target from the feature matrix.
y = fm.ww.pop("meantemp")
X = fm

problem_configuration={"gap": 0, "max_delay": 7, "forecast_horizon": 7, "time_index": "date"}
automl = AutoMLSearch(
    X,
    y,
    problem_type="time series regression",
    problem_configuration=problem_configuration,
)

automl.search()

Random Forest Regressor w/ Replace Nullable Types Transformer + Imputer + Time Series Featurizer + DateTime Featurizer + One Hot Encoder + Drop NaN Rows Transformer fold 0: Encountered an error.
Random Forest Regressor w/ Replace Nullable Types Transformer + Imputer + Time Series Featurizer + DateTime Featurizer + One Hot Encoder + Drop NaN Rows Transformer fold 0: All scores will be replaced with nan.
Fold 0: Exception during automl search: Input contains NaN

...

AutoMLSearchException: All pipelines in the current AutoML batch produced a score of np.nan on the primary objective <evalml.objectives.standard_metrics.MedianAE object at 0x2898447c0>.

delhi_200.csv

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
tamargrey commented, Nov 2, 2022

@tamargrey Still trying to digest this information just a bit, but how can we do a categorical to double conversion reliably since categories don’t have to be numeric?

@thehomebrewnerd I believe it happens via _encode_X_while_preserving_index, which will turn all the categories into numbers.

But _get_categorical_columns ignores Ordinal and other logical types that carry the category standard tag, so those columns wouldn’t get ordinally encoded. If the data wasn’t already numeric in nature, we will have problems with any non-numeric ordinal or category feature. It should be a quick fix, but I would want to talk to other folks on the modeling team before making this change.
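To illustrate the kind of category-to-number conversion being discussed, here is a rough sketch in plain pandas. It is not EvalML's implementation of _encode_X_while_preserving_index; the helper name ordinal_to_float and the sample values are made up for this example. Each value is mapped to its position in a declared order, so non-numeric categories still end up as doubles:

```python
import numpy as np
import pandas as pd


def ordinal_to_float(values, order):
    """Map each value to its position in `order` as a float.

    Values that are missing or not listed in `order` become NaN.
    (Hypothetical helper for illustration only.)
    """
    cat = pd.Categorical(values, categories=order, ordered=True)
    codes = pd.Series(cat.codes, dtype="float64")
    # pandas marks values outside `order` (including missing ones) with code -1.
    return codes.where(codes >= 0, np.nan)


sizes = ["small", "large", None, "medium"]
encoded = ordinal_to_float(sizes, order=["small", "medium", "large"])
print(encoded.tolist())
```

The encoding is only as reliable as the declared order: a column without a meaningful order (a plain category) would get arbitrary positions, which is one reason the change might deserve discussion before it is made.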

0 reactions
thehomebrewnerd commented, Nov 2, 2022

@tamargrey Still trying to digest this information just a bit, but how can we do a categorical to double conversion reliably since categories don’t have to be numeric?
