AutoGluon stacking quality

Why does AutoGluon produce significantly worse results than simple one-layer stacking? Five hours of best_quality training gives a private score of 0.38131, compared to 0.37353 for a 2-hour logistic regression.

Issue Analytics

Created 2 years ago
Comments: 13 (1 by maintainers)

Top GitHub Comments

Innixma commented, May 19, 2021

Re 1: This already exists ('LR' is the key to use in hyperparameters), however it is not used by default. I plan to add it as a default model in a future release (possibly in v0.3).

Re 2: We have a more sophisticated method of handling bagging / CV than is available in sklearn. We have to handle more complex cases than are supported by CalibratedClassifierCV. In the future we may consider contributing the functionality back to sklearn, as this is one of the most important components of AutoGluon.

Re 3: It is impossible to pick the best model for the test score ahead of time. The best model is picked based on the strongest validation score. ML would be very easy if we could know which model was best on the test data ahead of time, but that is not the case.

Re 4: We are happy to accept contributions! If you’d like, please open a PR which adds this model and we can test it / benchmark it to see if it improves upon our existing methods!
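To make Re 1 and Re 3 concrete, here is a minimal sketch, assuming the AutoGluon 0.2-era `TabularPredictor` API. The helper names `add_linear_model` and `fit_and_compare` are hypothetical and not part of AutoGluon; only the `'LR'` key comes from the comment above. The sketch adds the linear model to a hyperparameters dict and then lists validation and test scores side by side, which shows that the model with the best validation score is not guaranteed to be the best on test data.

```python
import copy


def add_linear_model(hyperparameters):
    """Return a copy of the config with the 'LR' key added (default settings).

    'LR' is the key named in the comment above; an existing 'LR' entry
    is left untouched.
    """
    updated = copy.deepcopy(hyperparameters)
    updated.setdefault("LR", {})
    return updated


def fit_and_compare(train_data, test_data, label, hyperparameters):
    """Hypothetical helper: fit with 'LR' enabled and compare val vs. test scores."""
    # Imported lazily so add_linear_model works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(
        train_data,
        hyperparameters=add_linear_model(hyperparameters),
    )
    # Passing test data adds a score_test column next to score_val.
    board = predictor.leaderboard(test_data, silent=True)
    return board[["model", "score_val", "score_test"]]
```

AutoGluon selects its best model from `score_val` only; `score_test` is available here purely for after-the-fact comparison.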

Innixma commented, Apr 14, 2021

You can set the number of trees to 1000 via the hyperparameters argument in .fit: https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit

Something like:

hyperparameters = {
    'NN': {},
    'GBM': [
        {},
        {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
        'GBMLarge',
    ],
    'CAT': {},
    'XGB': {},
    'FASTAI': {},
    'RF': [
        {'criterion': 'gini', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'entropy', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'mse', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression']}},
    ],
    'XT': [
        {'criterion': 'gini', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'entropy', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'mse', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression']}},
    ],
    'KNN': [
        {'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}},
        {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}},
    ],
}
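If you would rather not edit every entry by hand, the override can be applied programmatically. The following is a rough sketch, not AutoGluon functionality: `with_n_estimators` is a hypothetical helper that assumes the config has the same shape as the dict above (each key maps to a dict or a list of dicts and preset strings).

```python
def with_n_estimators(hyperparameters, n_estimators, keys=("RF", "XT")):
    """Return a new config with n_estimators overridden for the given model keys.

    Entries under other keys are passed through unchanged. Non-dict entries
    (such as preset name strings) are also left as they are.
    """
    updated = {}
    for key, configs in hyperparameters.items():
        if key not in keys:
            updated[key] = configs
            continue
        # A key may map to a single dict or to a list of configs.
        as_list = configs if isinstance(configs, list) else [configs]
        updated[key] = [
            {**cfg, "n_estimators": n_estimators} if isinstance(cfg, dict) else cfg
            for cfg in as_list
        ]
    return updated
```

The result can then be passed as the `hyperparameters` argument of `TabularPredictor.fit`, for example `predictor.fit(train_data, hyperparameters=with_n_estimators(hyperparameters, 1000))`.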

Top Results From Across the Web

AutoGluon stacking quality · Issue #1060 - GitHub
