AutoGluon stacking quality


Why does AutoGluon show a significantly worse result compared to simple one-layer stacking? Five hours of best_quality training gives a private score of 0.38131, compared to 0.37353 for 2 hours of logistic regression.


Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 13 (1 by maintainers)

Top GitHub Comments

1 reaction
Innixma commented, May 19, 2021

Re 1: This already exists ('LR' is the key to use in hyperparameters), however it is not used by default. I plan to add it as a default model in a future release (possibly in v0.3).
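As a minimal sketch of what enabling the linear model might look like: the 'LR' key is the one named above, while the other keys, the `train` helper, its `label` argument, and the time limit are assumptions chosen for illustration. The AutoGluon import is deferred into the helper so the dictionary can be inspected without the library installed.

```python
# Hypothetical hyperparameters: 'LR' is the key named in the comment above.
# The empty dicts ask for each model family's default settings.
hyperparameters = {
    'GBM': {},
    'CAT': {},
    'LR': {},
}


def train(train_data, label, time_limit=3600):
    """Fit a TabularPredictor with the linear model included (sketch only)."""
    # Deferred import: only needed when training is actually run.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label)
    return predictor.fit(
        train_data,
        hyperparameters=hyperparameters,
        time_limit=time_limit,
    )
```

Passing an explicit hyperparameters dictionary replaces the default model set, so any model family that should still be trained has to be listed alongside 'LR'.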

Re 2: We have a more sophisticated method of handling bagging / CV than is available in sklearn. We have to handle more complex cases than CalibratedClassifierCV supports. In the future we may consider contributing the functionality back to sklearn, as it is one of the most important components of AutoGluon.

Re 3: It is impossible to pick the best model for the test score ahead of time. The best model is picked based on the strongest validation score. ML would be very easy if we could know which model was best on the test data ahead of time, but that is not the case.
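The point about validation-based selection can be illustrated with a toy sketch. The model names and scores below are invented for illustration (higher is better here) and do not come from the issue; the helper name is hypothetical too.

```python
# Invented scores, for illustration only; higher is better in this sketch.
scores = {
    'model_a': {'val': 0.912, 'test': 0.884},
    'model_b': {'val': 0.905, 'test': 0.893},
    'model_c': {'val': 0.897, 'test': 0.880},
}


def pick_best_by_validation(scores):
    """Return the model name with the strongest validation score."""
    return max(scores, key=lambda name: scores[name]['val'])


# The only choice available at training time.
chosen = pick_best_by_validation(scores)

# Known only after the fact, once test labels are available.
best_on_test = max(scores, key=lambda name: scores[name]['test'])

print(chosen, best_on_test)
```

In this made-up case the model chosen on validation is not the one that scores best on the test data, which is the situation the comment describes.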

Re 4: We are happy to accept contributions! If you’d like, please open a PR which adds this model and we can test it / benchmark it to see if it improves upon our existing methods!

1reaction
Innixma commented, Apr 14, 2021

You can set the number of trees to 1000 via the hyperparameters argument in .fit: https://auto.gluon.ai/stable/_modules/autogluon/tabular/predictor/predictor.html#TabularPredictor.fit

Something like:

hyperparameters = {
    'NN': {},
    'GBM': [
        {},
        {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
        'GBMLarge',
    ],
    'CAT': {},
    'XGB': {},
    'FASTAI': {},
    'RF': [
        {'criterion': 'gini', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'entropy', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'mse', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression']}},
    ],
    'XT': [
        {'criterion': 'gini', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'entropy', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}},
        {'criterion': 'mse', 'n_estimators': 1000, 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression']}},
    ],
    'KNN': [
        {'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}},
        {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}},
    ],
}
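Rather than typing n_estimators into every entry by hand, the same override could be applied programmatically. This is a minimal sketch in plain Python: the helper `with_n_estimators` and the small `base` dictionary are hypothetical and not part of AutoGluon. It assumes each model key maps to either a single config dict or a list of configs, as in the dictionary above.

```python
import copy


def with_n_estimators(hyperparameters, n_estimators, keys=('RF', 'XT')):
    """Return a copy with n_estimators set on every dict config under keys."""
    updated = copy.deepcopy(hyperparameters)
    for key in keys:
        if key not in updated:
            continue
        configs = updated[key]
        # A key may hold one config dict or a list of configs.
        entries = configs if isinstance(configs, list) else [configs]
        for config in entries:
            # Skip string presets such as 'GBMLarge'.
            if isinstance(config, dict):
                config['n_estimators'] = n_estimators
    return updated


# Hypothetical starting point with no tree counts specified.
base = {
    'GBM': {},
    'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini'}}],
    'XT': {},
}
tuned = with_n_estimators(base, 1000)
```

The result can then be passed as the hyperparameters argument to .fit; the original dictionary is left unmodified because the helper works on a deep copy.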