`NeuralNetClassifier` took too long to train
Sometimes the `NeuralNetClassifier` takes far longer to train than the other models. For example, here is a run on a small-scale dataset:
```python
# Download and unpack the Kaggle credit-card fraud dataset
# (notebook shell commands; requires a configured Kaggle CLI)
!kaggle datasets download mlg-ulb/creditcardfraud -p ../input/
!cd ../input/; unzip creditcardfraud.zip -d creditcardfraud

import pandas as pd
from autogluon import TabularPrediction as task  # AutoGluon 0.0.x API

data = pd.read_csv('../input/creditcardfraud/creditcard.csv')
model = task.fit(train_data=task.Dataset(data), label='Class')
```
A classifier normally trains in about 5 seconds, while the `NeuralNetClassifier` needs more than an hour (it didn't finish). The only unusual thing I see is that this is a heavily imbalanced dataset, with ~0.1% positive examples. Maybe negative sampling could be used here?
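As a quick sanity check (added here for illustration, not part of the original report), the imbalance is easy to confirm with pandas:

```python
import pandas as pd

data = pd.read_csv('../input/creditcardfraud/creditcard.csv')
# Fraction of rows per class; class 1 (fraud) is roughly 0.1% of the data
print(data['Class'].value_counts(normalize=True))
```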
An easy way to fix this is to use `time_limits`. It would be good if we could hint beginner users toward it: e.g., when we detect that a classifier has used far more time than the others (say, more than 10 * avg_time) and `time_limits` isn't set, we could print a message suggesting `time_limits`, or instructions to filter out that classifier. It would also be great to show the training progress of a classifier that needs a long time to train. Sketches of both ideas follow.
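For illustration, here is a minimal sketch of the workaround with the 0.0.x `TabularPrediction` API. The 600-second budget and the subset of model families are arbitrary choices, and the assumption that omitting `'NN'` from `hyperparameters` skips the neural net reflects my reading of the 0.0.x defaults, so treat this as a sketch rather than a definitive recipe:

```python
import pandas as pd
from autogluon import TabularPrediction as task

data = pd.read_csv('../input/creditcardfraud/creditcard.csv')

# Option 1: cap the total training time (in seconds) so no single
# model can run away with the budget.
model = task.fit(train_data=task.Dataset(data), label='Class',
                 time_limits=600)

# Option 2: train only selected model families; leaving 'NN' out of
# the hyperparameters dict should mean the neural net is never fit.
# Empty dicts request the default hyperparameters for each family.
model = task.fit(train_data=task.Dataset(data), label='Class',
                 hyperparameters={'GBM': {}, 'CAT': {}, 'RF': {}})
```

The proposed hint could work roughly like this (hypothetical helper, not AutoGluon code; all names here are illustrative):

```python
def maybe_hint_slow_model(fit_times, elapsed, time_limits=None):
    """Hypothetical helper: suggest time_limits when one model's fit
    time dwarfs the average of the models trained so far."""
    avg_time = sum(fit_times) / len(fit_times)
    if elapsed > 10 * avg_time and time_limits is None:
        print("Hint: this model is much slower than the others; "
              "consider setting time_limits or excluding this model.")

# Example with the fit times from the log below: eight models averaged
# ~14s, while the ninth (the neural net) has been running for an hour.
maybe_hint_slow_model([50.67, 37.62, 7.7, 7.0, 2.71, 2.71, 1.1, 2.76], 3600)
```

For reference, the log from the run above; it ends at `NeuralNetClassifier`, which never finished: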
```
No output_directory specified. Models will be saved in: AutogluonModels/ag-20200730_190141/
Beginning AutoGluon training ...
AutoGluon will save models to AutogluonModels/ag-20200730_190141/
AutoGluon Version:  0.0.12
Train Data Rows:    284807
Train Data Columns: 31
Preprocessing data ...
Here are the 2 unique label values in your data: [0, 1]
AutoGluon infers your prediction problem is: binary (because only two unique label-values observed).
If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping: class 1 = 1, class 0 = 0
Train Data Class Count: 2
Feature Generator processed 284807 data points with 30 features
Original Features (raw dtypes):
    float64 features: 30
Original Features (inferred dtypes):
    float features: 30
Generated Features (special dtypes):
Final Features (raw dtypes):
    float features: 30
Final Features:
    float features: 30
Data preprocessing and feature engineering runtime = 1.68s ...
AutoGluon will gauge predictive performance using evaluation metric: accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: accuracy
Fitting model: RandomForestClassifierGini ...
    0.9996  = Validation accuracy score
    50.67s  = Training runtime
    0.12s   = Validation runtime
Fitting model: RandomForestClassifierEntr ...
    0.9996  = Validation accuracy score
    37.62s  = Training runtime
    0.11s   = Validation runtime
Fitting model: ExtraTreesClassifierGini ...
    0.9996  = Validation accuracy score
    7.7s    = Training runtime
    0.12s   = Validation runtime
Fitting model: ExtraTreesClassifierEntr ...
    0.9996  = Validation accuracy score
    7.0s    = Training runtime
    0.11s   = Validation runtime
Fitting model: KNeighborsClassifierUnif ...
    0.9982  = Validation accuracy score
    2.71s   = Training runtime
    0.12s   = Validation runtime
Fitting model: KNeighborsClassifierDist ...
    0.9986  = Validation accuracy score
    2.71s   = Training runtime
    0.12s   = Validation runtime
Fitting model: LightGBMClassifier ...
    0.9996  = Validation accuracy score
    1.1s    = Training runtime
    0.01s   = Validation runtime
Fitting model: CatboostClassifier ...
    0.9996  = Validation accuracy score
    2.76s   = Training runtime
    0.01s   = Validation runtime
Fitting model: NeuralNetClassifier ...
```
Hi @Innixma,
The neural net in AutoGluon version 0.0.11 is really slow. I did a performance check with some of our datasets (around 1200 features in each) and more than 95% of the runtime was spent in AutoGluon's neural net.
The total runtime for fitting and external test-set prediction with all models was around 10 days on 1 node using 16 cores. The AutoGluon models were trained with `auto_stack=True` in the fit function. AutoGluon Tabular models are compared to sklearn's ExtraTrees and MultitaskGNN models in the two tables below. The overall classification performance of AutoGluon Tabular is really impressive, whereas the regression performance can be improved; I was expecting bigger gains on the regression models from model stacking.
AutoGluon would be an awesome tool with better performance on regression tasks.
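For context, the stacked setup this comment refers to looks roughly like this in the 0.0.x API (the file path and label name are placeholders, not from the original report):

```python
from autogluon import TabularPrediction as task

# auto_stack=True turns on multi-layer stack ensembling with bagging;
# it usually improves accuracy but multiplies total training time.
predictor = task.fit(train_data=task.Dataset(file_path='train.csv'),
                     label='target',
                     auto_stack=True)
```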
Classification model results:
| label | NumSamples | ExtraTrees (MCC) | MultitaskGNN (MCC) | AutoGluon (MCC) | ExtraTrees (ROC AUC) | MultitaskGNN (ROC AUC) | AutoGluon (ROC AUC) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model1 | 68300 | 0.639 | 0.539 | 0.666 | 0.954 | 0.953 | 0.961 |
| model2 | 703 | 0.660 | 0.679 | 0.720 | 0.920 | 0.921 | 0.925 |
| model3 | 1763 | 0.669 | 0.602 | 0.616 | 0.867 | 0.880 | 0.873 |
| model4 | 1446 | 0.696 | 0.631 | 0.719 | 0.910 | 0.886 | 0.911 |
Regression model results:
| label | NumSamples | ExtraTrees (R2) | MultitaskGNN (R2) | AutoGluon (R2) | ExtraTrees (RMSE) | MultitaskGNN (RMSE) | AutoGluon (RMSE) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model1 | 8151 | 0.523 | 0.594 | 0.540 | 0.361 | 0.333 | 0.354 |
| model2 | 6062 | 0.526 | 0.628 | 0.558 | 0.416 | 0.368 | 0.402 |
| model3 | 9925 | 0.530 | 0.600 | 0.556 | 0.401 | 0.370 | 0.390 |
| model4 | 5511 | 0.591 | 0.583 | 0.625 | 0.215 | 0.217 | 0.206 |
| model5 | 7007 | 0.797 | 0.779 | 0.825 | 0.225 | 0.235 | 0.209 |
| model6 | 5886 | 0.745 | 0.789 | 0.780 | 0.400 | 0.364 | 0.371 |
| model7 | 8880 | 0.752 | 0.806 | 0.785 | 0.399 | 0.353 | 0.372 |
| model8 | 8252 | 0.764 | 0.817 | 0.798 | 0.372 | 0.328 | 0.345 |
| model9 | 94521 | 0.880 | 0.916 | 0.904 | 0.470 | 0.393 | 0.419 |
| model10 | 79889 | 0.614 | 0.659 | 0.675 | 0.320 | 0.301 | 0.293 |
| model11 | 51072 | 0.640 | 0.703 | 0.699 | 0.380 | 0.345 | 0.348 |
| model12 | 44445 | 0.664 | 0.728 | 0.727 | 0.387 | 0.348 | 0.349 |
| model13 | 91612 | 0.759 | 0.804 | 0.791 | 0.570 | 0.514 | 0.531 |
| model14 | 22772 | 0.737 | 0.793 | 0.767 | 0.613 | 0.544 | 0.577 |
| model15 | 23590 | 0.513 | 0.571 | 0.552 | 0.369 | 0.347 | 0.354 |
| model16 | 23474 | 0.489 | 0.553 | 0.536 | 0.394 | 0.369 | 0.376 |
| model17 | 23306 | 0.506 | 0.557 | 0.545 | 0.363 | 0.344 | 0.349 |
This issue is resolved by #598, although the neural net can still take a very long time to train compared to the other models.