`NeuralNetClassifier` took too long to train
Sometimes the `NeuralNetClassifier` takes far longer to train than the other models. For example, here is a run on a small-scale dataset:
```python
# Download and unpack the Kaggle credit-card fraud dataset
# (notebook shell commands; requires a configured Kaggle CLI)
!kaggle datasets download mlg-ulb/creditcardfraud -p ../input/
!cd ../input/; unzip creditcardfraud.zip -d creditcardfraud

import pandas as pd
from autogluon import TabularPrediction as task  # AutoGluon 0.0.x API

data = pd.read_csv('../input/creditcardfraud/creditcard.csv')
model = task.fit(train_data=task.Dataset(data), label='Class')
```
A classifier normally trains in about 5 seconds, while the `NeuralNetClassifier` needs more than an hour (it didn't finish). The only unusual thing I see is that this is a heavily imbalanced dataset, with ~0.1% positive examples. Maybe negative sampling could be used here?
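As a quick sanity check (added here for illustration, not part of the original report), the imbalance is easy to confirm with pandas:

```python
import pandas as pd

data = pd.read_csv('../input/creditcardfraud/creditcard.csv')
# Fraction of rows per class; class 1 (fraud) is roughly 0.1% of the data
print(data['Class'].value_counts(normalize=True))
```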
An easy way to fix this is to use `time_limits`. It would be good if we could hint beginner users toward it: e.g., when we detect that a classifier has used far more time than the others (say, more than 10 * avg_time) and `time_limits` isn't set, we could print a message suggesting `time_limits`, or instructions to filter out that classifier. It would also be great to show the training progress of a classifier that needs a long time to train. Sketches of both ideas follow.
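For illustration, here is a minimal sketch of the workaround with the 0.0.x `TabularPrediction` API. The 600-second budget and the subset of model families are arbitrary choices, and the assumption that omitting `'NN'` from `hyperparameters` skips the neural net reflects my reading of the 0.0.x defaults, so treat this as a sketch rather than a definitive recipe:

```python
import pandas as pd
from autogluon import TabularPrediction as task

data = pd.read_csv('../input/creditcardfraud/creditcard.csv')

# Option 1: cap the total training time (in seconds) so no single
# model can run away with the budget.
model = task.fit(train_data=task.Dataset(data), label='Class',
                 time_limits=600)

# Option 2: train only selected model families; leaving 'NN' out of
# the hyperparameters dict should mean the neural net is never fit.
# Empty dicts request the default hyperparameters for each family.
model = task.fit(train_data=task.Dataset(data), label='Class',
                 hyperparameters={'GBM': {}, 'CAT': {}, 'RF': {}})
```

The proposed hint could work roughly like this (hypothetical helper, not AutoGluon code; all names here are illustrative):

```python
def maybe_hint_slow_model(fit_times, elapsed, time_limits=None):
    """Hypothetical helper: suggest time_limits when one model's fit
    time dwarfs the average of the models trained so far."""
    avg_time = sum(fit_times) / len(fit_times)
    if elapsed > 10 * avg_time and time_limits is None:
        print("Hint: this model is much slower than the others; "
              "consider setting time_limits or excluding this model.")

# Example with the fit times from the log below: eight models averaged
# ~14s, while the ninth (the neural net) has been running for an hour.
maybe_hint_slow_model([50.67, 37.62, 7.7, 7.0, 2.71, 2.71, 1.1, 2.76], 3600)
```

For reference, the log from the run above; it ends at `NeuralNetClassifier`, which never finished: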
```
No output_directory specified. Models will be saved in: AutogluonModels/ag-20200730_190141/
Beginning AutoGluon training ...
AutoGluon will save models to AutogluonModels/ag-20200730_190141/
AutoGluon Version:  0.0.12
Train Data Rows:    284807
Train Data Columns: 31
Preprocessing data ...
Here are the 2 unique label values in your data: [0, 1]
AutoGluon infers your prediction problem is: binary (because only two unique label-values observed).
If this is wrong, please specify `problem_type` argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping: class 1 = 1, class 0 = 0
Train Data Class Count: 2
Feature Generator processed 284807 data points with 30 features
Original Features (raw dtypes):
    float64 features: 30
Original Features (inferred dtypes):
    float features: 30
Generated Features (special dtypes):
Final Features (raw dtypes):
    float features: 30
Final Features:
    float features: 30
Data preprocessing and feature engineering runtime = 1.68s ...
AutoGluon will gauge predictive performance using evaluation metric: accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: accuracy
Fitting model: RandomForestClassifierGini ...
    0.9996  = Validation accuracy score
    50.67s  = Training runtime
    0.12s   = Validation runtime
Fitting model: RandomForestClassifierEntr ...
    0.9996  = Validation accuracy score
    37.62s  = Training runtime
    0.11s   = Validation runtime
Fitting model: ExtraTreesClassifierGini ...
    0.9996  = Validation accuracy score
    7.7s    = Training runtime
    0.12s   = Validation runtime
Fitting model: ExtraTreesClassifierEntr ...
    0.9996  = Validation accuracy score
    7.0s    = Training runtime
    0.11s   = Validation runtime
Fitting model: KNeighborsClassifierUnif ...
    0.9982  = Validation accuracy score
    2.71s   = Training runtime
    0.12s   = Validation runtime
Fitting model: KNeighborsClassifierDist ...
    0.9986  = Validation accuracy score
    2.71s   = Training runtime
    0.12s   = Validation runtime
Fitting model: LightGBMClassifier ...
    0.9996  = Validation accuracy score
    1.1s    = Training runtime
    0.01s   = Validation runtime
Fitting model: CatboostClassifier ...
    0.9996  = Validation accuracy score
    2.76s   = Training runtime
    0.01s   = Validation runtime
Fitting model: NeuralNetClassifier ...
```
Hi @Innixma,
The neural net in AutoGluon version 0.0.11 is really slow. I did a performance check with some of our datasets (around 1200 features in each) and more than 95% of the runtime was spent in AutoGluon's neural net.
The total runtime for fitting and external test-set prediction with all models was around 10 days on 1 node using 16 cores. The AutoGluon models were trained with `auto_stack=True` in the fit function. AutoGluon Tabular models are compared to sklearn's ExtraTrees and MultitaskGNN models in the two tables below. The overall classification performance of AutoGluon Tabular is really impressive, whereas the regression performance can be improved; I was expecting bigger gains on the regression models from model stacking.
AutoGluon would be an awesome tool with better performance on regression tasks.
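For context, the stacked setup this comment refers to looks roughly like this in the 0.0.x API (the file path and label name are placeholders, not from the original report):

```python
from autogluon import TabularPrediction as task

# auto_stack=True turns on multi-layer stack ensembling with bagging;
# it usually improves accuracy but multiplies total training time.
predictor = task.fit(train_data=task.Dataset(file_path='train.csv'),
                     label='target',
                     auto_stack=True)
```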
Classification model results:
| label | NumSamples | ExtraTrees (MCC) | MultitaskGNN (MCC) | AutoGluon (MCC) | ExtraTrees (ROC AUC) | MultitaskGNN (ROC AUC) | AutoGluon (ROC AUC) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model1 | 68300 | 0.639 | 0.539 | 0.666 | 0.954 | 0.953 | 0.961 |
| model2 | 703 | 0.660 | 0.679 | 0.720 | 0.920 | 0.921 | 0.925 |
| model3 | 1763 | 0.669 | 0.602 | 0.616 | 0.867 | 0.880 | 0.873 |
| model4 | 1446 | 0.696 | 0.631 | 0.719 | 0.910 | 0.886 | 0.911 |
Regression model results:
| label | NumSamples | ExtraTrees (R2) | MultitaskGNN (R2) | AutoGluon (R2) | ExtraTrees (RMSE) | MultitaskGNN (RMSE) | AutoGluon (RMSE) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model1 | 8151 | 0.523 | 0.594 | 0.540 | 0.361 | 0.333 | 0.354 |
| model2 | 6062 | 0.526 | 0.628 | 0.558 | 0.416 | 0.368 | 0.402 |
| model3 | 9925 | 0.530 | 0.600 | 0.556 | 0.401 | 0.370 | 0.390 |
| model4 | 5511 | 0.591 | 0.583 | 0.625 | 0.215 | 0.217 | 0.206 |
| model5 | 7007 | 0.797 | 0.779 | 0.825 | 0.225 | 0.235 | 0.209 |
| model6 | 5886 | 0.745 | 0.789 | 0.780 | 0.400 | 0.364 | 0.371 |
| model7 | 8880 | 0.752 | 0.806 | 0.785 | 0.399 | 0.353 | 0.372 |
| model8 | 8252 | 0.764 | 0.817 | 0.798 | 0.372 | 0.328 | 0.345 |
| model9 | 94521 | 0.880 | 0.916 | 0.904 | 0.470 | 0.393 | 0.419 |
| model10 | 79889 | 0.614 | 0.659 | 0.675 | 0.320 | 0.301 | 0.293 |
| model11 | 51072 | 0.640 | 0.703 | 0.699 | 0.380 | 0.345 | 0.348 |
| model12 | 44445 | 0.664 | 0.728 | 0.727 | 0.387 | 0.348 | 0.349 |
| model13 | 91612 | 0.759 | 0.804 | 0.791 | 0.570 | 0.514 | 0.531 |
| model14 | 22772 | 0.737 | 0.793 | 0.767 | 0.613 | 0.544 | 0.577 |
| model15 | 23590 | 0.513 | 0.571 | 0.552 | 0.369 | 0.347 | 0.354 |
| model16 | 23474 | 0.489 | 0.553 | 0.536 | 0.394 | 0.369 | 0.376 |
| model17 | 23306 | 0.506 | 0.557 | 0.545 | 0.363 | 0.344 | 0.349 |
This issue is resolved by #598, although the neural net can still take a very long time to train compared to the other models.