
Memory error in official kaggle tutorial

See original GitHub issue

TabularPredictor().fit from this tutorial fails with a memory error both in Colab and in Kaggle. Kaggle has 16.81 GB of available RAM, and the AutoGluon log shows that the data needs much less memory: Train Data (Original) Memory Usage: 2715.97 MB. What is the problem?
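For reference, a minimal sketch of the kind of call the tutorial makes (the file path and label column below are placeholders, not the tutorial's exact values):

from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('train.csv')  # placeholder path for the training CSV
predictor = TabularPredictor(label='target').fit(  # 'target' is a placeholder label column
    train_data,
    time_limit=600,  # optional cap on training time
)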

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
qo4on commented, Apr 11, 2021

Kaggle doesn’t offer an option to restart the runtime while keeping the environment

You can do this. First run this:

[screenshot]

Then restart the runtime while keeping the environment:

[screenshot]

After that, AutoGluon imports without any error. But the second import shows a warning:

[screenshot]

You can also restart the runtime while keeping the environment from code:

import os
os.kill(os.getpid(), 9)  # SIGKILL the kernel process; the runtime restarts with the installed packages intact
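Once the runtime is back, it can also be worth confirming how much RAM is actually free before calling fit. A quick check (report_memory is a hypothetical helper, not from the issue; psutil is typically preinstalled on Kaggle and Colab):

import psutil

def report_memory():
    # Print total and currently available RAM in GiB
    vm = psutil.virtual_memory()
    print(f"total RAM: {vm.total / 1024**3:.2f} GiB, available: {vm.available / 1024**3:.2f} GiB")

report_memory()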
0 reactions
Innixma commented, Apr 14, 2021

Can we get the features after the preprocessing step? They are all numerical, and they are supposed to take less memory. If we do the preprocessing in a separate process while the data is loading and then skip the preprocessing step in fit(), this will help save memory.

Unfortunately, to fit the preprocessing stage we need all of the data present. There is no ‘skipping’ it.

However, you can fit the preprocessing prior to calling fit (useful for isolating the memory error):

This example shows how feature generators work: https://github.com/awslabs/autogluon/blob/master/examples/tabular/example_custom_feature_generator.py

To get an identical generator to the one used in fit by default:

from autogluon.features.generators import AutoMLPipelineFeatureGenerator

# Fit the default preprocessing pipeline on the training features (label column dropped first)
feature_generator = AutoMLPipelineFeatureGenerator()
train_data_transformed = feature_generator.fit_transform(X=train_data.drop(columns=[LABEL]))
train_data_transformed[LABEL] = train_data[LABEL]

# Apply the already-fitted pipeline to the test data
test_data_transformed = feature_generator.transform(test_data)
test_data_transformed[LABEL] = test_data[LABEL]
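To confirm how much smaller the transformed frame actually is, a quick check with plain pandas (frame_size_mb is a hypothetical helper, not part of the issue or the AutoGluon API):

def frame_size_mb(df):
    # In-memory size of a DataFrame in megabytes, counting object columns accurately
    return df.memory_usage(deep=True).sum() / 1024 ** 2

print(f"raw train data:         {frame_size_mb(train_data):.1f} MB")
print(f"transformed train data: {frame_size_mb(train_data_transformed):.1f} MB")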

Then, to fit AutoGluon with the transformed data and avoid performing any preprocessing during fit, replace the default feature generator with a no-op generator (Identity):

from autogluon.features.generators import IdentityFeatureGenerator

predictor = TabularPredictor(
    label=LABEL,
    verbosity=2,
).fit(
    train_data=train_data_transformed,
    # IdentityFeatureGenerator is a no-op: the already-transformed features pass through unchanged
    feature_generator=IdentityFeatureGenerator(),
    time_limit=60,
)

leaderboard = predictor.leaderboard(test_data_transformed)
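If the runtime later dies with an out-of-memory error anyway, persisting the transformed frames avoids repeating the preprocessing after a restart. A sketch using plain pandas parquet I/O (not something from the issue; it assumes pyarrow or fastparquet is installed, which Kaggle images typically include):

# Cache the transformed data to disk
train_data_transformed.to_parquet('train_transformed.parquet')
test_data_transformed.to_parquet('test_transformed.parquet')

# ...after the runtime restarts, reload instead of re-running the feature generator
import pandas as pd
train_data_transformed = pd.read_parquet('train_transformed.parquet')
test_data_transformed = pd.read_parquet('test_transformed.parquet')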
Read more comments on GitHub >

Top Results From Across the Web

Memory error in official kaggle tutorial · Issue #1051 - GitHub
It is likely that there is too little memory, 2.7 GB dataset is very large for 15 GB memory, and depending on how...
Read more >
Tutorial on reading large datasets - Kaggle
read_csv will result in an out-of-memory error on Kaggle Notebooks. It has over 100 million rows and 10 columns. Different packages have their...
Read more >
RDD Programming Guide - Spark 3.3.1 Documentation
The first line defines a base RDD from an external file. This dataset is not loaded in memory or otherwise acted on: lines...
Read more >
Load - Hugging Face
This guide will show you how to load a dataset from: The Hub without a dataset loading script; Local loading script; Local files;...
Read more >
How do I load the CelebA dataset on Google Colab, using ...
I did not manage to find a solution to the memory problem. However, I came up with a workaround, custom dataset. Here is...
Read more >
