Memory error in official Kaggle tutorial
TabularPredictor().fit from this tutorial doesn't work in Colab or in Kaggle because of a memory error.
Kaggle has 16.81 GB of available RAM, while the AutoGluon log shows the data needs far less memory:
Train Data (Original) Memory Usage: 2715.97 MB
What is the problem?
Issue Analytics
- State:
- Created: 2 years ago
- Comments: 11
You can work around this. First, run this:
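Presumably this is just the install step, something along these lines:

```python
# In a Colab/Kaggle notebook cell: install AutoGluon. The install may
# upgrade packages the running kernel has already imported, which is
# why the runtime restart below is needed.
!pip install autogluon
```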
Then restart the runtime while keeping the environment (in Colab: Runtime → Restart runtime).
After that, AutoGluon imports without any error, though a second import shows a warning.
You can also restart the runtime, while keeping the environment, from code:
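A common way to do this in Colab (an assumption here, not shown in the original comment) is to kill the Python process; Colab restarts the kernel but keeps pip-installed packages and files on disk:

```python
import os

# Kill the current Python process. Colab detects the dead kernel and
# restarts the runtime; installed packages and files on disk survive.
os.kill(os.getpid(), 9)
```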
Unfortunately, to fit the preprocessing-stage features, all of the data needs to be present; there is no 'skipping' it.
However, you can fit the preprocessing prior to calling fit (useful for isolating the memory error):
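A minimal sketch, assuming `train_data` is the tutorial's training DataFrame and `label` the name of its target column (neither is shown in the original comment):

```python
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

# Fit the preprocessing pipeline on its own. If the memory error occurs
# here, the problem is in feature generation rather than model training.
feature_generator = AutoMLPipelineFeatureGenerator()
X = train_data.drop(columns=[label])
X_transformed = feature_generator.fit_transform(X)
```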
This example shows how feature generators work: https://github.com/awslabs/autogluon/blob/master/examples/tabular/example_custom_feature_generator.py
The AutoMLPipelineFeatureGenerator constructed above is identical to the generator that fit builds by default. Then, to fit AutoGluon with the transformed data and avoid performing any preprocessing during fit, replace the default feature generator with a no-op generator (IdentityFeatureGenerator):
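Continuing the sketch above (the `feature_generator` keyword of `fit` is assumed from the AutoGluon 0.x API):

```python
from autogluon.features.generators import IdentityFeatureGenerator
from autogluon.tabular import TabularPredictor

# Reattach the label to the already-transformed features, then pass a
# pass-through generator so fit does not repeat any preprocessing.
train_transformed = X_transformed.join(train_data[label])
predictor = TabularPredictor(label=label).fit(
    train_transformed,
    feature_generator=IdentityFeatureGenerator(),
)
```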