Auto-sklearn consumes a lot of memory relative to the dataset size
Describe the bug
By default, memory_limit is set to 3 GB (3072 MB) for the machine learning models. I get the following error even though the dataset is only ~3.5 MB:
ValueError: Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 1024 MB).', 'configuration_origin': 'DUMMY'}
According to subsample_if_too_large, a memory limit of 10x the dataset size should be sufficient to train the best model. However, I have to increase memory_limit to 4-5 GB for the experiments to run successfully.
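The "10x" rule of thumb can be written as a quick back-of-the-envelope check. This is a hypothetical helper (`fits_memory_budget` is our name, not auto-sklearn's) and it assumes the rule compares 10 times the in-memory size of the feature array against `memory_limit` in MB; it is not the actual `subsample_if_too_large` code.

```python
import numpy as np


# Hypothetical helper: a rough restatement of the "10x the dataset size" rule of thumb.
# It is NOT auto-sklearn's subsample_if_too_large implementation.
def fits_memory_budget(X: np.ndarray, memory_limit_mb: float, factor: float = 10.0) -> bool:
    """Return True if factor * (in-memory size of X) fits within memory_limit_mb."""
    dataset_mb = X.nbytes / (1024 ** 2)  # in-memory size of the array, in MiB
    return dataset_mb * factor <= memory_limit_mb


# Same shape as the training split in this report; float64 dtype is an assumption.
X_train = np.zeros((33908, 16), dtype=np.float64)
print(round(X_train.nbytes / 1024 ** 2, 2))  # 4.14
print(fits_memory_budget(X_train, memory_limit_mb=1024))  # True
```

For the reported training shape this gives roughly 4.14 MiB, so under this assumed rule even the 1024 MB limit would be more than ample, which is what makes the MEMOUT surprising.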
We performed a few experiments to find the root cause. It appears to be related to pynisher, and some of the memory is used by the Python interpreter itself.
I wonder why so much memory is required for such a small dataset.
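One way to see why the interpreter's own footprint matters: to our understanding, pynisher enforces its memory limit as an address-space cap (`resource.setrlimit(resource.RLIMIT_AS, ...)`) on the worker process, so the limit covers everything the process has mapped (the interpreter and imported libraries such as NumPy and its BLAS), not just the dataset. The sketch below illustrates that mechanism with the standard library only; it does not use pynisher, the names `run_under_as_limit`, `_child` and `_vm_size_bytes` are hypothetical, and it is Linux-only because it reads `/proc/self/status`.

```python
import multiprocessing as mp
import resource


def _vm_size_bytes() -> int:
    """Current virtual memory size of this process, read from /proc (Linux only)."""
    with open("/proc/self/status") as fh:
        for line in fh:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) * 1024  # /proc reports the value in kB
    raise RuntimeError("VmSize not found in /proc/self/status")


def _child(headroom_bytes: int, alloc_bytes: int) -> None:
    # Cap the address space at "what this process already maps + headroom".
    limit = _vm_size_bytes() + headroom_bytes
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    try:
        buf = bytearray(alloc_bytes)  # the whole buffer must fit in the remaining address space
    except MemoryError:
        raise SystemExit(1)  # analogous to a MEMOUT
    del buf  # success: returning normally gives exit code 0


def run_under_as_limit(headroom_bytes: int, alloc_bytes: int) -> bool:
    """Return True if alloc_bytes can be allocated under the cap, False on MemoryError."""
    ctx = mp.get_context("fork")  # fork: the child inherits the parent's existing mappings
    proc = ctx.Process(target=_child, args=(headroom_bytes, alloc_bytes))
    proc.start()
    proc.join()
    return proc.exitcode == 0
```

With 256 MiB of headroom, a 16 MiB allocation is expected to succeed while a 1 GiB allocation fails, even though the total cap (existing mappings + headroom) may well be larger than the allocation. This mirrors how a small dataset can still hit a MEMOUT when the limit is counted against the whole process.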
To Reproduce
I am using the following script. I get the same error with n = 1, 2, 3 and 4 (i.e. memory_limit = 1-4 GB).
```python
import sklearn.datasets
import sklearn.model_selection
import pandas as pd
import autosklearn.classification

# as_frame=True is required so that X is a DataFrame with .columns
X, y = sklearn.datasets.fetch_openml(data_id=1461, return_X_y=True, as_frame=True)

# Convert string (object) columns to pandas categoricals
for col in X.columns:
    if X[col].dtype.name == 'object':
        X[col] = X[col].astype('category')

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)
print('data shape:', X_train.shape)

n = 1  # the same error occurs for n = 1, 2, 3 and 4
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,
    per_run_time_limit=60,
    tmp_folder='/tmp/autosklearn_classification_example_tmp',
    output_folder='/tmp/autosklearn_classification_example_out',
    memory_limit=n * 1024,  # memory_limit is given in MB
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')
```
Actual behavior, stacktrace or logfile
```
data shape: (33908, 16)
Traceback (most recent call last):
  File "test_memory_askl.py", line 31, in <module>
[ERROR] [2021-04-18 00:19:13,203:Client-AutoML(1):breast_cancer] Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 1024 MB).', 'configuration_origin': 'DUMMY'}.
    automl.fit(X_train, y_train, dataset_name='breast_cancer')
  File "automl_env/lib/python3.7/site-packages/autosklearn/estimators.py", line 598, in fit
    dataset_name=dataset_name,
  File "automl_env/lib/python3.7/site-packages/autosklearn/estimators.py", line 357, in fit
    self.automl_.fit(load_models=self.load_models, **kwargs)
  File "automl_env/lib/python3.7/site-packages/autosklearn/automl.py", line 1422, in fit
    is_classification=True,
  File "automl_env/lib/python3.7/site-packages/autosklearn/automl.py", line 623, in fit
    self._do_dummy_prediction(datamanager, num_run)
  File "automl_env/lib/python3.7/site-packages/autosklearn/automl.py", line 438, in _do_dummy_prediction
    % (str(status), str(additional_info))
ValueError: Dummy prediction failed with run state StatusType.MEMOUT and additional output: {'error': 'Memout (used more than 1024 MB).', 'configuration_origin': 'DUMMY'}.
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/opt/python/python37/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
Process ForkProcess-1:
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Exception ignored in: <function AutoML.__del__ at 0x7ff16baa0840>
Traceback (most recent call last):
  File "automl_env/lib/python3.7/site-packages/autosklearn/automl.py", line 1373, in __del__
  File "automl_env/lib/python3.7/site-packages/autosklearn/automl.py", line 352, in _clean_logger
  File "/opt/python/python37/lib/python3.7/multiprocessing/process.py", line 140, in join
  File "/opt/python/python37/lib/python3.7/multiprocessing/popen_fork.py", line 44, in wait
TypeError: 'NoneType' object is not callable
Traceback (most recent call last):
  File "/opt/python/python37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/python/python37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "automl_env/lib/python3.7/site-packages/autosklearn/util/logging_.py", line 295, in start_log_server
    receiver.serve_until_stopped()
  File "automl_env/lib/python3.7/site-packages/autosklearn/util/logging_.py", line 327, in serve_until_stopped
    self.timeout)
KeyboardInterrupt
```
Environment and installation:
- Is your installation in a virtual environment or conda environment? Virtual Environment
- Python Version: 3.7.3
- Auto-sklearn: 0.12.3
Sure. I will try the auto-sklearn Docker container and keep you posted once I have results.
This issue has been automatically closed due to inactivity.