Error running "fit" with many cores.

Hi! I’m experiencing a problem when I fit an AutoSklearn instance in a virtual machine with many cores.

I have run exactly the same code, with the same dataset in three different virtual machines:

  • in a VM with 4 cores and 15 GB of RAM: works ok ✅
  • in a VM with 8 cores and 30 GB of RAM: works ok ✅
  • in a VM with 40 cores and 157 GB of RAM: fails ❌ with the following error:

ValueError: Dummy prediction failed with run state StatusType.CRASHED and additional output: {'error': 'Result queue is empty', 'exit_status': "<class 'pynisher.limit_function_call.AnythingException'>", 'subprocess_stdout': '', 'subprocess_stderr': 'Process pynisher function call:\nTraceback (most recent call last):\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap\n self.run()\n File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run\n self._target(*self._args, **self._kwargs)\n File "/usr/local/lib/python3.7/site-packages/pynisher/limit_function_call.py", line 133, in subprocess_func\n return_value = ((func(*args, **kwargs), 0))\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/__init__.py", line 40, in fit_predict_try_except_decorator\n return ta(queue=queue, **kwargs)\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 1164, in eval_holdout\n budget_type=budget_type,\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/train_evaluator.py", line 194, in __init__\n budget_type=budget_type,\n File "/usr/local/lib/python3.7/site-packages/autosklearn/evaluation/abstract_evaluator.py", line 199, in __init__\n threadpool_limits(limits=1)\n File "/usr/local/lib/python3.7/site-packages/threadpoolctl.py", line 171, in __init__\n self._original_info = self._set_threadpool_limits()\n File "/usr/local/lib/python3.7/site-packages/threadpoolctl.py", line 280, in _set_threadpool_limits\n module.set_num_threads(num_threads)\n File "/usr/local/lib/python3.7/site-packages/threadpoolctl.py", line 659, in set_num_threads\n return set_func(num_threads)\nKeyboardInterrupt\n', 'exitcode': 1, 'configuration_origin': 'DUMMY'}.

This is the code I was running:

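from autosklearn.classification import AutoSklearnClassifier
from autosklearn.metrics import roc_auc
automl = AutoSklearnClassifier(time_left_for_this_task=600, metric=roc_auc)
automl.fit(x_train, y_train, x_validation, y_validation)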

Limiting the number of cores with the param nproc seems to work, but it’s a pity that we cannot take advantage of larger infra 😦

The dataset doesn’t seem to be the problem. I reproduced the bug with datasets of different sizes and different feature types, and every time it raises the same error (it’s deterministic, not something that happens stochastically).

Also, the error is almost instantaneous: clearly it doesn’t even start to fit when it fails.

Environment and installation:

  • OS: linux
  • Python version: 3.7
  • Auto-sklearn version: 0.13.0

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 11
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

5 reactions
sofidenner commented, Sep 27, 2021

The workaround I found for this issue is to limit the number of OpenBLAS threads with the environment variable OPENBLAS_NUM_THREADS before importing anything from autosklearn.

For example:

import os

os.environ["OPENBLAS_NUM_THREADS"] = "8"

from autosklearn(...)
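
A slightly more general sketch of the same workaround, in case a hard-coded "8" is not what you want on every machine. The helper name cap_openblas_threads and the default cap of 8 are assumptions made for illustration, not part of auto-sklearn. The only thing taken from the comment above is that OPENBLAS_NUM_THREADS has to be set before autosklearn (and anything that loads OpenBLAS, such as numpy) is imported.

```python
import os


def cap_openblas_threads(max_threads: int = 8) -> int:
    """Set OPENBLAS_NUM_THREADS to at most max_threads (hypothetical helper).

    Call this before importing numpy or autosklearn: OpenBLAS reads the
    variable when the library is first loaded.
    """
    available = os.cpu_count() or 1  # cpu_count() can return None
    n_threads = max(1, min(available, max_threads))
    os.environ["OPENBLAS_NUM_THREADS"] = str(n_threads)
    return n_threads


applied = cap_openblas_threads()
print(f"OPENBLAS_NUM_THREADS={applied}")

# Only now import auto-sklearn, for example:
# from autosklearn.classification import AutoSklearnClassifier
```

On the 40-core VM from the issue this would set the variable to "8"; on the 4-core VM it would set it to "4".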

3 reactions
raphaelTrench commented, Jan 6, 2022

I have been getting this error as well on macOS Monterey 12.0 and auto-sklearn==0.13.0, and I have not updated any libraries in my environment before this error started showing up. It happens when calling fit regardless of parameters:

File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/c91195a/Documents/experian/dragon/dragon/console.py", line 504, in train_console
    train(
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/Users/c91195a/Documents/experian/dragon/dragon/train.py", line 436, in train
    experiment.run()
  File "/Users/c91195a/Documents/experian/dragon/dragon/experiment/experiment.py", line 180, in run
    self.__fit()
  File "/Users/c91195a/Documents/experian/dragon/dragon/experiment/experiment.py", line 52, in __fit
    self.ml_estimator.fit(
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/sklearn/pipeline.py", line 346, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/autosklearn/experimental/askl2.py", line 425, in fit
    return super().fit(
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/autosklearn/estimators.py", line 941, in fit
    super().fit(
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/autosklearn/estimators.py", line 340, in fit
    self.automl_.fit(load_models=self.load_models, **kwargs)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/autosklearn/automl.py", line 1655, in fit
    return super().fit(
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/autosklearn/automl.py", line 642, in fit
    self.num_run += self._do_dummy_prediction(datamanager, num_run=1)
  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/autosklearn/automl.py", line 422, in _do_dummy_prediction
    raise ValueError(
ValueError: Dummy prediction failed with run state StatusType.CRASHED and additional output: {'error': 'Result queue is empty', 'exit_status': "<class 'pynisher.limit_function_call.AnythingException'>", 'subprocess_stdout': '', 'subprocess_stderr': 'Process pynisher function call:\nTraceback (most recent call last):\n  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap\n    self.run()\n  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 108, in run\n    self._target(*self._args, **self._kwargs)\n  File "/Users/c91195a/Library/Caches/pypoetry/virtualenvs/dragon-oQHUJD0o-py3.8/lib/python3.8/site-packages/pynisher/limit_function_call.py", line 108, in subprocess_func\n    resource.setrlimit(resource.RLIMIT_AS, (mem_in_b, mem_in_b))\nValueError: current limit exceeds maximum limit\n', 'exitcode': 1, 'configuration_origin': 'DUMMY'}.
  In call to configurable 'train' (<function train at 0x7f9cbcc5f8b0>)
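
Note that the subprocess error in this traceback is different from the one in the original issue: here resource.setrlimit(resource.RLIMIT_AS, ...) itself raises ValueError. Python's resource module raises ValueError when the requested soft limit exceeds the hard limit, among other cases. A minimal, Unix-only diagnostic sketch for inspecting the current limits follows. The helper fits_under_hard_limit and the 3 GiB figure are hypothetical, and this is not a confirmed explanation of the macOS failure, only a way to look at the values involved.

```python
import resource


def fits_under_hard_limit(requested_bytes: int, hard_limit: int) -> bool:
    """Return True if requested_bytes does not exceed the hard limit."""
    if hard_limit == resource.RLIM_INFINITY:
        return True  # no hard cap on this resource
    return requested_bytes <= hard_limit


# Current address-space limits of this process (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

# Hypothetical memory limit, in bytes, of the kind a resource limiter might request.
requested = 3 * 1024**3

print("soft:", soft, "hard:", hard)
print("requested fits under hard limit:", fits_under_hard_limit(requested, hard))
```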