[Bug] RayTaskError(RayOutOfMemoryError) although there's plenty of free SWAP left

See original GitHub issue

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core, Ray Clusters

What happened + What you expected to happen

Ray throws an exception when the node's RAM is full. I would expect it to keep going by using the available swap and finish the tasks. Also, setting os.environ["RAY_DISABLE_MEMORY_MONITOR"] = "1" (before importing ray) has no effect.

Versions / Dependencies

name: puma-lab

channels:

  • pyviz
  • conda-forge
  • defaults

dependencies:

  • _libgcc_mutex=0.1=conda_forge
  • _openmp_mutex=4.5=1_gnu
  • abseil-cpp=20210324.2=h9c3ff4c_0
  • alsa-lib=1.2.3=h516909a_0
  • anyio=3.4.0=py37h89c1867_0
  • aplus=0.11.0=py_1
  • appdirs=1.4.4=pyh9f0ad1d_0
  • argcomplete=1.12.3=pyhd8ed1ab_2
  • argon2-cffi=21.1.0=py37h5e8e339_2
  • arrow-cpp=6.0.1=py37h815fc2d_3_cpu
  • astropy=4.3.1=py37hb1e94ed_2
  • async_generator=1.10=py_0
  • attrs=21.2.0=pyhd8ed1ab_0
  • aws-c-auth=0.6.8=hfef2836_0
  • aws-c-cal=0.5.12=h70efedd_7
  • aws-c-common=0.6.17=h7f98852_0
  • aws-c-compression=0.2.14=h7c7754b_7
  • aws-c-event-stream=0.2.7=hb80ed28_31
  • aws-c-http=0.6.10=h58a30cf_2
  • aws-c-io=0.10.13=he836878_5
  • aws-c-mqtt=0.7.9=h042a236_0
  • aws-c-s3=0.1.27=h4f4cd48_12
  • aws-c-sdkutils=0.1.1=h7c7754b_4
  • aws-checksums=0.1.12=h7c7754b_6
  • aws-crt-cpp=0.17.9=hc7d31a4_1
  • aws-sdk-cpp=1.9.154=h77f1c7e_0
  • babel=2.9.1=pyh44b312d_0
  • backcall=0.2.0=pyh9f0ad1d_0
  • backports=1.0=py_2
  • backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  • backports.zoneinfo=0.2.1=py37h5e8e339_4
  • blake3=0.2.1=py37hfd0a3e1_0
  • bleach=4.1.0=pyhd8ed1ab_0
  • blosc=1.21.0=h9c3ff4c_0
  • bokeh=2.4.2=py37h89c1867_0
  • bqplot=0.12.31=pyhd8ed1ab_0
  • branca=0.4.2=pyhd8ed1ab_0
  • brotli=1.0.9=h7f98852_6
  • brotli-bin=1.0.9=h7f98852_6
  • brotlipy=0.7.0=py37h5e8e339_1003
  • brunsli=0.1=h9c3ff4c_0
  • bzip2=1.0.8=h7f98852_4
  • c-ares=1.18.1=h7f98852_0
  • c-blosc2=2.0.4=h5f21a17_1
  • ca-certificates=2021.10.8=ha878542_0
  • cached-property=1.5.2=hd8ed1ab_1
  • cached_property=1.5.2=pyha770c72_1
  • cachetools=4.2.4=pyhd8ed1ab_0
  • cffi=1.15.0=py37h036bc23_0
  • cfitsio=4.0.0=h9a35b8e_0
  • charls=2.2.0=h9c3ff4c_0
  • click=8.0.3=py37h89c1867_1
  • clickhouse-cityhash=1.0.2.3=py37hcd2ae1e_3
  • clickhouse-driver=0.2.2=py37h5e8e339_1
  • cloudpickle=2.0.0=pyhd8ed1ab_0
  • colorama=0.4.4=pyh9f0ad1d_0
  • colorcet=3.0.0=pyhd8ed1ab_0
  • cramjam=2.3.1=py37h5e8e339_1
  • cryptography=36.0.0=py37hf1a17b8_0
  • cycler=0.11.0=pyhd8ed1ab_0
  • cytoolz=0.11.2=py37h5e8e339_1
  • dask=2021.11.2=pyhd8ed1ab_0
  • dask-core=2021.11.2=pyhd8ed1ab_0
  • datashader=0.13.0=pyh6c4a22f_0
  • datashape=0.5.4=py_1
  • dbus=1.13.6=h48d8840_2
  • debugpy=1.5.1=py37hcd2ae1e_0
  • decorator=5.1.0=pyhd8ed1ab_0
  • defusedxml=0.7.1=pyhd8ed1ab_0
  • distributed=2021.11.2=py37h89c1867_0
  • entrypoints=0.3=py37hc8dfbb8_1002
  • expat=2.4.1=h9c3ff4c_0
  • fastapi=0.70.0=pyhd8ed1ab_0
  • fastparquet=0.7.2=py37hb1e94ed_0
  • filelock=3.4.0=pyhd8ed1ab_0
  • fontconfig=2.13.1=hba837de_1005
  • fonttools=4.28.3=py37h5e8e339_0
  • freetype=2.10.4=h0708190_1
  • frozendict=2.0.3=pyhd8ed1ab_0
  • fsspec=2021.11.1=pyhd8ed1ab_0
  • future=0.18.2=py37h89c1867_4
  • geos=3.10.1=h9c3ff4c_1
  • gettext=0.19.8.1=h73d1719_1008
  • gflags=2.2.2=he1b5a44_1004
  • giflib=5.2.1=h36c2ea0_2
  • gitdb=4.0.9=pyhd8ed1ab_0
  • gitpython=3.1.24=pyhd8ed1ab_0
  • glib=2.70.1=h780b84a_0
  • glib-tools=2.70.1=h780b84a_0
  • glog=0.5.0=h48cff8f_0
  • grpc-cpp=1.42.0=h7e358d9_0
  • gst-plugins-base=1.18.5=hf529b03_2
  • gstreamer=1.18.5=h9f60fe5_2
  • h5py=3.6.0=nompi_py37hd308b1e_100
  • hdf5=1.12.1=nompi_h2750804_103
  • heapdict=1.0.1=py_0
  • holoviews=1.14.6=py_0
  • hvplot=0.7.3=py_0
  • icu=68.2=h9c3ff4c_0
  • imagecodecs=2021.11.20=py37h4167934_1
  • imageio=2.13.1=pyhd8ed1ab_1
  • importlib-metadata=4.8.2=py37h89c1867_0
  • importlib_metadata=4.8.2=hd8ed1ab_0
  • importlib_resources=5.4.0=pyhd8ed1ab_0
  • ipydatawidgets=4.2.0=pyhd3deb0d_0
  • ipykernel=6.6.0=py37h6531663_0
  • ipyleaflet=0.15.0=pyhd8ed1ab_0
  • ipympl=0.8.2=pyhd8ed1ab_0
  • ipython=7.30.1=py37h89c1867_0
  • ipython_genutils=0.2.0=py_1
  • ipyvolume=0.6.0a8=pyhd8ed1ab_0
  • ipyvue=1.7.0=pyhd8ed1ab_0
  • ipyvuetify=1.8.1=pyhd8ed1ab_0
  • ipywebrtc=0.6.0=pyhd8ed1ab_0
  • ipywidgets=7.6.5=pyhd8ed1ab_0
  • jbig=2.1=h7f98852_2003
  • jedi=0.18.1=py37h89c1867_0
  • jinja2=3.0.3=pyhd8ed1ab_0
  • jpeg=9d=h36c2ea0_0
  • json5=0.9.5=pyh9f0ad1d_0
  • jsonschema=4.2.1=pyhd8ed1ab_0
  • jupyter-server-mathjax=0.2.3=pyhd8ed1ab_0
  • jupyter_client=7.1.0=pyhd8ed1ab_0
  • jupyter_contrib_core=0.3.3=py_2
  • jupyter_contrib_nbextensions=0.5.1=py37hc8dfbb8_1
  • jupyter_core=4.9.1=py37h89c1867_1
  • jupyter_highlight_selected_word=0.2.0=py37h89c1867_1005
  • jupyter_latex_envs=1.4.6=py37h89c1867_1001
  • jupyter_nbextensions_configurator=0.4.1=py37h89c1867_2
  • jupyter_server=1.12.1=pyhd8ed1ab_0
  • jupyterlab=3.2.4=pyhd8ed1ab_0
  • jupyterlab-git=0.34.0=pyhd8ed1ab_0
  • jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  • jupyterlab_server=2.8.2=pyhd8ed1ab_0
  • jupyterlab_widgets=1.0.2=pyhd8ed1ab_0
  • jxrlib=1.1=h7f98852_2
  • kiwisolver=1.3.2=py37h2527ec5_1
  • krb5=1.19.2=hcc1bbae_3
  • lcms2=2.12=hddcbb42_0
  • ld_impl_linux-64=2.36.1=hea4e1c9_2
  • lerc=3.0=h9c3ff4c_0
  • libaec=1.0.6=h9c3ff4c_0
  • libblas=3.9.0=12_linux64_openblas
  • libbrotlicommon=1.0.9=h7f98852_6
  • libbrotlidec=1.0.9=h7f98852_6
  • libbrotlienc=1.0.9=h7f98852_6
  • libcblas=3.9.0=12_linux64_openblas
  • libclang=11.1.0=default_ha53f305_1
  • libcurl=7.80.0=h2574ce0_0
  • libdeflate=1.8=h7f98852_0
  • libedit=3.1.20191231=he28a2e2_2
  • libev=4.33=h516909a_1
  • libevent=2.1.10=h9b69904_4
  • libffi=3.4.2=h7f98852_5
  • libgcc-ng=11.2.0=h1d223b6_11
  • libgfortran-ng=11.2.0=h69a702a_11
  • libgfortran5=11.2.0=h5c6108e_11
  • libglib=2.70.1=h174f98d_0
  • libgomp=11.2.0=h1d223b6_11
  • libiconv=1.16=h516909a_0
  • liblapack=3.9.0=12_linux64_openblas
  • libllvm10=10.0.1=he513fc3_3
  • libllvm11=11.1.0=hf817b99_2
  • libnghttp2=1.43.0=h812cca2_1
  • libnsl=2.0.0=h7f98852_0
  • libogg=1.3.4=h7f98852_1
  • libopenblas=0.3.18=pthreads_h8fe5266_0
  • libopus=1.3.1=h7f98852_1
  • libpng=1.6.37=h21135ba_2
  • libpq=13.5=hd57d9b9_0
  • libprotobuf=3.18.1=h780b84a_0
  • libsodium=1.0.18=h36c2ea0_1
  • libssh2=1.10.0=ha56f1ee_2
  • libstdcxx-ng=11.2.0=he4da1e4_11
  • libthrift=0.15.0=he6d91bd_1
  • libtiff=4.3.0=h6f004c6_2
  • libutf8proc=2.6.1=h7f98852_0
  • libuuid=2.32.1=h7f98852_1000
  • libvorbis=1.3.7=h9c3ff4c_0
  • libwebp-base=1.2.1=h7f98852_0
  • libxcb=1.13=h7f98852_1004
  • libxkbcommon=1.0.3=he3ba5ed_0
  • libxml2=2.9.12=h72842e0_0
  • libxslt=1.1.33=h15afd5d_2
  • libzlib=1.2.11=h36c2ea0_1013
  • libzopfli=1.0.3=h9c3ff4c_0
  • llvmlite=0.36.0=py37h9d7f4d0_0
  • locket=0.2.0=py_2
  • lxml=4.6.4=py37h77fd288_0
  • lz4-c=1.9.3=h9c3ff4c_1
  • markdown=3.3.6=pyhd8ed1ab_0
  • markupsafe=2.0.1=py37h5e8e339_1
  • matplotlib=3.5.0=py37h89c1867_0
  • matplotlib-base=3.5.0=py37h1058ff1_0
  • matplotlib-inline=0.1.3=pyhd8ed1ab_0
  • mistune=0.8.4=py37h5e8e339_1005
  • msgpack-python=1.0.3=py37h2527ec5_0
  • multipledispatch=0.6.0=py_0
  • munkres=1.1.4=pyh9f0ad1d_0
  • mysql-common=8.0.27=ha770c72_1
  • mysql-libs=8.0.27=hfa10184_1
  • nb_conda_kernels=2.3.1=py37h89c1867_1
  • nbclassic=0.3.4=pyhd8ed1ab_0
  • nbclient=0.5.9=pyhd8ed1ab_0
  • nbconvert=6.3.0=py37h89c1867_1
  • nbdime=3.1.1=pyhd8ed1ab_0
  • nbformat=5.1.3=pyhd8ed1ab_0
  • ncurses=6.2=h58526e2_4
  • nest-asyncio=1.5.4=pyhd8ed1ab_0
  • networkx=2.6.3=pyhd8ed1ab_1
  • notebook=6.4.6=pyha770c72_0
  • nspr=4.32=h9c3ff4c_1
  • nss=3.73=hb5efdd6_0
  • numba=0.53.1=py37hb11d6e1_1
  • numpy=1.21.4=py37h31617e3_0
  • olefile=0.46=pyh9f0ad1d_1
  • openjpeg=2.4.0=hb52868f_1
  • openssl=1.1.1l=h7f98852_0
  • orc=1.7.1=h68e2c4e_0
  • packaging=21.3=pyhd8ed1ab_0
  • pandas=1.3.4=py37he8f5f7f_1
  • pandoc=2.16.2=h7f98852_0
  • pandocfilters=1.5.0=pyhd8ed1ab_0
  • panel=0.12.5=py_0
  • param=1.12.0=pyh6c4a22f_0
  • parquet-cpp=1.5.1=1
  • parso=0.8.3=pyhd8ed1ab_0
  • partd=1.2.0=pyhd8ed1ab_0
  • pcre=8.45=h9c3ff4c_0
  • pexpect=4.8.0=py37hc8dfbb8_1
  • pickleshare=0.7.5=py37hc8dfbb8_1002
  • pillow=8.4.0=py37h0f21c89_0
  • pip=21.3.1=pyhd8ed1ab_0
  • pooch=1.5.2=pyhd8ed1ab_0
  • progressbar2=3.53.1=pyh9f0ad1d_0
  • prometheus_client=0.12.0=pyhd8ed1ab_0
  • prompt-toolkit=3.0.22=pyha770c72_0
  • psutil=5.8.0=py37h5e8e339_2
  • pthread-stubs=0.4=h36c2ea0_1001
  • ptyprocess=0.7.0=pyhd3deb0d_0
  • pyarrow=6.0.1=py37h20dbb2a_3_cpu
  • pycparser=2.21=pyhd8ed1ab_0
  • pyct=0.4.6=py_0
  • pyct-core=0.4.6=py_0
  • pydantic=1.8.2=py37h5e8e339_2
  • pyerfa=2.0.0.1=py37hb1e94ed_1
  • pygments=2.10.0=pyhd8ed1ab_0
  • pykalman=0.9.5=py_1
  • pyopenssl=21.0.0=pyhd8ed1ab_0
  • pyparsing=3.0.6=pyhd8ed1ab_0
  • pyqt=5.12.3=py37h89c1867_8
  • pyqt-impl=5.12.3=py37hac37412_8
  • pyqt5-sip=4.19.18=py37hcd2ae1e_8
  • pyqtchart=5.12=py37he336c9b_8
  • pyqtwebengine=5.12.1=py37he336c9b_8
  • pyrsistent=0.18.0=py37h5e8e339_0
  • pysocks=1.7.1=py37h89c1867_4
  • python=3.7.12=hb7a2778_100_cpython
  • python-dateutil=2.8.2=pyhd8ed1ab_0
  • python-tzdata=2021.5=pyhd8ed1ab_0
  • python-utils=2.5.6=pyh44b312d_0
  • python_abi=3.7=2_cp37m
  • pythreejs=2.3.0=pyhd8ed1ab_0
  • pytz=2021.3=pyhd8ed1ab_0
  • pytz-deprecation-shim=0.1.0.post0=py37h89c1867_1
  • pyviz_comms=2.1.0=py_0
  • pywavelets=1.2.0=py37hb1e94ed_1
  • pyyaml=6.0=py37h5e8e339_3
  • pyzmq=22.3.0=py37h336d617_1
  • qt=5.12.9=hda022c4_4
  • re2=2021.11.01=h9c3ff4c_0
  • readline=8.1=h46c0cb4_0
  • requests=2.26.0=pyhd8ed1ab_1
  • s2n=1.3.0=h9b69904_0
  • scikit-image=0.18.3=py37he8f5f7f_1
  • scipy=1.7.3=py37hf2a6cf1_0
  • send2trash=1.8.0=pyhd8ed1ab_0
  • setuptools=59.4.0=py37h89c1867_0
  • shapely=1.8.0=py37h9b0f7a3_4
  • six=1.16.0=pyh6c4a22f_0
  • smmap=3.0.5=pyh44b312d_0
  • snappy=1.1.8=he1b5a44_3
  • sniffio=1.2.0=py37h89c1867_2
  • sortedcontainers=2.4.0=pyhd8ed1ab_0
  • sqlite=3.37.0=h9cd32fc_0
  • starlette=0.16.0=pyhd8ed1ab_0
  • tabulate=0.8.9=pyhd8ed1ab_0
  • tblib=1.7.0=pyhd8ed1ab_0
  • terminado=0.12.1=py37h89c1867_1
  • testpath=0.5.0=pyhd8ed1ab_0
  • thrift=0.15.0=py37hcd2ae1e_1
  • tifffile=2021.11.2=pyhd8ed1ab_0
  • tk=8.6.11=h27826a3_1
  • toolz=0.11.2=pyhd8ed1ab_0
  • tornado=6.1=py37h5e8e339_2
  • tqdm=4.62.3=pyhd8ed1ab_0
  • traitlets=5.1.1=pyhd8ed1ab_0
  • traittypes=0.2.1=pyh9f0ad1d_2
  • typing-extensions=4.0.1=hd8ed1ab_0
  • typing_extensions=4.0.1=pyha770c72_0
  • tzdata=2021e=he74cb21_0
  • tzlocal=4.1=py37h89c1867_1
  • unicodedata2=13.0.0.post2=py37h5e8e339_4
  • urllib3=1.26.7=pyhd8ed1ab_0
  • vaex=4.6.0=pyhd8ed1ab_0
  • vaex-astro=0.8.3=pyhd8ed1ab_0
  • vaex-core=4.6.0=py37h092ef5d_0
  • vaex-hdf5=0.11.0=pyhd8ed1ab_0
  • vaex-jupyter=0.6.0=pyhd8ed1ab_0
  • vaex-ml=0.15.0=pyhd8ed1ab_0
  • vaex-server=0.7.0=pyhd8ed1ab_0
  • vaex-viz=0.5.0=pyhd8ed1ab_0
  • wcwidth=0.2.5=pyh9f0ad1d_2
  • webencodings=0.5.1=py_1
  • websocket-client=1.2.1=py37h89c1867_0
  • wheel=0.37.0=pyhd8ed1ab_1
  • widgetsnbextension=3.5.2=py37h89c1867_1
  • xarray=0.20.1=pyhd8ed1ab_0
  • xorg-libxau=1.0.9=h7f98852_0
  • xorg-libxdmcp=1.1.3=h7f98852_0
  • xz=5.2.5=h516909a_1
  • yaml=0.2.5=h516909a_0
  • zeromq=4.3.4=h9c3ff4c_1
  • zfp=0.5.5=h9c3ff4c_8
  • zict=2.0.0=py_0
  • zipp=3.6.0=pyhd8ed1ab_0
  • zlib=1.2.11=h36c2ea0_1013
  • zstandard=0.16.0=py37h5e8e339_2
  • zstd=1.5.0=ha95c52a_0
  • pip:
    • aiohttp==3.8.1
    • aiohttp-cors==0.7.0
    • aioredis==1.3.1
    • aiosignal==1.2.0
    • async-timeout==4.0.1
    • asynctest==0.13.0
    • blessed==1.19.0
    • certifi==2021.10.8
    • charset-normalizer==2.0.9
    • colorful==0.5.4
    • deprecated==1.2.13
    • frozenlist==1.2.0
    • google-api-core==2.2.2
    • google-auth==2.3.3
    • googleapis-common-protos==1.53.0
    • gpustat==1.0.0b1
    • grpcio==1.42.0
    • hiredis==2.0.0
    • idna==3.3
    • multidict==5.2.0
    • nvidia-ml-py3==7.352.0
    • opencensus==0.8.0
    • opencensus-context==0.1.2
    • protobuf==3.19.1
    • py-spy==0.3.11
    • pyasn1==0.4.8
    • pyasn1-modules==0.2.8
    • ray==1.9.0
    • redis==4.0.2
    • rsa==4.8
    • smart-open==5.2.1
    • wrapt==1.13.3
    • yarl==1.7.2

Reproduction script

---------------------------------------------------------------------------
RayTaskError(RayOutOfMemoryError)         Traceback (most recent call last)
<timed exec> in <module>

/tmp/ipykernel_2046457/2054134056.py in get_hist(asks, labels, window_len_min, window_len_max, resolution_window, resolution_feature_threshold)
     68     while refs_task:
     69         refs_done, refs_task = ray.wait(refs_task)
---> 70         dfs = ray.get(refs_done)
     71         res = pd.concat([res] + dfs, ignore_index=True)
     72     return res

~/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
    103             if func.__name__ != "init" or is_client_mode_enabled_by_default:
    104                 return getattr(ray, func.__name__)(*args, **kwargs)
--> 105         return func(*args, **kwargs)
    106 
    107     return wrapper

~/PROGS/miniconda3/envs/puma-lab/lib/python3.7/site-packages/ray/worker.py in get(object_refs, timeout)
   1711                     worker.core_worker.dump_object_store_memory_usage()
   1712                 if isinstance(value, RayTaskError):
-> 1713                     raise value.as_instanceof_cause()
   1714                 else:
   1715                     raise value

RayTaskError(RayOutOfMemoryError): ray::fun() (pid=50618, ip=192.168.0.106)
ray._private.memory_monitor.RayOutOfMemoryError: More than 95% of the memory on node node1 is used (14.88 / 15.53 GB). The top 10 memory consumers are:

PID	MEM	COMMAND
50462	0.6GiB	ray::fun()
50524	0.6GiB	ray::fun()
50493	0.6GiB	ray::fun()
50555	0.59GiB	ray::fun()
50275	0.58GiB	ray::fun()
50244	0.57GiB	ray::fun()
50151	0.57GiB	ray::fun()
50120	0.57GiB	ray::fun()
50306	0.57GiB	ray::fun()
50369	0.57GiB	ray::fun()

In addition, up to 0.21 GiB of shared memory is currently being used by the Ray object store.
---
--- Tip: Use the `ray memory` command to list active objects in the cluster.
--- To disable OOM exceptions, set RAY_DISABLE_MEMORY_MONITOR=1.
---
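
The figures in the error message can be checked by hand. The sketch below only redoes the arithmetic with the two numbers and the 95% limit quoted in the message; the function and constant names are made up for illustration and are not Ray's own code. As the report describes it, the check fired even though swap was free, so swap does not appear in this ratio.

```python
# Hypothetical helper: reproduces the ratio quoted in the error message.
def used_fraction(used_gb: float, total_gb: float) -> float:
    """Fraction of node memory in use."""
    return used_gb / total_gb


THRESHOLD = 0.95  # the "More than 95%" limit quoted in the message

# 14.88 GB used out of 15.53 GB, copied from the message above.
fraction = used_fraction(14.88, 15.53)

print(f"used fraction: {fraction:.4f}")
print("over threshold:", fraction > THRESHOLD)
```

The ratio lands just above the limit, which matches the point at which the task was failed.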

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
rkooo567 commented, Dec 6, 2021

Good questions! Let me answer one by one here.

@rkooo567 running from JupyterLab, I set os.environ["RAY_DISABLE_MEMORY_MONITOR"] = "1" before importing ray. Since I can't find anything about this env var in the docs, are you saying it has to be set for each worker (perhaps in cluster.yaml)?

Yes. If you are using cluster.yaml, you will have a ray start command somewhere. You can run:

RAY_DISABLE_MEMORY_MONITOR=1 ray start

I think os.environ does not work in this case because the workers are started from a process launched by ray start, so changes to os.environ in your driver script (the Python script in which you ran ray.init(address='auto')) are not propagated to them.
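
The propagation point can be illustrated without Ray. The sketch below is a stand-in that uses only the standard library: a plain child process plays the role of the processes launched by ray start, and the helper names are hypothetical. It assumes nothing about Ray's internals and only shows that a process started before the variable is set never sees the later change, while a process started afterwards inherits it.

```python
import os
import subprocess
import sys

VAR = "RAY_DISABLE_MEMORY_MONITOR"

# Child program: wait for one line on stdin, then report what it sees for VAR.
CHILD_CODE = (
    "import os, sys\n"
    "sys.stdin.readline()\n"
    f"print(os.environ.get({VAR!r}, 'unset'))\n"
)


def seen_by_already_running_process() -> str:
    """Start the child first, then set VAR in the parent (as in the report)."""
    # Launch the child without VAR, even if the caller's shell has it set.
    env = {k: v for k, v in os.environ.items() if k != VAR}
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        env=env,
    )
    os.environ[VAR] = "1"  # too late: the child already has its environment
    out, _ = child.communicate("go\n")
    return out.strip()


def seen_by_process_started_afterwards() -> str:
    """Set VAR first, then start the child, which inherits it."""
    os.environ[VAR] = "1"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input="go\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("already running:", seen_by_already_running_process())
    print("started afterwards:", seen_by_process_started_afterwards())
```

This is why the variable is placed in front of ray start in the command above: it has to be in the environment of the process that spawns the workers.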

Also, will this just disable the exception warning, or will the task actually complete using swap? And why isn't this the default behavior? Isn't the sole purpose of swap to prevent exactly such OOM cases?

Not every system has swap enabled by default. For example, I believe many EC2 instances have no swap by default.

Without swap, high memory usage is likely to cause unexpected behavior, so crashing early is the safer default. Swap is also very slow compared to regular memory, so relying on it by default can cause many unexpected slowdowns.
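
Whether a given Linux node has swap at all can be checked before relying on it. The sketch below is one possible approach, assuming a Linux host where /proc/meminfo exposes a SwapTotal line; the function names and the sample text are invented for illustration.

```python
from pathlib import Path


def swap_total_kb(meminfo_text: str) -> int:
    """Return the SwapTotal figure from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:  <number> kB"
            return int(line.split()[1])
    return 0


def swap_enabled(meminfo_path: str = "/proc/meminfo") -> bool:
    """True if the file at meminfo_path reports a non-zero swap size (Linux only)."""
    return swap_total_kb(Path(meminfo_path).read_text()) > 0


# Made-up sample inputs, not taken from the reporter's machine.
SAMPLE_NO_SWAP = "MemTotal:  16284000 kB\nSwapTotal:        0 kB\nSwapFree:         0 kB\n"
SAMPLE_WITH_SWAP = "MemTotal:  16284000 kB\nSwapTotal:  2097148 kB\nSwapFree:   2097148 kB\n"

print(swap_total_kb(SAMPLE_NO_SWAP) > 0)
print(swap_total_kb(SAMPLE_WITH_SWAP) > 0)
```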

0 reactions
rkooo567 commented, Oct 26, 2022

We will remove RAY_DISABLE_MEMORY_MONITOR and replace it with the Ray OOM killer (https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html), so I think we can close this. I will just answer users' questions here.

Would just point out that RAY_DISABLE_MEMORY_MONITOR env var is nowhere to be found in the docs.

This will be replaced by https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html, which has the documentation! RAY_DISABLE_MEMORY_MONITOR basically disables the memory monitor, which checks the memory usage of actors and kills them when the node's memory usage exceeds the threshold (95%).

@rkooo567, Hey, by setting RAY_DISABLE_MEMORY_MONITOR=1, can we still get the stats by running ray memory? What does RAY_DISABLE_MEMORY_MONITOR mean? Does it mean ray will not raise errors and exit the process even if there is not enough memory?

Yes. This flag is unrelated to ray memory.
