dfsql tests failing on Windows/MacOS after the last modin update

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows, MacOS
  • Modin version (modin.__version__): 0.10.1
  • Python version: 3.7, 3.8
  • Code we can use to reproduce:

Describe the problem

Our unit tests have been hanging in GitHub Actions since we upgraded to the latest Modin. I haven’t been able to pin down the exact cause; the tests simply hang forever.

I am creating this issue in case the problem is Modin-related.

Source code / logs

Adding timeouts to the tests revealed the following logs on Windows:

(pid=6528) Windows fatal exception: access violation
(pid=6528) 
(pid=420) Windows fatal exception: access violation
(pid=420) 
(pid=5080) Windows fatal exception: access violation
(pid=5080) 
(pid=3592) Windows fatal exception: access violation
(pid=3592) 
(pid=6112) Windows fatal exception: access violation
(pid=6112) 
(pid=5660) Windows fatal exception: access violation
(pid=5660) 
(pid=6404) Windows fatal exception: access violation
(pid=6404) 
(pid=3924) Windows fatal exception: access violation
(pid=3924) 
(pid=3684) Windows fatal exception: access violation
(pid=3684) 

It might be related to Ray saving logs, as in this issue. It is odd that that issue is old, yet these messages did not appear earlier.

On Windows with Python 3.7 (but not 3.8), the following segfault occurs, which seems to be Ray/Modin-related:

Thread 0x00001b24 (most recent call first):
  File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\ray\worker.py", line 1637 in wait
  File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\ray\_private\client_mode_hook.py", line 62 in wrapper
  File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\engines\ray\generic\io.py", line 198 in to_csv
  File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\data_management\factories\factories.py", line 398 in _to_csv
  File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\data_management\factories\dispatcher.py", line 267 in to_csv
  File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\pandas\base.py", line 2513 in to_csv
  File "d:\a\dfsql\dfsql\dfsql\__init__.py", line 25 in sql_query
  File "d:\a\dfsql\dfsql\dfsql\extensions.py", line 66 in __call__
  File "D:\a\dfsql\dfsql\tests\test_extensions.py", line 47 in test_df_sql_nested_select_in
...
D:\a\_temp\1ac47c42-bfc7-4fcd-b689-944e647c7102.sh: line 1:  1841 Segmentation fault 

The full logs are available here: https://github.com/mindsdb/dfsql/pull/19/checks?check_run_id=3112462155
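
The thread dump above has the shape that Python's standard `faulthandler` module prints. As a minimal sketch (not necessarily how the dfsql tests added their timeouts), a test can arm a watchdog that dumps every thread's stack when a block runs too long. The helper name `dump_stacks_if_stuck` is hypothetical; only the `faulthandler` calls are standard-library API.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_stuck(timeout_s, file=sys.stderr):
    """Dump all thread stacks to `file` if the block runs longer than timeout_s.

    Hypothetical helper for illustration. `file` must be backed by a real
    file descriptor, because faulthandler writes to it at a low level.
    """
    # exit=False keeps the process alive: the dump is diagnostic only.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog once the block has finished (or raised).
        faulthandler.cancel_dump_traceback_later()
```

Wrapping a suspect call, for example `with dump_stacks_if_stuck(60): df.to_csv(path)`, would show where the main thread is blocked without killing the test run.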

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 27 (21 by maintainers)

Top GitHub Comments

anmyachev commented, Aug 25, 2021

I was able to reproduce the hanging behavior locally (reproducibility is not 100%).

Environment:

conda env create -f environment-dev.yml
set MODIN_CPUS=4
set MODIN_ENGINE=ray
pytest modin\pandas\test\test_io.py::TestCsv::test_hanging_behavior --verbose -s

Simplified reproducer (it should be added to the TestCsv class):

def test_hanging_behavior(self):
    # `pd` is modin.pandas here, as imported in test_io.py.
    # Uncomment the print calls to trace which step hangs.
    for _ in range(16):
        # print("to_csv")
        pd.DataFrame([1, 2, 3, 4]).to_csv("initial-data.csv", index=False)
        # print("read_csv")
        df = pd.read_csv("initial-data.csv")
        # print("isnull, all, axis=1")
        df.index[df.isnull().all(axis=1)].values.tolist()
        # print("isnull, all, axis=0")
        df.columns[df.isnull().all(axis=0)].values.tolist()
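
Since the hang does not reproduce on every run, it can help to rerun the command under a timeout until one attempt gets stuck. The following is a rough sketch using only the standard library; the function name `run_until_hang` and its parameters are hypothetical, and on Windows any Ray worker processes spawned by the test may outlive the killed parent.

```python
import os
import subprocess


def run_until_hang(cmd, attempts=5, timeout_s=120.0, extra_env=None):
    """Run `cmd` up to `attempts` times; return the 1-based attempt that hung.

    Returns None if every attempt finished within `timeout_s` seconds.
    Hypothetical helper for illustration only.
    """
    # Start from the current environment and layer the overrides on top.
    env = dict(os.environ, **(extra_env or {}))
    for attempt in range(1, attempts + 1):
        try:
            # check=False: a failing test is not a hang, so keep looping.
            subprocess.run(cmd, env=env, timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the direct child here.
            return attempt
    return None
```

For the reproducer above, the call could look like `run_until_hang(["pytest", r"modin\pandas\test\test_io.py::TestCsv::test_hanging_behavior", "--verbose", "-s"], extra_env={"MODIN_CPUS": "4", "MODIN_ENGINE": "ray"})`.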

Logs:

...\modin>pytest modin\pandas\test\test_io.py::TestCsv::test_hanging_behavior --verbose -s
=============================================== test session starts ===============================================
platform win32 -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- ...\Miniconda3\envs\modin\python.exe
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: ...\modin, configfile: setup.cfg
plugins: benchmark-3.4.1, cov-2.11.0, forked-1.3.0, xdist-2.3.0
collected 1 item

modin/pandas/test/test_io.py::TestCsv::test_just_test to_csv
read_csv
(pid=20528) Windows fatal exception: access violation
(pid=20528)
isnull, all, axis=1
isnull, all, axis=0
to_csv
read_csv
isnull, all, axis=1
(pid=15432) Windows fatal exception: access violation
(pid=15432)
isnull, all, axis=0
to_csv
read_csv
isnull, all, axis=1
(pid=23128) Windows fatal exception: access violation
(pid=23128)
isnull, all, axis=0
to_csv
read_csv
isnull, all, axis=1
(pid=3412) Windows fatal exception: access violation
(pid=3412)
isnull, all, axis=0
to_csv
read_csv
(pid=20984) Windows fatal exception: access violation
(pid=20984)
isnull, all, axis=1
isnull, all, axis=0
to_csv
read_csv
(pid=7096) Windows fatal exception: access violation
(pid=7096)
isnull, all, axis=1
isnull, all, axis=0
to_csv
read_csv
isnull, all, axis=1
(pid=23464) Windows fatal exception: access violation
(pid=23464)
isnull, all, axis=0
to_csv
read_csv
(pid=12504) Windows fatal exception: access violation
(pid=12504)
isnull, all, axis=1
isnull, all, axis=0
to_csv
read_csv
isnull, all, axis=1
(pid=11500) Windows fatal exception: access violation
(pid=11500) 
isnull, all, axis=0
to_csv
read_csv
(pid=19948) Windows fatal exception: access violation
(pid=19948)
isnull, all, axis=1
isnull, all, axis=0
to_csv
read_csv
isnull, all, axis=1
(pid=20848) Windows fatal exception: access violation
(pid=20848) 
isnull, all, axis=0
to_csv
2021-08-25 20:03:18,393 WARNING worker.py:1189 -- The actor or task with ID c6cf2fddfe5e7c90b398e5da6a4450ee63f746a18d1ec44e cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {4.000000/4.000000 CPU, 13.969839 GiB/13.969839 GiB memory, 13.969839 GiB/13.969839 GiB object_store_memory, 1.000000/1.000000 node:10.147.230.30}. In total there are 1 pending tasks and 0 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale or if you specified a runtime_env for this task or actor because it takes time to install.

@rkooo567 did this clarify anything?

btseytlin commented, Aug 17, 2021

In the end, the issue turned out to be Modin-related. The tests hung when I tried to write a Modin DataFrame to disk while using the Ray engine. It looks like some kind of deadlock, but I am still not sure.

For now I resolved it by using pandas to write to disk, after which the issue was gone: https://github.com/mindsdb/dfsql/pull/19/files#diff-287da181ac34dcb8710924d3be04f46fac4c8b26c7de303766af97d571d1b969R26

Still, it’s worth investigating why that was happening.
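
The workaround described in this comment (writing through pandas instead of Modin) could be sketched roughly as below. This is an illustration, not the code from the linked dfsql change. It assumes that a Modin DataFrame exposes a `_to_pandas()` conversion method, which is a private API and may change; the helper name `to_csv_via_pandas` is hypothetical.

```python
def to_csv_via_pandas(df, path, **to_csv_kwargs):
    """Write `df` to CSV from the driver process rather than via Ray workers.

    Hypothetical helper. Assumption: Modin DataFrames provide `_to_pandas()`;
    objects without it (e.g. plain pandas DataFrames) are written as they are.
    """
    convert = getattr(df, "_to_pandas", None)
    # Materialize the distributed frame as an in-memory pandas object first.
    pandas_df = convert() if callable(convert) else df
    pandas_df.to_csv(path, **to_csv_kwargs)
    return path
```

The trade-off is that the whole frame has to fit in the driver's memory, which is acceptable for small test fixtures but gives up Modin's parallel write.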
