dfsql tests failing on Windows/MacOS after the last modin update
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows, macOS
- Modin version (modin.__version__): 0.10.1
- Python version: 3.7, 3.8
- Code we can use to reproduce:
Describe the problem
Our unit tests have been hanging in GitHub Actions since we upgraded to the latest Modin. I haven’t been able to pin down the exact cause; the tests simply hang forever.
I am creating this issue in case the problem is modin-related.
Source code / logs
Adding timeouts to the tests revealed the following logs on Windows:
(pid=6528) Windows fatal exception: access violation
(pid=6528)
(pid=420) Windows fatal exception: access violation
(pid=420)
(pid=5080) Windows fatal exception: access violation
(pid=5080)
(pid=3592) Windows fatal exception: access violation
(pid=3592)
(pid=6112) Windows fatal exception: access violation
(pid=6112)
(pid=5660) Windows fatal exception: access violation
(pid=5660)
(pid=6404) Windows fatal exception: access violation
(pid=6404)
(pid=3924) Windows fatal exception: access violation
(pid=3924)
(pid=3684) Windows fatal exception: access violation
(pid=3684)
It might be related to Ray saving logs, as in this issue. It is odd that the issue is old, yet these messages didn’t appear earlier.
On Windows with Python 3.7 (but not 3.8) the following segfault occurs, which seems to be Ray/Modin related:
Thread 0x00001b24 (most recent call first):
File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\ray\worker.py", line 1637 in wait
File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\ray\_private\client_mode_hook.py", line 62 in wrapper
File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\engines\ray\generic\io.py", line 198 in to_csv
File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\data_management\factories\factories.py", line 398 in _to_csv
File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\data_management\factories\dispatcher.py", line 267 in to_csv
File "c:\hostedtoolcache\windows\python\3.7.9\x64\lib\site-packages\modin\pandas\base.py", line 2513 in to_csv
File "d:\a\dfsql\dfsql\dfsql\__init__.py", line 25 in sql_query
File "d:\a\dfsql\dfsql\dfsql\extensions.py", line 66 in __call__
File "D:\a\dfsql\dfsql\tests\test_extensions.py", line 47 in test_df_sql_nested_select_in
...
D:\a\_temp\1ac47c42-bfc7-4fcd-b689-944e647c7102.sh: line 1: 1841 Segmentation fault
The full logs are available here: https://github.com/mindsdb/dfsql/pull/19/checks?check_run_id=3112462155
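For anyone trying to get a similar stack dump out of a hanging test, here is a minimal sketch using only the standard library's faulthandler module. It is not the setup used in the dfsql CI. The helper name dump_traceback_on_hang and the timeout values are hypothetical, chosen for illustration.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_traceback_on_hang(timeout_s, file=None):
    """Dump all thread tracebacks if the wrapped block runs longer than timeout_s.

    Hypothetical helper for illustration. `file` must be a real file object
    (it needs a file descriptor) and must stay open until the block exits.
    """
    target = file if file is not None else sys.stderr
    # Arm a watchdog thread; exit=False means the process keeps running
    # after the dump so an outer test timeout can still terminate it.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the watchdog once the block finishes in time.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # The block finishes well within the timeout, so nothing is dumped.
    with dump_traceback_on_hang(30):
        total = sum(range(1000))
    print(total)
```

Wrapping the body of a suspect test in this context manager prints a "Thread 0x... (most recent call first)" dump like the one above when the block exceeds the timeout, which shows where each thread is stuck.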
I was able to reproduce the hanging behavior locally (though not 100% of the time).
Environment:
Simplified reproducer (that should be added to the TestCsv class):
Logs:
@rkooo567 did this clarify anything?
In the end, the issue turned out to be Modin related. Ray hung when I tried to write a Modin DataFrame to disk. It looks like some kind of deadlock, but I am still not sure.
For now I have resolved it by using pandas to write to disk, after which the issue was gone: https://github.com/mindsdb/dfsql/pull/19/files#diff-287da181ac34dcb8710924d3be04f46fac4c8b26c7de303766af97d571d1b969R26
Still, it’s worth investigating why this was happening.
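For readers hitting the same hang, the workaround described above can be sketched roughly as follows. This is not necessarily what the linked PR does. The helper name to_csv_via_pandas is hypothetical, and it assumes the installed Modin version exposes the private _to_pandas() method on its DataFrame, so check that before relying on it.

```python
def to_csv_via_pandas(df, path, **to_csv_kwargs):
    """Write df to CSV through plain pandas instead of Modin's Ray-backed writer.

    Hypothetical helper. Assumes Modin DataFrames expose a private
    `_to_pandas()` method; any object without it is treated as already
    being a pandas DataFrame.
    """
    to_pandas = getattr(df, "_to_pandas", None)
    # Materialize the distributed frame in the driver process if possible.
    pandas_df = to_pandas() if callable(to_pandas) else df
    # pandas writes the file in-process, so no Ray tasks are involved.
    pandas_df.to_csv(path, **to_csv_kwargs)
    return path


# Example usage (requires modin to be installed):
#   import modin.pandas as mpd
#   frame = mpd.DataFrame({"a": [1, 2, 3]})
#   to_csv_via_pandas(frame, "out.csv", index=False)
```

The trade-off is that the whole frame is collected into a single process before writing, which gives up Modin's parallel write and may not fit in memory for very large frames.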