Importing Modin causes a hang with the latest Arrow 6.0
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS, Windows 10
- Modin version (modin.__version__): 0.12.1
- Python version: Python 3.9.1
Code we can use to reproduce:
Install the latest Modin and Arrow with:
conda create -n test -c conda-forge modin-all pyarrow=6.0.1
Then run the following code to reproduce the hang:
# Importing modin makes the Arrow record-batch iterator hang
import modin.pandas as pd  # only imported, never used below
import pyarrow.dataset as ds
dfile = 'veterans_lung_cancer.csv'
data = ds.dataset(dfile, format='csv')
rb_iter = iter(data.to_batches())
next(rb_iter)  # hangs with pyarrow 6.0 once modin has been imported
Describe the problem
The code hangs with Arrow 6.0 but works with Arrow 4.0. If the Modin import is removed, everything works fine.
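The failure mode is a hang, not an exception, so the reproduction above can block a terminal or CI job indefinitely. The sketch below is not part of the original report. It runs the reproduction in a fresh interpreter with a timeout, once with the Modin import and once without. Several details are illustrative assumptions:

- the helper name `runs_within`;
- the 30-second timeout;
- the assumption that `veterans_lung_cancer.csv` sits in the working directory.

A result of "error" usually means a package or the CSV file is missing, not that the bug is absent.

```python
import subprocess
import sys
import textwrap


def runs_within(code: str, timeout: float) -> str:
    """Hypothetical helper: run `code` in a child Python process.

    Returns "ok" on exit status 0, "error" on a non-zero exit status,
    and "hang" if the child is still running after `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            # Output is discarded so no pipe can keep the parent waiting.
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "hang"
    return "ok" if result.returncode == 0 else "error"


# The reproduction from the issue, without and with the Modin import.
REPRO_BODY = textwrap.dedent("""
    import pyarrow.dataset as ds
    data = ds.dataset('veterans_lung_cancer.csv', format='csv')
    rb_iter = iter(data.to_batches())
    next(rb_iter)
""")
REPRO_WITHOUT_MODIN = REPRO_BODY
REPRO_WITH_MODIN = "import modin.pandas as pd\n" + REPRO_BODY

if __name__ == "__main__":
    for label, code in [("without modin", REPRO_WITHOUT_MODIN),
                        ("with modin", REPRO_WITH_MODIN)]:
        print(label, "->", runs_within(code, timeout=30))
```

Each variant runs in its own process for two reasons. Each run starts from a clean set of imports. A hang inside native Arrow code may not respond to Ctrl+C, while killing a child process reliably ends it.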
Issue details: created 2022-01-17; 13 comments (9 by maintainers).
I have opened an issue here: https://issues.apache.org/jira/browse/ARROW-15362
OK, I was able to reproduce this without Modin:
This probably needs to be raised with pyarrow, because it is not something we can fix in Modin. Thanks @xwu99 for finding and reporting this!