Importing modin results in a hang with the latest arrow 6.0

See original GitHub issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS, Windows 10

  • Modin version (modin.__version__): 0.12.1

  • Python version: Python 3.9.1

  • Code we can use to reproduce:

Install the latest modin and arrow with:

conda create -n test -c conda-forge modin-all pyarrow=6.0.1

Then run the following code to reproduce the hang:

# importing modin makes the arrow batch iterator hang
import modin.pandas as pd
import pyarrow.dataset as ds

dfile = 'veterans_lung_cancer.csv'
data = ds.dataset(dfile, format='csv')
rb_iter = iter(data.to_batches())
next(rb_iter)

The code is also attached:

modin_arrow_hang.tar.gz

Describe the problem

The code hangs with arrow 6.0 but runs fine with arrow 4.0. If the modin import is removed, everything works.
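When comparing the variants described above (arrow 6.0 vs. 4.0, with and without the modin import), it can help to run each one in a fresh interpreter with a timeout instead of waiting on a stuck process. The following is a minimal sketch, not part of the original report; the helper name `run_snippet` and the status strings are made up for illustration.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float) -> str:
    """Run `code` in a fresh Python interpreter.

    Returns "hang" if it does not finish within timeout_s seconds,
    "ok" if it exits with status 0, and "error" otherwise.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,  # the child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "error"
```

Passing the reproduction script as `code`, once with and once without the `import modin.pandas as pd` line, separates a real hang from a script that fails for another reason, such as a missing CSV file. The timeout should be chosen generously enough to cover normal import time.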

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

devin-petersohn commented, Jan 18, 2022

Ok, I was able to reproduce this without Modin:

import os
import pyarrow.dataset as ds

os.environ["OMP_NUM_THREADS"] = "1"  # offending line

dfile = 'veterans_lung_cancer.csv'
data = ds.dataset(dfile, format='csv')
rb_iter = iter(data.to_batches())
next(rb_iter)  # hangs

This probably needs to be raised in pyarrow because it is not something we could fix in Modin. Thanks @xwu99 for finding/reporting this!
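The reproduction above points at the `OMP_NUM_THREADS` environment variable, but the comment does not show where it gets set during a Modin import. One way to check which environment variables an import (or any other step) changes is to snapshot `os.environ` before and after. This is a minimal sketch; the helper name `env_changes_during` is hypothetical and not part of Modin or pyarrow.

```python
import os
from typing import Callable, Dict, Optional, Tuple


def env_changes_during(
    action: Callable[[], object],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Call `action` and return {name: (before, after)} for every
    environment variable that was added, removed, or modified."""
    before = dict(os.environ)
    action()
    after = dict(os.environ)
    return {
        name: (before.get(name), after.get(name))
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    }
```

For example, `env_changes_during(lambda: importlib.import_module("modin.pandas"))` (after `import importlib`) would list whatever variables that import touches in the current process. It only reports changes if the module has not already been imported, so it is best run in a fresh interpreter.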

