read_csv with Ray engine fails with some combinations of `Column and Index Locations and Names` parameters
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Modin version (modin.__version__): 0.8.1.1+34.ga571e10
- Python version: 3.8.6
- Code we can use to reproduce:
import os

os.environ["MODIN_ENGINE"] = "ray"  # set before modin is imported

import pandas
import modin.pandas as pd
from modin.pandas.test.utils import df_equals

test_filename = "test.csv"
kwargs = {
    "filepath_or_buffer": test_filename,
    "index_col": "col1",
    "usecols": ["col1"],
}
str_two_cols = """col1,col2
0,1
2,3
"""
try:
    with open(test_filename, "w") as f:
        f.write(str_two_cols)
    df_pandas = pandas.read_csv(**kwargs)
    print(df_pandas)
    df_pd = pd.read_csv(**kwargs)
    print(df_pd)
    df_equals(df_pd, df_pandas)
finally:
    os.remove(test_filename)
Describe the problem
Source code / logs
Empty DataFrame
Columns: []
Index: [0, 2]
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    df_pd = pd.read_csv(**kwargs)
  File "/localdisk/amyskov/modin/modin/pandas/io.py", line 109, in parser_func
    return _read(**kwargs)
  File "/localdisk/amyskov/modin/modin/pandas/io.py", line 127, in _read
    pd_obj = EngineDispatcher.read_csv(**kwargs)
  File "/localdisk/amyskov/modin/modin/data_management/factories/dispatcher.py", line 104, in read_csv
    return cls.__engine._read_csv(**kwargs)
  File "/localdisk/amyskov/modin/modin/data_management/factories/factories.py", line 87, in _read_csv
    return cls.io_cls.read_csv(**kwargs)
  File "/localdisk/amyskov/modin/modin/engines/base/io/file_reader.py", line 29, in read
    query_compiler = cls._read(*args, **kwargs)
  File "/localdisk/amyskov/modin/modin/engines/base/io/text/csv_reader.py", line 142, in _read
    column_chunksize = compute_chunksize(empty_pd_df, num_splits, axis=1)
  File "/localdisk/amyskov/modin/modin/data_management/utils.py", line 54, in compute_chunksize
    col_chunksize = get_default_chunksize(len(df.columns), num_splits)
  File "/localdisk/amyskov/modin/modin/data_management/utils.py", line 29, in get_default_chunksize
    length // num_splits if length % num_splits == 0 else length // num_splits + 1
ZeroDivisionError: integer division or modulo by zero
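The last frame of the traceback can only raise ZeroDivisionError when num_splits is 0. A plausible reading, not confirmed in this thread, is that no data columns remain once col1 is used as the index, so the column split count ends up as zero. The sketch below reproduces just that arithmetic in isolation. The function names default_chunksize and safe_chunksize are hypothetical, and the guard is only an illustration, not the fix that was applied in Modin.

```python
def default_chunksize(length, num_splits):
    # Ceiling division, written the same way as the expression in the traceback.
    return length // num_splits if length % num_splits == 0 else length // num_splits + 1


def safe_chunksize(length, num_splits):
    # Hypothetical guard (an assumption, not Modin's code): with nothing to
    # split, report a chunk size of 0 instead of dividing by zero.
    if num_splits == 0:
        return 0
    return default_chunksize(length, num_splits)


# Zero columns and zero splits trigger the same exception type as in the report.
try:
    default_chunksize(0, 0)
    raised = False
except ZeroDivisionError:
    raised = True

print("unguarded call raised ZeroDivisionError:", raised)
print("guarded call returned:", safe_chunksize(0, 0))
```

With a non-zero split count the expression behaves as an ordinary ceiling division, so only the empty-columns case needs special handling in this sketch.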
@ymoslem, this bug can be avoided temporarily by adding the index_col=False option to the read_csv function. Hope this helps!
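The example that accompanied this comment is not preserved here, so the following is only a sketch of what the index_col=False workaround might look like for the CSV from the report. It runs against plain pandas and reads from an in-memory buffer. The helper name read_with_workaround and the follow-up set_index call are assumptions, and the thread does not confirm that the same steps avoid the error on Modin 0.8.1.1.

```python
import io

import pandas  # assumption: swap in `import modin.pandas as pd` to try this on Modin

CSV_TEXT = "col1,col2\n0,1\n2,3\n"


def read_with_workaround(text):
    # index_col=False keeps col1 as an ordinary column while parsing, so the
    # parsed frame still has one data column.
    df = pandas.read_csv(io.StringIO(text), index_col=False, usecols=["col1"])
    # Promote col1 to the index afterwards to match index_col="col1".
    return df.set_index("col1")


result = read_with_workaround(CSV_TEXT)
print(result)
```

Moving the index step after parsing means the reader never has to split a frame with zero columns, which is the situation the traceback points at.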
I am not able to reproduce this bug on master, so I’ll go ahead and close this issue. Please re-open if needed!