Pandas parses CSV file correctly but Modin fails
System information
- **OS Platform and Distribution**: macOS Mojave 10.14.6
- **Modin version**: 0.10.0
- **Python version**: 3.8.5
- **Code we can use to reproduce**:
import ray

if ray.is_initialized():
    ray.shutdown()
    ray.init(address='auto')
else:
    ray.init()

from pathlib import Path
import modin.pandas as mpd
import pandas as pd

PATH = Path.home() / 'demo.csv'

# Pandas reads successfully with no problems
data = pd.read_csv(PATH, encoding='utf-8-sig', names=['Text', 'Annotation'], quotechar='"', skipinitialspace=True, sep=',', header=0)

# Modin throws an error
data = mpd.read_csv(PATH, encoding='utf-8-sig', names=['Text', 'Annotation'], quotechar='"', skipinitialspace=True, sep=',', header=0)
Describe the problem
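Same problem as described in #1008: pandas reads the file without errors, but Modin raises a ParserError on the same call.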
Source code / logs
File "pandas/_libs/parsers.pyx", line 518, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 706, in pandas._libs.parsers.TextReader._get_header
File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1951, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 1
@thenewera-ru I can reproduce the issue with your CSV. It looks like the encoding is not being handled properly in the workers, but it’s still too early to tell.
We’ll get this worked on, thanks for reporting!
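To illustrate why the encoding matters here, below is a minimal standard-library sketch. It is not Modin's code path and does not reproduce the exact ParserError. It only shows how a parser sees the first quoted field differently when a UTF-8 BOM is not stripped. The byte string `raw` and the helper `first_row` are hypothetical stand-ins, and it is an assumption that `demo.csv` starts with a BOM (the issue only shows that `encoding='utf-8-sig'` was passed).

```python
import codecs
import csv
import io
from typing import List

# Hypothetical stand-in for demo.csv: a UTF-8 BOM followed by quoted fields.
raw = codecs.BOM_UTF8 + b'"Text","Annotation"\n"Hello, world","greeting"\n'


def first_row(data: bytes, encoding: str) -> List[str]:
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = data.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"', skipinitialspace=True)
    return next(reader)


# 'utf-8-sig' strips the BOM, so the opening quote is the first character
# of the field and the field is parsed as a quoted string.
with_sig = first_row(raw, "utf-8-sig")

# Plain 'utf-8' keeps the BOM as U+FEFF, so the quote is no longer at the
# start of the field and is treated as literal data.
without_sig = first_row(raw, "utf-8")

print(with_sig)     # ['Text', 'Annotation']
print(without_sig)  # ['\ufeff"Text"', 'Annotation']
```

If a worker decodes its part of the file without honoring the requested encoding, the quote handling for the header row can differ from what plain pandas sees, which would be consistent with the hypothesis above. Whether that is the actual cause in Modin 0.10.0 is not confirmed in this thread.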
Fixed in Modin 0.11.3.