Pandas parses csv file correctly but modin fails

System information

  • **OS Platform and Distribution**: macOS Mojave 10.14.6
  • Modin version: 0.10.0
  • Python version: 3.8.5
  • Code we can use to reproduce:
import ray

# Start Ray, or restart it if a session from a previous run is still active.
# address='auto' connects to an already running Ray cluster.
if ray.is_initialized():
    ray.shutdown()
    ray.init(address='auto')
else:
    ray.init()

from pathlib import Path
import modin.pandas as mpd
import pandas as pd

PATH = Path.home() / 'demo.csv'

# Pandas reads the file without problems
data = pd.read_csv(PATH, encoding='utf-8-sig', names=['Text', 'Annotation'],
                   quotechar='"', skipinitialspace=True, sep=',', header=0)

# Modin raises pandas.errors.ParserError with the same arguments
data = mpd.read_csv(PATH, encoding='utf-8-sig', names=['Text', 'Annotation'],
                    quotechar='"', skipinitialspace=True, sep=',', header=0)
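The reporter's demo.csv is not included in the issue, so the short sample below is hypothetical: quoted fields containing a comma, a line break and doubled quotes, a space after each comma, and a UTF-8 byte order mark. As a rough, dependency-free stand-in for the pandas call above, this sketch parses it with Python's standard `csv` module using the closest equivalent options. It only illustrates what those options ask for; it is not how pandas or Modin parse internally.

```python
import csv
import io

# Hypothetical sample standing in for demo.csv (the real file was not posted).
SAMPLE_TEXT = (
    '"Text", "Annotation"\n'
    '"Hello, world", "greeting"\n'
    '"Line one\nline two", "multi-line"\n'
    '"She said ""hi""", "quote"\n'
)
# Encoding with utf-8-sig prepends the 3-byte BOM b"\xef\xbb\xbf".
RAW = SAMPLE_TEXT.encode("utf-8-sig")

NAMES = ["Text", "Annotation"]


def read_records(raw: bytes) -> list:
    """Parse raw CSV bytes roughly the way the read_csv options above ask for."""
    text = raw.decode("utf-8-sig")  # strips the BOM, like encoding='utf-8-sig'
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",          # sep=','
        quotechar='"',          # quotechar='"'
        skipinitialspace=True,  # skipinitialspace=True
    )
    rows = list(reader)
    # header=0 together with names=...: drop the file's own header row and
    # label each remaining record with the supplied column names instead.
    return [dict(zip(NAMES, row)) for row in rows[1:]]


for record in read_records(RAW):
    print(record)
```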

Describe the problem

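As described in #1008: pandas parses the CSV file without errors, but Modin's read_csv with the same arguments fails with the ParserError below.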

Source code / logs

  File "pandas/_libs/parsers.pyx", line 518, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 706, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1951, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 1
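"EOF inside string" means the tokenizer reached the end of its input while a quoted field was still open. One generic way this can happen in a reader that hands separate parts of a file to separate workers is cutting the input at a line break that actually sits inside a quoted field. The sketch below illustrates that failure mode with Python's standard `csv` module, whose strict mode raises `csv.Error` in the same situation. It is an illustration only, not the diagnosed cause of this issue; the sample text and the helpers `naive_split` and `quote_aware_split` are hypothetical. The quote-aware splitter assumes RFC 4180-style quoting, where a literal quote is written as `""` and quotes never appear inside unquoted fields.

```python
import csv
import io

# Hypothetical input: the second record has a line break inside a quoted field.
DATA = (
    "Text,Annotation\n"
    '"first line\nsecond line",note\n'
    "plain,row\n"
)


def parse_strict(text: str) -> list:
    """Parse CSV text; strict=True raises csv.Error if a quote is open at EOF."""
    return list(csv.reader(io.StringIO(text), strict=True))


def naive_split(text: str, target: int) -> tuple:
    """Cut after the first newline at or after `target`, ignoring quotes."""
    cut = text.index("\n", target) + 1
    return text[:cut], text[cut:]


def quote_aware_split(text: str, target: int, quotechar: str = '"') -> tuple:
    """Cut after the first newline at or after `target` that is outside quotes."""
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == quotechar:
            # An escaped "" toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes and i >= target:
            return text[: i + 1], text[i + 1 :]
    return text, ""


target = DATA.index("first")  # a cut point that falls inside the quoted field

head, _ = naive_split(DATA, target)
try:
    parse_strict(head)
except csv.Error as exc:
    print("naive chunk fails:", exc)

head, tail = quote_aware_split(DATA, target)
print(parse_strict(head) + parse_strict(tail) == parse_strict(DATA))
```

Because the quote state at any offset depends on everything before it, the quote-aware version has to scan from the start of the text, which is part of what makes splitting CSV input for parallel parsing tricky.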

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
devin-petersohn commented, Jun 15, 2021

@thenewera-ru I can reproduce the issue with your CSV. It looks like the encoding is not being handled properly in the workers, but it’s still too early to tell.

We’ll get this worked on, thanks for reporting!
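The maintainer suspects the encoding is not handled properly in the workers. As background only (not a diagnosis of this bug), the standard-library sketch below shows what `utf-8-sig` does that plain `utf-8` does not: a file saved with a byte order mark starts with the 3 bytes `EF BB BF`, and only `utf-8-sig` removes them when decoding. If the BOM survives, it becomes part of the first field; in Python's `csv` module that also means the first field is no longer recognised as quoted. The header line and the helper `first_row` are hypothetical.

```python
import codecs
import csv
import io

# Hypothetical first line of a CSV file saved as "UTF-8 with BOM".
RAW_HEADER = '"Text","Annotation"\n'.encode("utf-8-sig")


def first_row(raw: bytes, encoding: str) -> list:
    """Decode `raw` with `encoding` and return the first parsed CSV row."""
    return next(csv.reader(io.StringIO(raw.decode(encoding))))


print(RAW_HEADER[:3] == codecs.BOM_UTF8)        # the BOM is the first 3 bytes
print(first_row(RAW_HEADER, "utf-8-sig"))       # BOM stripped before parsing
print(repr(first_row(RAW_HEADER, "utf-8")[0]))  # BOM kept as U+FEFF in field 0
```

Note that byte offsets measured in the raw file are shifted by those 3 BOM bytes relative to the decoded text, so code that mixes the two has to account for them.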

0 reactions
vnlitvinov commented, Nov 9, 2021

Fixed in 0.11.3
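Since the fix is reported in 0.11.3, one way to check whether an environment still has an affected Modin is to compare the installed version against that release. The helpers `parse_release` and `modin_has_fix` are hypothetical; the sketch uses only the standard library (`importlib.metadata`, Python 3.8+) and a simple regex that reads the leading `major.minor.patch` numbers.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (0, 11, 3)  # release reported to contain the fix


def parse_release(ver: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def modin_has_fix() -> Optional[bool]:
    """True/False for an installed Modin, None if Modin is not installed."""
    try:
        installed = version("modin")
    except PackageNotFoundError:
        return None
    return parse_release(installed) >= FIXED_IN


print(parse_release("0.10.0") >= FIXED_IN)  # the version from this report
print(modin_has_fix())
```

The regex ignores pre-release and local suffixes, so `0.11.3rc1` would count as fixed; `packaging.version.Version` handles those cases if that dependency is acceptable.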

Top Results From Across the Web

read_csv difference to pandas · Issue #1008 - GitHub
gives no problem and correctly parses the data but with import modin.pandas as mpd I get the same pandas.errors.
Modin read_csv issue - pandas - Stack Overflow
I'm attempting to read a csv file using modin and it results in the following error. this issue seems to happen on all...
Pandas loads a csv file incorrectly, but without throwing an error
So I tried loading some data through pandas to practice manipulating it, but I ran into a slight problem. Basically, the pandas load...
Pandas read_csv() - How to read a csv file in Python
The pandas.read_csv is used to load a CSV file as a pandas dataframe. ... will try to parse the index, else parse the...
Release 0.17.1+0.g7f801adc.dirty Modin contributors
Modin exposes the pandas API through modin.pandas, but it does not inherit the same pitfalls and design decisions.
