
Getting a ... in my CSV when using to_csv()


I have read in an HDF file of 4 million+ rows, and now I want to write a sample of it to CSV:

df_small = df[:int(1e6)]
df_small.to_csv("X.csv", sep='\t')
len(df_small)
# out: 1,000,000

The dataframe consists of a datetime index and a text column.

When I read the CSV back in, I get more rows than when I saved it:

import pandas as pd

df2 = pd.read_csv("X.csv",
                  sep='\t',
                  engine='python',
                  parse_dates=['datetime'],
                  index_col='datetime',  # comma was missing here
                  infer_datetime_format=True)
len(df2)
# out: 1,000,002

Looking at the index, the datetimes weren’t actually parsed; the dtype is just object.

When I used my own date parser, it raised an error when it hit a “…” in the datetime index, which wasn’t in the data before.

I opened the CSV in Excel and found a “…” in the datetime column. I also noticed that the datetime index and the first column were merged together; I’m not sure whether that’s relevant or just the way Excel reads the file.

With read_csv the data comes in fine, except for those couple of extra rows with “…” in the index. Those rows are otherwise blank.
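To pin down which rows are the extra ones, one option is to coerce the index to datetimes and keep only the values that fail to parse. This is a minimal sketch: the helper name find_unparsed_index_rows and the small stand-in for df2 are hypothetical, and it relies on pd.to_datetime(..., errors="coerce") turning unparseable values into NaT.

```python
import pandas as pd


def find_unparsed_index_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose index value cannot be parsed as a datetime."""
    # errors="coerce" turns anything unparseable (such as "…") into NaT
    parsed = pd.to_datetime(frame.index, errors="coerce")
    # parsed.isna() is a boolean array, used here as a row mask
    return frame[parsed.isna()]


# Hypothetical stand-in for the re-read frame with an object-dtype index
df2 = pd.DataFrame(
    {"text": ["tweet one", None, "tweet two"]},
    index=pd.Index(
        ["2018-05-07 10:00:00", "…", "2018-05-07 11:00:00"], name="datetime"
    ),
)

bad_rows = find_unparsed_index_rows(df2)
print(bad_rows)
```

On the real data, printing the rows just before and after each bad row usually shows which text value was split.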

Issue Analytics

Created 5 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments


WillAyd commented, May 7, 2018

Hmm, OK. FWIW, it looks like the reader in Python’s built-in csv module is currently hard-coded to recognize \r or \n as end-of-line:

https://docs.python.org/3.6/library/csv.html#csv.Dialect.lineterminator

I suppose that, irrespective of the Python parser, the C parser should not be getting tripped up by it, so this is probably a bug. Investigation / PRs welcome.
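A minimal sketch of the behaviour described here, using a small hypothetical tab-separated string in which one unquoted field contains a bare \r. Both Python’s csv.reader and pandas’ default C engine end the row at the \r, so the rest of that field turns up as an extra row. The sample data is made up; it illustrates the mechanism, not the reporter’s file.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated text: the first data row has a bare \r
# inside an unquoted field, like a tweet containing \r would produce
raw = (
    "datetime\ttext\n"
    "2018-05-07 10:00:00\tfirst\rhalf\n"
    "2018-05-07 11:00:00\tsecond\n"
)

# newline="" keeps the \r visible to the csv module, as its docs recommend
csv_rows = list(csv.reader(io.StringIO(raw, newline=""), delimiter="\t"))

# pandas' default C engine on the same text
frame = pd.read_csv(io.StringIO(raw), sep="\t")

print(len(csv_rows) - 1)  # data rows seen by csv.reader (header excluded)
print(len(frame))         # rows seen by pandas
```

Two data rows go in, but both readers report three: "half" becomes a row of its own, with the missing text field filled as NaN by pandas.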


BrendanMartin commented, May 7, 2018

I actually found out what it was. The text is all tweets, and apparently some tweets contain \r. This seems to have caused a line break when writing to CSV, so I removed all \r characters, and now I’m getting the right number of rows.
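The workaround described here, removing every \r from the text column before writing, can be sketched as follows. The helper strip_carriage_returns, the column name text, and the sample tweets are hypothetical; Series.str.replace with regex=False treats "\r" as a literal character rather than a pattern.

```python
import io

import pandas as pd


def strip_carriage_returns(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of frame with carriage returns removed from one column."""
    cleaned = frame.copy()
    # regex=False: match the carriage return literally
    cleaned[column] = cleaned[column].str.replace("\r", "", regex=False)
    return cleaned


# Hypothetical sample shaped like the issue: datetime index plus a text column
tweets = pd.DataFrame(
    {"text": ["first\rhalf", "second"]},
    index=pd.DatetimeIndex(
        ["2018-05-07 10:00:00", "2018-05-07 11:00:00"], name="datetime"
    ),
)
cleaned = strip_carriage_returns(tweets, "text")

# Round-trip through an in-memory tab-separated CSV
buf = io.StringIO()
cleaned.to_csv(buf, sep="\t")
back = pd.read_csv(
    io.StringIO(buf.getvalue()),
    sep="\t",
    index_col="datetime",
    parse_dates=["datetime"],
)
print(len(back), back["text"].tolist())
```

Removing \r glues the neighbouring words together ("first\rhalf" becomes "firsthalf"); replacing it with a space instead may read better for tweets.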

If there’s a \r in the text, should it have been escaped?
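An alternative to stripping the characters is to make sure every field is quoted, so a \r ends up inside quotes, where a CSV reader treats it as data. Python’s csv documentation describes QUOTE_MINIMAL (the default) as quoting only fields that contain the delimiter, the quote character, or a character from lineterminator, so a lone \r may go unquoted when the line terminator is \n. The sketch below passes quoting=csv.QUOTE_ALL to to_csv and reads the result back with the default C engine; the sample data is hypothetical, and this is not necessarily how the issue was resolved upstream.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: one tweet still contains a carriage return
tweets = pd.DataFrame(
    {"text": ["first\rhalf", "second"]},
    index=pd.DatetimeIndex(
        ["2018-05-07 10:00:00", "2018-05-07 11:00:00"], name="datetime"
    ),
)

# Quote every field, including the index and header
buf = io.StringIO()
tweets.to_csv(buf, sep="\t", quoting=csv.QUOTE_ALL)

# The C engine keeps characters inside quotes as part of the field
back = pd.read_csv(
    io.StringIO(buf.getvalue()),
    sep="\t",
    index_col="datetime",
    parse_dates=["datetime"],
)
print(len(back))
```

This keeps the original text intact, at the cost of a slightly larger file; tools that split lines naively (rather than parsing CSV) can still be confused by the embedded \r.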
