Getting a ... in my CSV when using to_csv()


I have read in an hdf of 4 million+ rows and now I want to convert it to a sample CSV:

df_small = df[:int(1e6)]
df_small.to_csv("X.csv", sep='\t')
len(df_small)
# out: 1,000,000

The dataframe consists of a datetime index and a text column.

When I read the CSV back in, I get more rows than when I saved it:

df2 = pd.read_csv("X.csv",
                  sep='\t',
                  engine='python',
                  parse_dates=['datetime'],
                  index_col='datetime',
                  infer_datetime_format=True)
len(df2)
# out: 1,000,002

Looking at my index, the datetime wasn’t actually parsed; it’s just dtype object.

I used my own parser and it had an error when it hit a “…” in my datetime index, which wasn’t there before.

I opened up the CSV in Excel and found a “…” in my datetime column, and I also noticed that my datetime index and first column were merged together. Not sure if that’s relevant or just the way Excel reads it.

When I use read_csv, the data comes in fine except for those couple of extra rows with “…” in the index. The row at each of those index values is also just blank.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
WillAyd commented, May 7, 2018

Hmm OK. FWIW it looks like Python’s built-in CSV module is currently hard-coded to recognize \r or \n as EOL:

https://docs.python.org/3.6/library/csv.html#csv.Dialect.lineterminator

I suppose, irrespective of the Python parser, that the C parser should not be getting tripped up by it, so this is probably a bug. Investigation / PRs welcome.
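
For anyone who wants to see the \r behaviour in isolation, here is a minimal sketch using only the standard library csv module (not pandas' C parser, which may behave differently depending on version). The sample row and its text are made up for illustration. It shows that an unquoted \r inside a field ends the record early, while the same field written with quoting comes back as one record.

```python
import csv
import io

# Hand-written TSV (illustrative data): one logical record whose text
# field contains a bare carriage return and is NOT quoted.
raw = "datetime\ttext\n2018-05-07 10:00:00\tfirst half\rsecond half\n"

# newline="" leaves "\r" untranslated, as the csv docs recommend for files.
rows = list(csv.reader(io.StringIO(raw, newline=""), delimiter="\t"))
# The reader treats the bare "\r" as end-of-line, so the single data
# record is split into two rows.

# Same record, but written with every field quoted.
buf = io.StringIO(newline="")
writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_ALL,
                    lineterminator="\n")
writer.writerow(["datetime", "text"])
writer.writerow(["2018-05-07 10:00:00", "first half\rsecond half"])

quoted_rows = list(
    csv.reader(io.StringIO(buf.getvalue(), newline=""), delimiter="\t")
)
# Inside quotes the "\r" is kept as part of the field.

print(len(rows), len(quoted_rows))
```

The first count includes the header plus two fragments of the broken record; the second is the header plus one intact record.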

1 reaction
BrendanMartin commented, May 7, 2018

I actually found out what it was. The text is all tweets, and I guess some tweets have \r in them for some reason. This seems to have caused a newline when writing to CSV, so I removed all \r and now I’m getting the right number of rows.

If there’s \r in text, should it have been escaped?
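
The workaround described above (removing every \r before writing) might look roughly like the sketch below. The helper name strip_carriage_returns, the column name "text", and the three sample rows are assumptions made for illustration; the original post does not show the actual cleanup code.

```python
import io

import pandas as pd


def strip_carriage_returns(df, column, replacement=""):
    # Hypothetical helper: returns a copy of df with every carriage
    # return in one text column replaced (removed by default).
    cleaned = df.copy()
    cleaned[column] = cleaned[column].str.replace(
        "\r", replacement, regex=False
    )
    return cleaned


# Small stand-in for the tweet data: datetime index plus a text column.
index = pd.date_range("2018-05-07", periods=3, freq="D", name="datetime")
df = pd.DataFrame(
    {"text": ["plain tweet", "broken\rtweet", "another one"]},
    index=index,
)

cleaned = strip_carriage_returns(df, "text")

# Round-trip through an in-memory TSV and compare row counts.
buf = io.StringIO()
cleaned.to_csv(buf, sep="\t")
buf.seek(0)
df2 = pd.read_csv(buf, sep="\t", parse_dates=["datetime"],
                  index_col="datetime")

print(len(df), len(df2))
```

Passing replacement=" " instead keeps a space where the \r used to be, which may be preferable if the carriage return was separating words.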
