Getting a ... in my CSV when using to_csv()
I have read in an hdf of 4 million+ rows and now I want to convert it to a sample CSV:
df_small = df[:int(1e6)]
df_small.to_csv("X.csv", sep='\t')
len(df_small)
# out: 1,000,000
The dataframe consists of a datetime index and a text column.
When I read the CSV back in, I get more rows than when I saved it:
import pandas as pd

df2 = pd.read_csv("X.csv",
                  sep='\t',
                  engine='python',
                  parse_dates=['datetime'],
                  index_col='datetime',  # this comma was missing in the original snippet
                  infer_datetime_format=True)
len(df2)
# out: 1,000,002
And looking at my index, the datetime wasn’t actually parsed; it’s just dtype object.
I used my own parser and it had an error when it hit a “…” in my datetime index, which wasn’t there before.
I opened up the CSV in Excel and found a “…” in my datetime column, and I also noticed that my datetime index and first column were merged together. Not sure if that’s relevant or just the way Excel reads it.
When I use read_csv, the data comes in fine except for those couple of extra rows with “…” in the index. The row at that index is also just blank.
Hmm OK. FWIW it looks like Python’s built-in CSV module is currently hard-coded to recognize \r or \n as EOL: https://docs.python.org/3.6/library/csv.html#csv.Dialect.lineterminator
I suppose that, irrespective of the Python parser, the C parser should not be getting tripped up by it, so this is probably a bug. Investigation / PRs welcome.
I actually found out what it was. The text is all tweets, and I guess some tweets have \r in them for some reason. This seems to have caused a newline when writing to CSV, so I removed all \r and now I’m getting the right number of rows.
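For anyone hitting the same thing, here is a minimal sketch of that workaround. The column name text and the three sample rows are made up for illustration, and it writes to an in-memory buffer instead of X.csv; it only shows stripping \r before calling to_csv and then checking the row count after reading back.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tweet data: a datetime index plus one text column.
df = pd.DataFrame(
    {"text": ["first tweet", "broken\rtweet", "last tweet"]},
    index=pd.DatetimeIndex(
        ["2020-01-01", "2020-01-02", "2020-01-03"], name="datetime"
    ),
)

# Remove every carriage return from the text column before writing.
cleaned = df.assign(text=df["text"].str.replace("\r", "", regex=False))

# Write to an in-memory buffer (instead of a file) and read it back.
buf = io.StringIO()
cleaned.to_csv(buf, sep="\t")
buf.seek(0)
df2 = pd.read_csv(buf, sep="\t", parse_dates=["datetime"], index_col="datetime")

# The row count after the round trip should match the original.
assert len(df2) == len(df)
```

Replacing \r with a space instead of an empty string would keep the words on either side from running together, if that matters for the text.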
If there’s \r in text, should it have been escaped?
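On the escaping question, a small sketch with the standard-library csv module (not pandas, whose writer and parsers may behave differently depending on version) suggests that quoting is one way to keep an embedded \r inside a single record. The sample row is invented, and QUOTE_ALL is chosen here only to force quoting for the demonstration.

```python
import csv
import io

# Hypothetical sample: a header row and one record whose text contains a bare \r.
rows = [
    ["datetime", "text"],
    ["2020-01-01 00:00:00", "broken\rtweet"],
]

# newline="" keeps the stream from translating line endings, as the csv docs advise.
out = io.StringIO(newline="")
writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_ALL)
writer.writerows(rows)

# Read the same text back; the quoted field may span what looks like two lines.
src = io.StringIO(out.getvalue(), newline="")
read_back = list(csv.reader(src, delimiter="\t"))

# Both the record count and the embedded \r survive the round trip.
assert read_back == rows
```

Whether to quote such fields or strip the \r up front is a judgment call; stripping changes the data, while quoting relies on whatever reads the file honoring quoted line breaks.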