BUG: read_csv does not read double double quotes in pipe delimited txt file

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • [] (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

This is a line in a file named data.txt, which I am trying to read into pandas with read_csv:

"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""

df = pd.read_csv('data.txt', names=columns, dtype=column_dict, na_values=[''], keep_default_na=False, sep='|', encoding='cp1252', skiprows=1)

Problem description

All the other fields are read into the DataFrame correctly, but pandas mishandles the field wrapped in two pairs of double quotes, ""xxxxxx"", and stores it as xxxxxx"" in the DataFrame. The affected field is the seventh item in the line above.

Expected Output

The field should be read as "xxxxxx" (with one pair of double quotes kept) in the DataFrame.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-1043-gcp
Version : #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
phofl commented, Jun 5, 2021

Did you try csv.QUOTE_NONE?

df = pd.read_csv(StringIO(data), na_values=[''], keep_default_na=False, sep='|', encoding='cp1252', skiprows=1, quoting=csv.QUOTE_NONE)

Please provide reproducible examples in the future, which do not rely on external variables or files (see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports)

0 reactions
mroeschke commented, Aug 21, 2021

Thanks for the report, but it appears that the behavior is expected. Closing.
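
For readers who want to try the suggestion, here is a minimal, self-contained sketch. It assumes a shortened four-field sample line in place of data.txt, and the helper strip_outer_quotes is hypothetical (it is not part of pandas or of the issue). With quoting=csv.QUOTE_NONE the parser leaves every quote character in the data, so the sketch removes one outer pair of quotes per field afterwards. This is one possible workaround, not necessarily what the maintainers had in mind.

```python
import csv
from io import StringIO

import pandas as pd

# Shortened stand-in for the line in data.txt (assumption: four fields
# are enough to show the behaviour).
sample = '"xxx"|""xxxxxx""|"x"|""\n'

# QUOTE_NONE tells the parser to treat quote characters as ordinary data,
# so each field arrives exactly as written, quotes included.
raw = pd.read_csv(
    StringIO(sample),
    sep="|",
    header=None,
    dtype=str,
    na_values=[""],
    keep_default_na=False,
    quoting=csv.QUOTE_NONE,
)


def strip_outer_quotes(value):
    """Hypothetical helper: drop one surrounding pair of double quotes."""
    if isinstance(value, str) and len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


# Apply the helper column by column, then turn now-empty strings into NA
# to mirror na_values=[''] from the original call.
cleaned = raw.apply(lambda col: col.map(strip_outer_quotes)).replace("", pd.NA)

print(raw.iloc[0, 1])      # the field as written in the file
print(cleaned.iloc[0, 1])  # the field with one pair of quotes removed
```

Note that with QUOTE_NONE a quoted empty field ("") is no longer matched by na_values=[''] during parsing, which is why the sketch converts empty strings to NA after stripping the quotes.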

