BUG: read_csv does not read doubled double quotes in a pipe-delimited txt file
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
This is a line from a file named data.txt, which I am trying to read in pandas using read_csv:
```text
"xxx"|"xxx"|"-xxxxxxx"|"xxxxx"|"x"|"xx"|""xxxxxx""|"x"|"xx"|"xxxxxxx"|""|"x"|"xxxxxx"|"X"|"xxxx"|"xxxxx"|""
```
```python
import pandas as pd
df = pd.read_csv('data.txt', names=columns, dtype=column_dict, na_values=[''], keep_default_na=False, sep='|', encoding='cp1252', skiprows=1)  # columns and column_dict are defined elsewhere (not shown)
```
Problem description
The problem I am facing is this: all the other fields are read into the DataFrame correctly, but pandas mishandles the value ""xxxxxx"", which is wrapped in two double quotes on each side. It reads this value as xxxxxx"" in the DataFrame. In the line above, this doubled-quote item is the 7th field (index 6), and it is the one causing the issue.
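The same result can be reproduced without pandas or an external file. This sketch assumes that Python's built-in csv module applies the same doubled-quote rules as the parser behind read_csv (quotechar '"' with doublequote enabled, the defaults in both). It feeds a shortened, hypothetical version of the line from data.txt through an in-memory buffer.

```python
import csv
import io

# Shortened stand-in for the line in data.txt; the second field is wrapped in two pairs of quotes.
line = '"xxx"|""xxxxxx""|"x"|""\n'

# Default dialect settings apart from the delimiter: quotechar='"', doublequote=True, strict=False.
reader = csv.reader(io.StringIO(line), delimiter="|")
row = next(reader)
print(row)  # ['xxx', 'xxxxxx""', 'x', '']
```

In this sketch, the first two quotes of ""xxxxxx"" are parsed as the opening and closing quote of an empty quoted string. The next character is neither a quote nor the delimiter, so the lenient (non-strict) parser keeps appending characters to the field. The trailing quotes are then taken literally, which gives xxxxxx"".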
Expected Output
The value should be read as "xxxxxx" in the DataFrame, keeping one pair of double quotes.
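RFC 4180 states that a double quote inside a quoted field must be written as two double quotes. This is the convention that read_csv's default doublequote=True setting implements. A value meant to come out as "xxxxxx" therefore has to appear in the file as """xxxxxx""", with three quotes on each side. That is presumably why the reported behavior was considered expected. The sketch below again uses Python's csv module as a stand-in for read_csv: it writes such a value and reads it back.

```python
import csv
import io

# Value to store: the text xxxxxx wrapped in literal double quotes.
value = '"xxxxxx"'

# Write one row with every field quoted, as in data.txt.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_ALL)
writer.writerow(["xxx", value, ""])
text = buf.getvalue()
print(text)  # "xxx"|"""xxxxxx"""|""  (embedded quotes are doubled)

# Reading it back with default quoting restores the value, including its quotes.
row = next(csv.reader(io.StringIO(text), delimiter="|"))
print(row)  # ['xxx', '"xxxxxx"', '']
```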
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-1043-gcp
Version : #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Did you try csv.QUOTE_NONE?
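Here is one way to act on this suggestion, sketched with Python's csv module. With quoting=csv.QUOTE_NONE, the parser treats quote characters as ordinary text, so every field keeps its outer quotes. Exactly one layer can then be stripped by hand. The helper strip_one_quote_layer is hypothetical and is not part of pandas or csv. The sketch assumes that every field is wrapped in one pair of outer quotes and that no field contains the '|' delimiter, because with QUOTE_NONE a '|' inside quotes would split the field. read_csv accepts the same quoting=csv.QUOTE_NONE argument, after which a similar strip could be applied to the string columns.

```python
import csv
import io

line = '"xxx"|""xxxxxx""|"x"|""\n'


def strip_one_quote_layer(field: str, quote: str = '"') -> str:
    """Hypothetical helper: remove exactly one pair of surrounding quotes, if present."""
    if len(field) >= 2 and field[0] == quote and field[-1] == quote:
        return field[1:-1]
    return field


# QUOTE_NONE: quote characters get no special treatment, so fields keep them verbatim.
reader = csv.reader(io.StringIO(line), delimiter="|", quoting=csv.QUOTE_NONE)
raw = next(reader)
cleaned = [strip_one_quote_layer(f) for f in raw]
print(raw)      # ['"xxx"', '""xxxxxx""', '"x"', '""']
print(cleaned)  # ['xxx', '"xxxxxx"', 'x', '']
```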
In the future, please provide reproducible examples that do not rely on external variables or files (see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).
Thanks for the report, but it appears that the behavior is expected. Closing.