Saving CSV with backslashed-escaping is not idempotent.

@pdbaines and I noticed this bug.

I want Pandas to write a CSV file in which every character with a special interpretation (e.g. quotes, or backslashes themselves) is backslash-escaped in the field data. A backslashed quote should then be treated as field data rather than as a special character. This is not the behavior that I am seeing.
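To make the expected convention concrete, here is a minimal sketch of the escaping described above, written without pandas. The helpers `escape_field` and `unescape_field` are hypothetical names introduced only for illustration; they are not part of pandas or of Python's csv module.

```python
def escape_field(value, quotechar='"', escapechar="\\"):
    """Quote a field, backslash-escaping backslashes and quotes (hypothetical helper)."""
    # Escape the escape character first, so backslashes already in the data
    # cannot be confused with the ones added in front of quotes.
    escaped = value.replace(escapechar, escapechar + escapechar)
    escaped = escaped.replace(quotechar, escapechar + quotechar)
    return quotechar + escaped + quotechar


def unescape_field(field, escapechar="\\"):
    """Invert escape_field: strip the outer quotes and resolve escapes."""
    body = field[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == escapechar and i + 1 < len(body):
            out.append(body[i + 1])  # take the escaped character literally
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


text = 'Hello! Please "help" me. I cannot quote a csv.\\'
print(escape_field(text))
print(unescape_field(escape_field(text)) == text)
```

Under this convention a trailing backslash in the data is written as two backslashes, so it cannot be mistaken for an escape of the closing quote.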

Consider the following data frame:

import csv
import pandas as pd
df = pd.DataFrame({"text": ["""Hello! Please "help" me. I cannot quote a csv.\\"""], "zoo": ["1"]})
df.to_csv("out.csv", index=False, quoting=csv.QUOTE_NONNUMERIC, encoding="utf-8", escapechar='\\', doublequote=False)

When written to a file, it looks something like this:

"text","zoo"
"Hello! Please \"help\" me. I cannot quote a csv.\","1"

The quotes in Please "help" me are properly escaped. Oddly, though, the end quote of the field is backslashed while the start quote of the field is not.

If I read the data frame in again using exactly the same parameters,

df2 = pd.read_csv("out.csv", quoting=csv.QUOTE_NONNUMERIC, encoding="utf-8", escapechar='\\', doublequote=False)

I get a data frame in which both fields are concatenated into the first field and the second field is NaN.

>>> print(df2)
                                                text  zoo
0  Hello! Please "help" me. I cannot quote a csv....  NaN

If I instead do the following:

df3 = pd.DataFrame({"text": ["""Hello! Please "help" me. I cannot quote a csv.\\\""""], "zoo": ["1"]})
df3.to_csv("outB.csv", index=False, quoting=csv.QUOTE_NONNUMERIC, encoding="utf-8", escapechar='\\', doublequote=False)
df4 = pd.read_csv("outB.csv", quoting=csv.QUOTE_NONNUMERIC, encoding="utf-8", escapechar='\\', doublequote=False)

I get a file with an odd number of unescaped quote characters:

"text","zoo"
"Hello! Please \"help\" me. I cannot quote a csv.\\"","1"

and some unescaped quote characters are treated as data.
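For contrast, here is a small sketch using only Python's standard csv module with its default quote doubling (no escapechar). It does not give the backslash escaping requested above; it only shows that the same two strings survive a write and read cycle when quotes are doubled instead. The helper name `round_trip` is made up for this example.

```python
import csv
import io


def round_trip(rows):
    """Write rows with doubled quotes (csv default) and read them back."""
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
    buf.seek(0)
    # No escapechar is set, so a backslash is ordinary field data here.
    return list(csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC))


rows = [
    ['Hello! Please "help" me. I cannot quote a csv.\\', "1"],
    ['Hello! Please "help" me. I cannot quote a csv.\\"', "1"],
]
print(round_trip(rows) == rows)
```

Because every field is a quoted string, QUOTE_NONNUMERIC leaves the values as strings on read; unquoted fields would be converted to floats.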

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 19 (10 by maintainers)

Top GitHub Comments

3 reactions
dmitriyshashkin commented, Dec 25, 2017

I guess it’s related to this bug, opened in 2011: https://bugs.python.org/issue12178
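If the behavior does originate in Python's csv module, it should be reproducible without pandas. The sketch below is one way to check that on a given interpreter, using the same quoting, escapechar and doublequote settings as the report. The helper `round_trips` is a hypothetical name, and the result for the trailing-backslash row may differ between Python versions, so no expected output is shown.

```python
import csv
import io


def round_trips(row, **fmt):
    """Return True if writing and re-reading one row gives the same row back."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmt).writerow(row)
    buf.seek(0)
    try:
        rows = list(csv.reader(buf, **fmt))
    except csv.Error:
        # A parse error also counts as a failed round trip.
        return False
    return rows == [row]


# Same settings as in the report above.
FMT = dict(quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", doublequote=False)

result = round_trips(['Hello! Please "help" me. I cannot quote a csv.\\', "1"], **FMT)
print(result)
```

A False result here would point at the csv layer rather than at pandas itself.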

2 reactions
deads commented, Sep 2, 2016

@gfyoung Please explain why it is desired behavior to not be able to save arbitrary data to a Pandas DataFrame cell and read it back in the same.
