BUG: to_csv requires escapechar unnecessarily when data contains null byte \x00 (Python 3.10+ only)

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

########### PowerShell ############
import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False)

# Error: need to escape, but no escapechar set

import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False, escapechar='\\')

# Works, BUT escapechar appears to be redundant, because there is no escapechar in the file on disk!
$ xxd null_byte.csv            # from WSL, because no xxd in PowerShell
00000000: 410d 0a00 0d0a                           A.....

########### Ubuntu/Bash ############
import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False)

# WORKS, but output has different format:
$ xxd null_byte.csv
00000000: 410a 2200 220a                           A.".".

Issue Description

NOTE: I’m running this in PowerShell, but using xxd from Windows Subsystem for Linux.
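
Since xxd is not available in PowerShell, a small Python helper can produce a comparable hex dump from the same interpreter that ran to_csv. The sketch below is only an illustration: the name hexdump and its layout (16 bytes per line, grouped in pairs) are assumptions chosen to resemble the xxd output shown above, not part of pandas or of the original report.

```python
def hexdump(data: bytes) -> str:
    """Format bytes in a layout resembling xxd's default output (hypothetical helper)."""
    bytes_per_line = 16
    lines = []
    for offset in range(0, len(data), bytes_per_line):
        chunk = data[offset:offset + bytes_per_line]
        hex_digits = chunk.hex()
        # Group the hex digits four at a time (two bytes per group), as xxd does.
        groups = [hex_digits[i:i + 4] for i in range(0, len(hex_digits), 4)]
        hex_part = " ".join(groups)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # 39 columns is the width of eight 4-digit groups plus separators.
        lines.append(f"{offset:08x}: {hex_part:<39}  {ascii_part}")
    return "\n".join(lines)


# The byte sequences reported for the two environments in this issue.
print(hexdump(b"A\r\n\x00\r\n"))
print(hexdump(b'A\n"\x00"\n'))
```

To inspect the file written by to_csv, read it in binary mode, for example print(hexdump(open('null_byte.csv', 'rb').read())).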

If a dataframe contains a null byte \x00 as a value, to_csv requires escapechar to be set when run from PowerShell, but not when run from Ubuntu/Bash. However, the escapechar is not actually used, and does not appear in the output file.
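
One way to narrow this down is to take pandas out of the picture. The sketch below assumes that to_csv ultimately hands rows to the standard-library csv.writer (an assumption, not something confirmed in this report) and writes the same null byte with and without escapechar. The function name try_write_null_byte is hypothetical. No expected output is given because the result may differ between Python versions.

```python
import csv
import io
import sys


def try_write_null_byte(**writer_kwargs):
    """Write a single '\\x00' field with csv.writer and describe what happened."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, **writer_kwargs)
    try:
        writer.writerow(["\x00"])
    except csv.Error as exc:
        # csv.Error is the exception type the csv module raises for writer failures.
        return "error: " + str(exc)
    return "wrote: " + repr(buffer.getvalue())


print("Python", ".".join(str(part) for part in sys.version_info[:3]))
print("default      ->", try_write_null_byte())
print("escapechar=\\ ->", try_write_null_byte(escapechar="\\"))
```

Running this script under each interpreter shows whether the difference comes from the csv module or from pandas itself.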

Expected Behavior

I expect that if the escape character never appears in the output, I shouldn’t have to pass escapechar='\\'. I also expect to_csv to need the same arguments in both PowerShell and Ubuntu/Bash.

i.e. I expect that the following should just work in both PowerShell and Bash, without needing to specify escapechar:

import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False)

Installed Versions

Pandas in PowerShell:

INSTALLED VERSIONS

commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.10.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252

pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.4.0
pandas_datareader : None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : 0.8.1
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None

Pandas in Ubuntu/Bash:

INSTALLED VERSIONS

commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.4.0
pandas_datareader : None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

4 reactions
simonjayhawkins commented, Jul 28, 2022

Looks like this could depend on the Python version. I’m seeing “need to escape, but no escapechar set” with Python 3.10, but not with Python 3.9 or 3.8, using the same version of pandas (either 1.3.5 or 1.4.3).

(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ python --version
Python 3.8.13
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ python bisect/47871.py 
1.4.3
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a 2200 220a                           A.".".
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ 
(pandas-1.4.3-test) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.4.3
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ python bisect/47871.py 
1.4.3
need to escape, but no escapechar set
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a                                     A.
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ python --version
Python 3.10.5
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ 
(pandas-1.4.3) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.4.3-py3.9
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ python bisect/47871.py 
1.4.3
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a 2200 220a                           A.".".
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ python --version
Python 3.9.13
(pandas-1.4.3-py3.9) simon@stadia:~/pandas (bisect)$ 


(pandas-dev) simon@stadia:~/pandas (bisect)$ conda activate pandas-1.3.5
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ python bisect/47871.py 
1.3.5
need to escape, but no escapechar set
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a                                     A.
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ python --version
Python 3.10.1
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ 
(pandas-1.3.5) simon@stadia:~/pandas (bisect)$ conda activate activate pandas-1.3.5-py3.9
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ python bisect/47871.py 
1.3.5
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ xxd null_byte.csv
00000000: 410a 2200 220a                           A.".".
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ python --version
Python 3.9.13
(pandas-1.3.5-py3.9) simon@stadia:~/pandas (bisect)$ 

0 reactions
anubhav562 commented, Nov 30, 2022

I had the same issue. I tried a lot of ways to fix it:

  1. Adding the escapechar argument
  2. Using different quoting options
  3. Replacing the delimiter

None of the above worked!

What worked was changing Python version from 3.10 to 3.9.
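
If changing the Python version is not an option, another possible workaround is to remove null bytes from string values before writing. This is only a sketch and it changes the data, so it is suitable only when the null bytes carry no meaning. The helper name strip_null_bytes is hypothetical, and the example uses the standard-library csv module (assumed here to be what to_csv relies on) so that it runs without pandas.

```python
import csv
import io


def strip_null_bytes(value):
    """Return value with '\\x00' characters removed if it is a string; otherwise unchanged."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


# Rows standing in for a DataFrame with a header and a few cells.
rows = [["A"], ["\x00"], ["a\x00b"], [3]]
cleaned = [[strip_null_bytes(cell) for cell in row] for row in rows]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(cleaned)
print(repr(buffer.getvalue()))
```

With the pandas 1.4.3 used in this report, the same helper could be applied to every cell with df.applymap(strip_null_bytes) before calling to_csv.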

Top Results From Across the Web

Python CSV error: line contains NULL byte - Stack Overflow
Using the Python csv module, I was trying to read an XLS file created in MS Excel and running into the NULL byte...

Issue 27580: CSV Null Byte Error - Python tracker
I think this has been asked before, but it has been awhile and I think needs to be re-addressed. Why doesn't the CSV...

Solved: Issue with line containing NULL byte - Esri Community
I fixed the Null byte problem by using DictReader and DictWriter, but now I am running into a NoneType error when the new...

about 'read_csv()' with \x00 contains in the file - Google Groups
I've got a csv file with '\x00' and try to use read_csv() to rad this file and it failed, got following ... Error:...

File full of NULL Bytes (\x00). Easiest way to handle this?
I have a csv file which I've downloaded from a vendor website for data ingestion. I tried using python's csv module to open...
