BUG: to_csv requires escapechar unnecessarily when data contains null byte \x00 (Python 3.10+ only)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
########### PowerShell ############
import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False)
# Error: need to escape, but no escapechar set
import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False, escapechar='\\')
# Works, BUT escapechar appears to be redundant, because there is no escapechar in the file on disk!
$ xxd null_byte.csv # from WSL, because no xxd in PowerShell
00000000: 410d 0a00 0d0a A.....
########### Ubuntu/Bash ############
import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False)
# WORKS, but output has different format:
$ xxd null_byte.csv
00000000: 410a 2200 220a A.".".
Issue Description
NOTE: I’m running this in PowerShell, but using xxd from Windows Subsystem for Linux.
If a dataframe contains a null byte \x00 as a value, to_csv requires escapechar to be set when run from PowerShell, but not when run from Ubuntu/Bash. However, the escapechar is not actually used, and does not appear in the output file.
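Since xxd is not available in PowerShell, the same byte-level comparison can also be done from Python. The following is a minimal sketch, assuming Python 3.8+ (needed for the separator arguments of bytes.hex); hexdump is a hypothetical helper name, not part of pandas. It reproduces the two dumps reported above from their raw bytes:

```python
def hexdump(data: bytes) -> str:
    """Return hex digits grouped two bytes at a time, like xxd's middle column."""
    # A negative bytes_per_sep makes bytes.hex group from the left, as xxd does.
    return data.hex(" ", -2)


# The two file contents reported in this issue, written as byte strings.
windows_output = b"A\r\n\x00\r\n"  # PowerShell run, with escapechar='\\'
linux_output = b'A\n"\x00"\n'      # Ubuntu/Bash run, without escapechar

print(hexdump(windows_output))  # 410d 0a00 0d0a
print(hexdump(linux_output))    # 410a 2200 220a
```

For a real file, the call would be hexdump(open('null_byte.csv', 'rb').read()). Apart from the line endings (0d0a versus 0a), the dumps differ in that the Ubuntu output wraps the null byte in quotes (22) while the PowerShell output does not.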
Expected Behavior
I expect that if escapechar is not used, I shouldn’t need to use escapechar='\\'.
I also expect that the flags for to_csv should be the same for both PowerShell and Ubuntu/Bash.
i.e. I expect that the following should just work in both PowerShell and Bash, without needing to specify escapechar:
import pandas as pd
df = pd.DataFrame({'A': ['\x00']})
df.to_csv('null_byte.csv', index=False)
Installed Versions
Pandas in PowerShell:
INSTALLED VERSIONS
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.10.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Australia.1252

pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.0.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.4.0
pandas_datareader : None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : 0.8.1
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : 0.8.10
xarray : None
xlrd : None
xlwt : None
zstandard : None
Pandas in Ubuntu/Bash:
INSTALLED VERSIONS
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.3
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.4.0
pandas_datareader : None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
Issue Analytics
- Created a year ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Looks like it could be the Python version. I’m seeing “need to escape, but no escapechar set” with Python 3.10 but not with Python 3.9 or 3.8, using the same version of pandas (either 1.3.5 or 1.4.3).
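One way to check whether the Python version, rather than pandas, is responsible is to write the same value with the standard-library csv module, which to_csv relies on for writing as far as I know. This is a minimal sketch; write_null_byte is a hypothetical helper name, and the result is expected to differ between Python versions, so no output is shown:

```python
import csv
import io
import sys


def write_null_byte(**writer_kwargs):
    """Write a single row containing a null byte with csv.writer.

    Returns the repr of the text written, or the csv.Error message if the
    writer refuses the row.
    """
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, **writer_kwargs)
    try:
        writer.writerow(["\x00"])
    except csv.Error as exc:
        return f"csv.Error: {exc}"
    return repr(buf.getvalue())


# Compare the default dialect with one that sets escapechar explicitly.
print(sys.version_info[:2], "default:   ", write_null_byte())
print(sys.version_info[:2], "escapechar:", write_null_byte(escapechar="\\"))
```

If the first call reports the same "need to escape" error without pandas being involved, that would point at the csv module in that Python version rather than at to_csv itself.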
I had the same issue. I tried a lot of ways to fix it:
None of the above worked!
What worked was changing Python version from 3.10 to 3.9.