Issue with compression in to_csv method
Problem description
Hi there,
after upgrading to the latest version of pandas I have an issue with the following code, which worked fine on the previous version (0.22.0):
```python
df.to_csv(
    path_or_buf=csv_path,
    encoding='utf8',
    compression='gz',
    quoting=1,
    sep='\t',
    index=False)
```
With pandas 0.23.0 I get:
```
Traceback (most recent call last):
  File "C:_script.py", line 74, in <module>
    index=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1745, in to_csv
    formatter.save()
  File "C:\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 158, in save
    data = f.read()
  File "C:\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 298: character maps to <undefined>
```
If I comment out `compression='gz'`, the code works fine.
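A workaround sketch until a fixed version is available: compress the file yourself with an explicit UTF-8 encoding, so pandas never re-reads an intermediate file with the platform default codec. This reuses `df` and `csv_path` from the snippet above and assumes `csv_path` ends in `.gz`; note also that the value the pandas docs list for gzip compression is `'gzip'`, not `'gz'`.

```python
import gzip

# Open the gzip stream ourselves in text mode with an explicit UTF-8
# encoding and hand the handle to to_csv; the encoding/compression
# arguments of to_csv are then not needed.
with gzip.open(csv_path, 'wt', encoding='utf8', newline='') as f:
    df.to_csv(f, quoting=1, sep='\t', index=False)
```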
Output of pd.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
@nvm1 this should be fixed on master via the referenced PR
Did some digging; it might be the different default encoding that `open()` falls back to when none is specified. On Windows it then tries to decode with cp1252 while your file is UTF-8 encoded.
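That matches the traceback: the re-read in csvs.py (`data = f.read()`) calls `open()` without an encoding, so Python falls back to the locale's preferred encoding, and byte 0x90 has no mapping in cp1252. A minimal sketch of the failure, independent of pandas:

```python
import locale

# open() without an explicit encoding uses the locale's preferred
# encoding (typically cp1252 on Windows, utf-8 on most Linux systems).
print(locale.getpreferredencoding(False))

# Byte 0x90 is undefined in cp1252, which is exactly the error in the
# traceback above.
try:
    b"\x90".decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)
```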