Issue with compression in to_csv method
Problem description
Hi there,
after upgrading to the latest version of pandas I have an issue with the following code, which worked fine on the previous version (0.22.0):
```python
df.to_csv(
    path_or_buf=csv_path,
    encoding='utf8',
    compression='gz',
    quoting=1,
    sep='\t',
    index=False)
```
With pandas 0.23.0 I get:
```
Traceback (most recent call last):
  File "C:_script.py", line 74, in <module>
    index=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1745, in to_csv
    formatter.save()
  File "C:\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 158, in save
    data = f.read()
  File "C:\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 298: character maps to <undefined>
```
If I comment out `compression='gz'`, the code works fine.
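A workaround sketch until a fixed version is available: compress the file yourself with an explicit UTF-8 encoding, so pandas never re-reads an intermediate file with the platform default codec. This reuses `df` and `csv_path` from the snippet above and assumes `csv_path` ends in `.gz`; note also that the value the pandas docs list for gzip compression is `'gzip'`, not `'gz'`.

```python
import gzip

# Open the gzip stream ourselves in text mode with an explicit UTF-8
# encoding and hand the handle to to_csv; the encoding/compression
# arguments of to_csv are then not needed.
with gzip.open(csv_path, 'wt', encoding='utf8', newline='') as f:
    df.to_csv(f, quoting=1, sep='\t', index=False)
```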
Output of pd.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
@nvm1 this should be fixed on master via the referenced PR
Did some digging; it might be the different default encoding that `open()` falls back to when none is specified. On Windows it then tries to decode with cp1252 while your file is UTF-8 encoded.
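That matches the traceback: the re-read in csvs.py (`data = f.read()`) calls `open()` without an encoding, so Python falls back to the locale's preferred encoding, and byte 0x90 has no mapping in cp1252. A minimal sketch of the failure, independent of pandas:

```python
import locale

# open() without an explicit encoding uses the locale's preferred
# encoding (typically cp1252 on Windows, utf-8 on most Linux systems).
print(locale.getpreferredencoding(False))

# Byte 0x90 is undefined in cp1252, which is exactly the error in the
# traceback above.
try:
    b"\x90".decode("cp1252")
except UnicodeDecodeError as exc:
    print(exc)
```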