Issue with compression in to_csv method


Problem description

Hi there,

After upgrading to the latest version of pandas, I have an issue with code that worked fine on the previous version (0.22.0):

            df.to_csv(
                path_or_buf=csv_path,
                encoding='utf8',
                compression='gz',
                quoting=1,
                sep='\t',
                index=False)

With pandas 0.23.0 I get:

    Traceback (most recent call last):
      File "C:_script.py", line 74, in <module>
        index=False)
      File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1745, in to_csv
        formatter.save()
      File "C:\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py", line 158, in save
        data = f.read()
      File "C:\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 298: character maps to <undefined>

If I comment out compression='gz', the code works fine.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 2012ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
WillAyd commented, Jun 5, 2018

@nvm1 this should be fixed on master via the referenced PR
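
Until a pandas release containing that fix is installed, one possible workaround is to skip the compression argument and open the gzip stream yourself with an explicit encoding, so that no platform-default codec (cp1252 in the traceback above) is involved. The sketch below uses only the standard library's gzip and csv modules rather than pandas; the helper names write_gzip_tsv and read_gzip_text are hypothetical and not part of any library. csv.QUOTE_ALL is the constant behind quoting=1 in the original call.

```python
import csv
import gzip


def write_gzip_tsv(path, header, rows, encoding="utf-8"):
    """Write rows as a gzip-compressed, tab-separated, fully quoted file.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Text mode ("wt") with an explicit encoding: the platform default
    # encoding is never consulted when writing.
    with gzip.open(path, "wt", encoding=encoding, newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", quoting=csv.QUOTE_ALL)
        writer.writerow(header)
        writer.writerows(rows)


def read_gzip_text(path, encoding="utf-8"):
    """Read the decompressed text back using the same explicit encoding."""
    with gzip.open(path, "rt", encoding=encoding, newline="") as handle:
        return handle.read()
```

A text handle opened this way can also be passed as path_or_buf to df.to_csv (leaving out the compression argument), since to_csv accepts file-like objects; that variant has not been tested against 0.23.0 here.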

1 reaction
minggli commented, Jun 2, 2018

Did some digging: the cause might be the default encoding that open() falls back to when none is specified. On Windows, it then tries to decode with CP1252 even though your file is UTF-8 encoded.
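
The mismatch described in this comment can be reproduced without pandas. The following is a minimal sketch, assuming the failure comes from reading UTF-8 bytes back with the cp1252 codec; the roundtrip helper and the sample text are hypothetical. The Cyrillic letter U+0410 is chosen because its UTF-8 encoding ends in byte 0x90, which Python's cp1252 codec leaves undefined, the same byte reported in the traceback.

```python
import os
import tempfile

# U+0410 is encoded as the two bytes D0 90 in UTF-8.
SAMPLE_TEXT = "col\n\u0410\n"


def roundtrip(write_encoding, read_encoding, text=SAMPLE_TEXT):
    """Write text with one encoding and read it back with another.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding, newline="") as f:
            f.write(text)
        with open(path, "r", encoding=read_encoding, newline="") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Matching encodings round-trip cleanly.
    print(repr(roundtrip("utf-8", "utf-8")))
    # Mismatched encodings fail on the undefined byte.
    try:
        roundtrip("utf-8", "cp1252")
    except UnicodeDecodeError as exc:
        print("decode failed:", exc)
```

Calling open() without an encoding argument behaves like the second case whenever the platform's preferred encoding is cp1252, which is why passing the encoding explicitly on both the write and the read avoids the error.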


