fastparquet v 0.6.1 crashes on read
What happened:
Loading a 32 MB parquet file crashes with a core dump. The process takes up about 3 GB of memory before crashing.
What you expected to happen:
Prior to this version (unsure if 0.5.0 or 0.6.0) loading the same file worked with no problem. Tested on 0.5.0 without issues. Will test on 0.6.0 also.
Minimal Complete Verifiable Example:
Sorry I can’t provide the file as it is private data.
pd.read_parquet(filename)
Anything else we need to know?:
This issue appeared today (2021-05-12), so we assume it is related to release 0.6.1. Switching to pyarrow fixed the problem immediately. Previously, pyarrow was not installed and fastparquet was always used. Our Jupyter environments are ephemeral and torn down every day, so we reinstall new versions daily for work. Thus we assume 0.6.1 introduced the bug.
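For anyone hitting the same crash, the workaround described above can be made explicit instead of relying on which engine happens to be installed. The following is only a rough sketch: the helper names (installed_version, choose_engine, read_parquet_safely) are made up for illustration, and treating 0.6.1 as the only affected release is an assumption based on this report, not something confirmed here. The only library calls used are importlib.metadata.version and the engine argument of pd.read_parquet.

```python
from importlib import metadata

# Assumption: 0.6.1 is the only affected fastparquet release (not confirmed in this issue).
SUSPECT_FASTPARQUET = {"0.6.1"}


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def choose_engine(fastparquet_version, pyarrow_version, suspect=SUSPECT_FASTPARQUET):
    """Pick a Parquet engine name, skipping suspect fastparquet releases."""
    if fastparquet_version is not None and fastparquet_version not in suspect:
        return "fastparquet"
    if pyarrow_version is not None:
        return "pyarrow"
    raise RuntimeError("no usable Parquet engine is installed")


def read_parquet_safely(filename):
    """Read a Parquet file with an explicitly chosen engine (hypothetical helper)."""
    import pandas as pd  # imported lazily so the helpers above work without pandas

    engine = choose_engine(
        installed_version("fastparquet"),
        installed_version("pyarrow"),
    )
    return pd.read_parquet(filename, engine=engine)
```

Calling read_parquet_safely(filename) in place of pd.read_parquet(filename) would fall back to pyarrow whenever the installed fastparquet is in the suspect set. Printing installed_version("fastparquet") at the start of a notebook also records which release a daily rebuilt environment actually picked up.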
Environment:
- Dask version: unknown (fastparquet 0.6.1)
- Python version: 3.8.5
- Operating System: Ubuntu 20.04.1 LTS (Focal Fossa) on AWS
- Install method (conda, pip, source): pip
Top GitHub Comments
Note that you helped me to identify a speedup of ~15% for UTF8 string reading, so I’m almost glad this bug was there.
OK thank you - I should be able to work with that