Crash caused by SIGSEGV in unpack_byte_array

What happened: When reading a Parquet file into a pandas DataFrame, fastparquet crashes with a segmentation fault in unpack_byte_array. I cannot share the Parquet file because it contains PII. It was created by AWS DMS.

Output from parquet-tools (the column is named ‘request’):

request:         OPTIONAL BINARY L:STRING R:0 D:1
--
request:          BINARY UNCOMPRESSED DO:0 FPO:1238 SZ:358286/358286/1.00 VC:10 ENC:RLE,PLAIN ST:[no stats for this column]
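The dump describes an OPTIONAL, non-nested column (R:0 D:1) with RLE and PLAIN encodings. In a Parquet v1 data page, such a column stores its definition levels first (RLE/bit-packing hybrid, prefixed by a 4-byte little-endian length), followed by the PLAIN-encoded values; there is no repetition-level block because the maximum repetition level is 0. The sketch below splits a decompressed v1 page body into those two parts so the values region can be inspected on its own. It assumes a v1 data page (the dump does not say which page version AWS DMS wrote; v2 pages keep the level lengths in the page header instead), and `split_v1_page_body` is a hypothetical helper, not a fastparquet function.

```python
import struct
from typing import Tuple


def split_v1_page_body(body: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: split a decompressed v1 data page body for a column
    with max repetition level 0 and max definition level 1.

    Assumed layout: [4-byte LE length][RLE definition levels][PLAIN values].
    Returns (definition_level_bytes, value_bytes).
    """
    if len(body) < 4:
        raise ValueError("page body too short for a definition-level length prefix")
    # Length of the RLE/bit-packed definition-level block, little-endian uint32.
    (levels_len,) = struct.unpack_from("<I", body, 0)
    end = 4 + levels_len
    if end > len(body):
        raise ValueError(
            f"definition-level block claims {levels_len} bytes, "
            f"but only {len(body) - 4} remain"
        )
    # Everything after the level block is the PLAIN-encoded values region.
    return body[4:end], body[end:]
```

If the values region obtained this way is what ends up in read_plain, its first four bytes should be a plausible length for the first string.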

Ultimately this results in a call to:

read_plain(io_obj.read(), type_=6, count=10, width=None, utf=True, stat=False)

with a byte array of length 283180.
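With type_=6 (BYTE_ARRAY), PLAIN encoding stores each value as a 4-byte little-endian length followed by that many bytes, so all count=10 values, prefixes included, have to fit inside the 283180-byte buffer. As a rough way to check a suspect buffer, here is a pure-Python decoder that validates every length against the remaining bytes and raises instead of reading past the end. It is a sketch, not fastparquet's implementation, and `read_plain_byte_array_checked` is a hypothetical name.

```python
import struct


def read_plain_byte_array_checked(raw: bytes, count: int, utf: bool = True) -> list:
    """Decode `count` PLAIN-encoded BYTE_ARRAY values with bounds checks.

    Each value is a 4-byte little-endian unsigned length followed by the payload.
    Raises ValueError at the first value that does not fit in `raw`.
    """
    view = memoryview(raw)
    pos = 0
    out = []
    for i in range(count):
        if pos + 4 > len(view):
            raise ValueError(f"value {i}: no room for a length prefix at offset {pos}")
        (n,) = struct.unpack_from("<I", view, pos)
        pos += 4
        if pos + n > len(view):
            raise ValueError(
                f"value {i}: length {n} at offset {pos - 4} overruns "
                f"a buffer of {len(view)} bytes"
            )
        chunk = bytes(view[pos:pos + n])
        out.append(chunk.decode("utf-8") if utf else chunk)
        pos += n
    return out
```

Run on the bytes from the failing file, this either returns the ten values or names the first one whose length prefix does not fit, which helps tell a malformed buffer apart from a decoder bug.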

What you expected to happen: No crash when reading these Parquet files.

Minimal Complete Verifiable Example:

from fastparquet.encoding import read_plain
from fastparquet.cencoding import NumpyIO

def test_read_plain():
    # raw_bytes = b'a' * 283180 # will also cause a SIGSEGV
    raw_bytes = b'iJwY'

    io_obj = NumpyIO(raw_bytes)
    
    # type_=6 is BYTE_ARRAY in the Parquet Type enum
    read_plain(io_obj.read(), type_=6, count=1, width=None, utf=True, stat=False)
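A plausible reading of this reproducer, based on the PLAIN layout for BYTE_ARRAY: the first four bytes are taken as a little-endian length prefix. For b'iJwY' that prefix decodes to 1,500,990,057 bytes, and for b'a' * 283180 to 1,633,771,873 bytes, both far beyond the buffer, so a decoder that trusts the prefix without a bounds check would read past the end of the input. The snippet below only does that arithmetic with the standard library and does not call fastparquet; `implied_first_length` is a hypothetical helper.

```python
import struct


def implied_first_length(raw: bytes) -> int:
    # Interpret the first four bytes as a PLAIN BYTE_ARRAY length prefix (little-endian uint32).
    (n,) = struct.unpack_from("<I", raw, 0)
    return n


for raw in (b"iJwY", b"a" * 283180):
    n = implied_first_length(raw)
    # A well-formed buffer needs at least 4 + n bytes to hold its first value.
    print(f"buffer={len(raw)} bytes, first length prefix={n}, fits={4 + n <= len(raw)}")

# buffer=4 bytes, first length prefix=1500990057, fits=False
# buffer=283180 bytes, first length prefix=1633771873, fits=False
```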

Running this under pytest produces the following stack dump:

tests/unit/test_fastparquet.py::test_read_plain Fatal Python error: Segmentation fault

Current thread 0x000000011abb4e00 (most recent call first):
  File "/Users/nw/.virtualenvs/bp-lambdas-2/lib/python3.8/site-packages/fastparquet/encoding.py", line 41 in read_plain
  File "/Users/nw/dev/src/backup-pipeline/transformer-v2/tests/unit/test_fastparquet.py", line 11 in test_read_plain
  <snip>
fish: Job 1, 'pytest --pdb -vv tests/unit/tes…' terminated by signal SIGSEGV (Address boundary error)
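Because a segfault kills the whole interpreter, pytest cannot report it as an ordinary test failure; the run above simply ends with SIGSEGV. One option, sketched here under the assumption that tests run on a POSIX system, is to execute the reproducer in a child interpreter and inspect its return code, which `subprocess` reports as the negative signal number when the child is killed by a signal. `crashes_with_signal` is a hypothetical helper, and the self-check deliberately sends SIGSEGV to a child process instead of importing fastparquet.

```python
import signal
import subprocess
import sys


def crashes_with_signal(code: str, sig: int = signal.SIGSEGV, timeout: float = 60.0) -> bool:
    """Run `code` in a fresh Python interpreter and report whether it died from `sig`.

    On POSIX, subprocess sets returncode to -N when the child is killed by signal N.
    Raises subprocess.TimeoutExpired if the child runs longer than `timeout` seconds.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode == -sig


# Self-check that needs no fastparquet: the child sends SIGSEGV to itself.
SELF_CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

# The reproducer from this issue as a string, for use in a regression test.
REPRO = (
    "from fastparquet.encoding import read_plain\n"
    "from fastparquet.cencoding import NumpyIO\n"
    "read_plain(NumpyIO(b'iJwY').read(), type_=6, count=1, width=None, utf=True, stat=False)\n"
)
```

A regression test could then assert `not crashes_with_signal(REPRO)`, which would be expected to fail on fastparquet 0.7.1 without taking the rest of the pytest session down with it.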

Python version: 3.8.11
Operating system: Mac / AWS Lambda
Library versions: fastparquet==0.7.1, thrift==0.13.0, numpy==1.21.2

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Aug 19, 2021

(also, your versions of OS, python, thrift and numpy, if you wouldn’t mind)

1 reaction
martindurant commented, Aug 19, 2021

Thanks for the extra information, it looks like it may be helpful, but will take a little time for me to digest.

Are you using fastparquet 0.7.1 ?
