Eventhubs extension crashes with segmentation fault, either SIGSEGV or SIGABRT, from send_batch()
See original GitHub issue.

- Package Name: azure-eventhub
- Package Version: 5.2.0. Originally detected at 5.1.0, but also present when upgrading to 5.2.0 (latest at time of writing).
- Operating System: Linux, Debian kernel 4.9.168-1-amd64
- Python Version: 3.5

Possibly related to Issue #9435: https://github.com/Azure/azure-sdk-for-python/issues/9435
Describe the bug
The Python program crashes with a segmentation fault (and nothing else) when uploading data to an event hub. After a fair amount of debugging using gdb and eliminating all other factors (it is not in our code; we did a complete dry run of everything minus the actual event hub sending), we find the following error(s) in the eventhubs extension:
Output of gdb: *** Error in `/usr/bin/python3': double free or corruption (fasttop): 0x0000555556ca03a0 ***

Signals:

*** Program received signal SIGABRT, Aborted.
*** Program received signal SIGSEGV, Segmentation fault.

Stack traces:

Stack trace from gdb (py-bt command, for the SIGABRT error):
(gdb) py-bt
Traceback (most recent call first):
<built-in method send of uamqp.c_uamqp.cMessageSender object at remote 0x7fffef69dc08>
File "/usr/local/lib/python3.5/dist-packages/uamqp/sender.py", line 246, in send
return self._sender.send(c_message, timeout, message)
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 605, in _transfer_message
sent = self.message_handler.send(message, self._on_message_sent, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 626, in _filter_pending
self._transfer_message(message, timeout)
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 645, in _client_run
self._pending_messages = self._filter_pending()
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 397, in do_work
return self._client_run()
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 756, in wait
running = self.do_work()
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer.py", line 161, in _send_event_data
self._handler.wait() # type: ignore
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_client_base.py", line 454, in _do_retryable_operation
**kwargs
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer.py", line 171, in _send_event_data_with_retry
return self._do_retryable_operation(self._send_event_data, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer.py", line 262, in send
self._send_event_data_with_retry(timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer_client.py", line 245, in send_batch
to_send_batch, timeout=send_timeout
File "/home/user/program/program.py", line 173, in send_batch_of_data
producer.send_batch(event_data_batch)
File "/home/user/program/program.py", line 300, in main
print("Sending all new data...")
File "program_script.py", line 4, in <module>
program.main()
Stack trace from gdb (py-bt command, for the SIGSEGV error):
(gdb) py-bt
Traceback (most recent call first):
<built-in method send of uamqp.c_uamqp.cMessageSender object at remote 0x7fffef6c7c48>
File "/usr/local/lib/python3.5/dist-packages/uamqp/sender.py", line 246, in send
return self._sender.send(c_message, timeout, message)
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 601, in _transfer_message
sent = self.message_handler.send(message, self._on_message_sent, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 622, in _filter_pending
self._transfer_message(message, timeout)
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 641, in _client_run
self._pending_messages = self._filter_pending()
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 397, in do_work
return self._client_run()
File "/usr/local/lib/python3.5/dist-packages/uamqp/client.py", line 752, in wait
running = self.do_work()
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer.py", line 161, in _send_event_data
self._handler.wait() # type: ignore
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_client_base.py", line 454, in _do_retryable_operation
**kwargs
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer.py", line 171, in _send_event_data_with_retry
return self._do_retryable_operation(self._send_event_data, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer.py", line 262, in send
self._send_event_data_with_retry(timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/azure/eventhub/_producer_client.py", line 245, in send_batch
to_send_batch, timeout=send_timeout
File "/home/user/program/program.py", line 156, in send_batch_of_data
producer.send_batch(event_data_batch)
File "/home/user/program/program.py", line 263, in main
latest_id = send_batch_of_data(producer,
File "program_script.py", line 4, in <module>
program.main()
To Reproduce

I cannot send you our entire codebase for fetching the data, or the actual data itself, but in theory the following should be sufficient. Steps to reproduce the behavior:
- Get a fair amount of data, around 10-15 million records/rows. In our case, it comes from a database via SQLAlchemy; we use the query.yield_per(1000) method to avoid loading that many rows into memory all at once.
- Open an eventhub connection:
producer = EventHubProducerClient.from_connection_string(
conn_str="<connection string here, e.g.: Endpoint=sb://........>",
eventhub_name="<name here>")
- Convert & upload data in batches, in JSON form, trimmed down to essentials:
event_data_batch = producer.create_batch()
for row in data:
    json_object = {
        "id": row.id,
        # And more data here of course; in our case about 10 more basic values, nothing fancy
    }
    json_string = json.dumps(json_object, indent=4, sort_keys=True)
    event_data = EventData(json_string)
    try:
        event_data_batch.add(event_data)
    except ValueError:
        # Reached max batch size (EventDataBatch.add raises ValueError when full):
        # send the batch and create a new one.
        # The segfault/SIGABRT occurs on the next line, but not consistently... :(
        producer.send_batch(event_data_batch)
        event_data_batch = producer.create_batch()
        event_data_batch.add(event_data)
# And a final send_batch() call here to upload the last batch of data,
# with code essentially the same as above.
- Get a “segmentation fault” when running the program. It does not always happen at the exact same “time”, but it does happen at the exact same line of code, as mentioned in the code comment in the previous step; so it is independent of the actual data being sent. Furthermore, even though the error and/or stack trace suggest a network issue, the program runs on a dedicated VPS with a gigabit fibre internet connection, so it is not, e.g., a flaky 4G connection; the network should be sufficiently stable.
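For completeness, here is the consolidated sketch referenced in the steps above, assembled into one runnable script. It is a reconstruction under stated assumptions, not the reporter's actual program: fetch_rows is a hypothetical placeholder for the SQLAlchemy query iterated with query.yield_per(1000), and the connection details are placeholders.

import json

from azure.eventhub import EventData, EventHubProducerClient


def fetch_rows():
    """Placeholder for the real data source: a SQLAlchemy query iterated
    with query.yield_per(1000). Any iterable of objects with an `id`
    attribute (plus the other fields) works for this sketch."""
    return []


producer = EventHubProducerClient.from_connection_string(
    conn_str="<connection string here, e.g.: Endpoint=sb://........>",
    eventhub_name="<name here>",
)

with producer:
    event_data_batch = producer.create_batch()
    for row in fetch_rows():
        json_string = json.dumps(
            {"id": row.id},  # plus ~10 more basic values in the real program
            indent=4,
            sort_keys=True,
        )
        event_data = EventData(json_string)
        try:
            event_data_batch.add(event_data)
        except ValueError:
            # Batch is full: send it and start a new one. The reported
            # crash occurs on this send_batch() call.
            producer.send_batch(event_data_batch)
            event_data_batch = producer.create_batch()
            event_data_batch.add(event_data)
    # Send the final, possibly partial, batch.
    if len(event_data_batch) > 0:
        producer.send_batch(event_data_batch)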
Expected behavior

No segmentation fault and no hard crash; just upload the data. A Python exception would also be fine if something went wrong, but not a hard crash like this. Even PDB isn't able to handle it gracefully, and crashes as well.
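As an aside (not part of the original report), Python's standard-library faulthandler module can at least print the Python-level traceback when a fatal signal such as SIGSEGV or SIGABRT arrives, without needing gdb; a minimal sketch:

import faulthandler

# Dump the Python traceback of every thread to stderr when the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable()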
Screenshots

I can add screenshots, but I think the stack traces and the provided info should be sufficient. If not, let me know; I can run gdb and/or provide more info if needed. But I cannot share our entire codebase or the actual data being sent with you. The code above is exactly what happens, minus some details irrelevant to the bug.
Additional context

It does not happen consistently at the exact same time (e.g. after a fixed amount of data has been sent), but it does eventually happen at the exact same line, with either of the two signals being fired: SIGSEGV or SIGABRT. Given the stack trace it could be a network issue, but as said, it runs on a dedicated VPS. The connection string should be correct (some data is sent and received before the crash), so I would have expected such an error on the first call of send_batch(), or on the event hub connect, rather than a random number of calls later. Also, the Python program does not use multiple processes or multiple threads: it is completely single-threaded. Frankly, I'm at a loss, and I hope this is fixable...
Issue Analytics

- Created: 3 years ago
- Comments: 14 (5 by maintainers)
hey @MR-KO, thanks for your patience! We have fixed the issue in azure-eventhub 5.4.0. Please update to the latest version via

pip install azure-eventhub --upgrade

(If you're interested, the root cause lies in uamqp, and the analysis can be found here: https://github.com/Azure/azure-uamqp-python/pull/217#issue-595648009)

I'm closing this now; feel free to reopen if you're still encountering the issue, thanks!
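To verify the upgrade took effect, the installed version can be printed; a tiny check (the azure-eventhub package exposes a __version__ attribute):

import azure.eventhub

# Should print 5.4.0 or later after the upgrade.
print(azure.eventhub.__version__)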
Hi @yunhaoling, no problem, I know how the job goes as a fellow software engineer 😃. I am glad of the ongoing effort. Great to hear you can reproduce it. I took your code, added my config/login details, and also got a stack trace etc. as you'd expect. There are a few things of interest to note:
Stack trace:
So it seems very likely that this is indeed the culprit!