Segmentation fault in C++ library whilst deserializing messages

I am using the Go client, which is a wrapper around the C++ client. I hit a bug where all my consumers on a namespace suddenly died and could not be restarted. They appear to be segfaulting while trying to deserialize a batch of messages from the broker.

The gdb backtrace looks like this:

Thread 13 "redacted" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc2f64700 (LWP 6053)]
pulsar::SharedBuffer::readUnsignedInt (this=0x7fff8c006c08) at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/lib/SharedBuffer.h:95
95	        uint32_t value = ntohl(*(uint32_t*)data());
(gdb) bt
#0  pulsar::SharedBuffer::readUnsignedInt (this=0x7fff8c006c08) at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/lib/SharedBuffer.h:95
#1  pulsar::Commands::deSerializeSingleMessageInBatch (batchedMessage=..., batchIndex=batchIndex@entry=1)
    at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/lib/Commands.cc:651
#2  0x00007ffff7e3b704 in pulsar::ConsumerImpl::receiveIndividualMessagesFromBatch (this=0x7fff8c033bc0, 
    cnx=std::shared_ptr<pulsar::ClientConnection> (use count 4, weak count 5) = {...}, batchedMessage=..., redeliveryCount=0)
    at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/lib/ConsumerImpl.cc:372
#3  0x00007ffff7e3bde1 in pulsar::ConsumerImpl::messageReceived (this=this@entry=0x7fff8c033bc0, 
    cnx=std::shared_ptr<pulsar::ClientConnection> (use count 4, weak count 5) = {...}, msg=..., isChecksumValid=@0x7fffc2f62cac: true, metadata=..., payload=...)
    at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/generated/lib/PulsarApi.pb.h:17227
#4  0x00007ffff7dcf936 in pulsar::ClientConnection::handleIncomingMessage (this=0x7fff8c0c11d0, msg=..., isChecksumValid=<optimised out>, msgMetadata=..., payload=...)
    at /usr/include/c++/9/bits/shared_ptr_base.h:1192
#5  0x00007ffff7ddd258 in pulsar::ClientConnection::processIncomingBuffer (this=0x7fff8c0c11d0)
    at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/generated/lib/PulsarApi.pb.h:22533
#6  0x00007ffff7dddccc in pulsar::ClientConnection::handleRead (this=0x7fff8c0c11d0, err=..., bytesTransferred=<optimised out>, minReadSize=4)
    at /home/ben/src/github.com/bschofield/pulsar/pulsar-client-cpp/lib/ClientConnection.cc:489

[...other stack frames deleted...]
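
For readers following the trace: frame #0 (SharedBuffer.h:95) reads four bytes through a cast pointer, so the read can fault if the buffer holds fewer than four readable bytes or if the read position has been pushed out of range by a corrupt payload. Whether that is what happened here is not established in the thread. The sketch below only illustrates the general idea of a bounds-checked big-endian read. The function name and signature are hypothetical and are not part of the Pulsar C++ client.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Pulsar's SharedBuffer API): reads a big-endian
// 32-bit value starting at `offset` in a buffer of `size` bytes.
// Returns false, leaving `out` untouched, if fewer than four bytes remain.
bool readUnsignedIntChecked(const std::uint8_t* data, std::size_t size,
                            std::size_t offset, std::uint32_t& out) {
    // Written as two comparisons so that `offset + 4` can never overflow.
    if (offset > size || size - offset < 4) {
        return false;
    }
    // Assemble the value byte by byte: no unaligned access and no
    // dependence on the host's byte order.
    out = (static_cast<std::uint32_t>(data[offset]) << 24) |
          (static_cast<std::uint32_t>(data[offset + 1]) << 16) |
          (static_cast<std::uint32_t>(data[offset + 2]) << 8) |
          static_cast<std::uint32_t>(data[offset + 3]);
    return true;
}
```

A caller would treat a false return as a malformed batch and reject it, rather than continue deserializing from an invalid position.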

The messages are produced by v2.5.0 of the Go client (i.e. wrapped C++), on Alpine Linux (musl). The messages are batched into groups of at most 1000, with LZ4 compression.

I see this bug in the consumer using both v2.5.0 and the latest master. The backtrace is taken from a machine running Ubuntu.

Any ideas what might be causing this?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

bschofield commented, Apr 24, 2020 (1 reaction)

You’re right, I should only change one thing at a time, but this is my “production” cluster and I need it to be up, so I was a little desperate! 😄

The version of LZ4 used in the current C++ client is 1.7.1, which dates from 2015/2016 (check pulsar-client-cpp/lib/lz4, line 50). A couple of data corruption issues have been discovered and fixed in LZ4 since then.

@sijie maybe you want to have someone upgrade the lz4.c/lz4.h files in the C++ client to the latest version?

tisonkun commented, Nov 10, 2022 (0 reactions)

Closed as stale. It seems the original issue has been resolved or answered.

The development of the C++ client has been permanently moved to https://github.com/apache/pulsar-client-cpp. If it’s still relevant, please open an issue there.
