Breaking: Consumer always faults under high throughput with ReadOnlySequence<byte> exceptions
Let me start by saying that, IMHO, the project looks great, uses many of the best and latest features of .NET Core, and the code looks nice. So congrats @blankensteiner and all other contributors on a most excellent start!
Under high throughput the Consumer faults randomly (sometimes sooner, sometimes later) but 100% of the time, while trying to read a ReadOnlySequence<T>:
(macOS + SDK 3.1.101)
System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'start')
at DotPulsar.Internal.Extensions.ReadOnlySequenceExtensions.StartsWith[T](ReadOnlySequence`1 sequence, ReadOnlyMemory`1 target)
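DotPulsar's actual StartsWith extension is internal, but for context, here is a minimal sketch of what such a helper typically looks like and where the "start" parameter check can blow up. The names and structure below are my assumptions, not the library's code:

```csharp
using System;
using System.Buffers;

public static class SequenceChecks
{
    // Hypothetical sketch, not DotPulsar's internal implementation:
    // checks whether 'sequence' begins with the bytes in 'target'.
    public static bool StartsWith<T>(in ReadOnlySequence<T> sequence, ReadOnlyMemory<T> target)
        where T : IEquatable<T>
    {
        if (sequence.Length < target.Length)
            return false;

        // If the sequence's segments are recycled or mutated concurrently
        // (e.g. the pipe advances while this snapshot is still in use), the
        // internal position math no longer adds up and Slice throws
        // ArgumentOutOfRangeException with ParamName "start".
        var head = sequence.Slice(0, target.Length);

        var remaining = target.Span;
        foreach (var segment in head)
        {
            var span = segment.Span;
            if (!span.SequenceEqual(remaining.Slice(0, span.Length)))
                return false;
            remaining = remaining.Slice(span.Length);
        }
        return remaining.IsEmpty;
    }
}
```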
While stress testing I have seen it happen in several areas of the code, but only when consuming.
It is reproducible by simply using the Samples solution: run the Producer to produce a large number of messages (say 50K) and then start the Consumer.
I attempted several fixes, all without success (a sketch of both attempts follows this list):
- Using `SequenceReader<byte>`
- Using `sequence.TryGet(ref position, out var memory)` in a while loop
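For reference, here is roughly what both attempts looked like. This is a reconstruction from the descriptions above, not the exact code I ran:

```csharp
using System;
using System.Buffers;

public static class Workarounds
{
    // Attempt 1: let SequenceReader<byte> do the bounds/position bookkeeping.
    public static bool StartsWithViaReader(in ReadOnlySequence<byte> sequence, ReadOnlySpan<byte> target)
    {
        var reader = new SequenceReader<byte>(sequence);
        return reader.IsNext(target, advancePast: false);
    }

    // Attempt 2: walk the segments manually with TryGet in a while loop
    // (the enumeration pattern documented for ReadOnlySequence<T>).
    public static long CountBytes(in ReadOnlySequence<byte> sequence)
    {
        long total = 0;
        var position = sequence.Start;
        while (sequence.TryGet(ref position, out ReadOnlyMemory<byte> memory))
        {
            total += memory.Length; // process the segment here
        }
        return total;
    }
}
```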
Researching .NET issues, I found several potentially related ones, some already fixed but not available until the next releases of the framework (a minimal illustration of this failure class follows the list):
- Possible race condition in System.IO.Pipelines: InvalidCastException - exactly the same error
- ReadOnlySequence<T> seems to hand out incorrect position - this one also reports failure while stress testing
- PipeReader.CopyToAsync(destination) calls AdvanceTo(default) when destination.WriteAsync throws
- SequenceReader nextPosition fix
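Independently of those framework bugs, there is a classic usage hazard that produces exactly this failure mode: holding on to a ReadOnlySequence (or a SequencePosition into it) after calling PipeReader.AdvanceTo, at which point the pipe is free to recycle the underlying segments. A minimal illustration, using a generic consumer loop rather than DotPulsar's actual PulsarStream code:

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class PipeHazard
{
    public static async Task ConsumeAsync(PipeReader reader)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            // All reads/slices of 'buffer' must happen BEFORE AdvanceTo.
            reader.AdvanceTo(buffer.End);

            // WRONG: 'buffer' now points at segments the pipe may have
            // recycled; under load, Slice/StartsWith on it can throw
            // ArgumentOutOfRangeException ("start") or worse.
            // var stale = buffer.Slice(0, 4);

            if (result.IsCompleted)
                break;
        }
    }
}
```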
Potential solutions:
- Retrofit these fixes until it works and release a new version.
- Don’t use Pipes and go old school.
- Use Pipes but without exposing the `ReadOnlySequence`, instead exposing streams, spans, memory, and arrays, whatever is necessary (see the sketch after this list).
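For the third option, a common pattern is to make a defensive copy of the bytes before AdvanceTo and hand consumers memory they own. The sketch below assumes a simple length-prefixed read loop; `ReadFrameAsync` and `frameLength` are illustrative names, not DotPulsar APIs:

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class OwnedFrames
{
    // Hypothetical helper: reads 'frameLength' bytes and returns them as a
    // byte[] the caller owns, decoupled from the pipe's buffer lifetime.
    public static async Task<byte[]> ReadFrameAsync(PipeReader reader, int frameLength)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            if (buffer.Length >= frameLength)
            {
                // Defensive copy while the segments are still valid.
                byte[] frame = buffer.Slice(0, frameLength).ToArray();
                reader.AdvanceTo(buffer.GetPosition(frameLength));
                return frame;
            }

            // Not enough data yet: consume nothing, mark everything examined.
            reader.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted)
                throw new InvalidOperationException("Pipe completed before a full frame arrived.");
        }
    }
}
```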
I would happily contribute a fix if I can indeed find one, because this is a showstopper for me, my team, and our new solution design.
Currently I’m digging deeper and going up the call stack (PulsarStream, Connector, ConsumeChannel, etc.) trying to figure out if we are corrupting memory or have a memory leak.
Top GitHub Comments
I’ll try to get a PR in later today for you to check it out.
I think we’re seeing exactly the same on our end. I’m trying to get a reliable(ish) reproduction ready, with access to a Pulsar instance, and am in contact with @blankensteiner about this as well. Your findings have been super useful. Do you reckon these fixes in the patch release could be it, @RagingKore? It seems a 3.1.2 release got out that I am not using yet (https://dotnet.microsoft.com/download/dotnet-core).