Reader group initialization should respect truncation point
Problem description
When a reader group is created (or reset) with a configuration not containing a start position, the start position is implicitly set to the head of the stream. Issue: the underlying calculation is not respecting segment truncation and simply assumes an offset of zero for the available segments. In reality the offset may be non-zero.
The issue causes a truncation exception, such as seen below (following a reset of the RG with name hulksmallScaleforgetful0). Observe that the initial offset of hulk/smallScale/53 is 0, yet the subsequent response of SegmentIsTruncated makes clear that startOffset is 1189328.
2018-05-30 00:36:13,589 10806536 [pool-8-thread-4] INFO c.e.n.h.t.w.readers.PravegaReader - Added Reader HulkSmallScale7d669bdb9f2f4e7fbd9c35969f9ec4f9 To ReaderGroup hulksmallScaleforgetful0
2018-05-30 00:36:13,602 10806549 [pool-8-thread-4] INFO i.p.c.s.impl.EventStreamReaderImpl - EventStreamReaderImpl( id=HulkSmallScale7d669bdb9f2f4e7fbd9c35969f9ec4f9) acquiring segments {hulk/smallScale/76=0, hulk/smallScale/79=0, hulk/smallScale/74=0, hulk/smallScale/70=0, hulk/smallScale/96=0, hulk/smallScale/97=0, hulk/smallScale/98=0, hulk/smallScale/67=0, hulk/smallScale/92=0, hulk/smallScale/93=0, hulk/smallScale/94=0, hulk/smallScale/95=0, hulk/smallScale/88=0, hulk/smallScale/89=0, hulk/smallScale/90=0, hulk/smallScale/58=0, hulk/smallScale/91=0, hulk/smallScale/84=0, hulk/smallScale/53=0, hulk/smallScale/86=0, hulk/smallScale/87=0, hulk/smallScale/81=0, hulk/smallScale/82=0}
2018-05-30 00:36:14,028 10806975 [epollEventLoopGroup-5-4] INFO i.p.c.s.i.AsyncSegmentInputStreamImpl - Received segmentIsTruncated WireCommands.SegmentIsTruncated(type=SEGMENT_IS_TRUNCATED, requestId=0, segment=hulk/smallScale/53, startOffset=1189328)
2018-05-30 00:36:14,028 10806975 [epollEventLoopGroup-5-4] WARN i.p.c.s.i.AsyncSegmentInputStreamImpl - Exception while reading from Segment : hulk/smallScale/53
io.pravega.client.segment.impl.SegmentTruncatedException: null
Problem location
ReaderGroupImpl::getSegmentsForStreams -> ControllerService::getSegmentsAtTime
Suggestions for an improvement
- Adjust ControllerService::getSegmentsAtTime to return the startOffset of the respective segment.
- Undo the duplicative logic that was added to BatchClient in #2551.
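To make the first suggestion concrete, here is a minimal sketch of the difference between assuming offset zero and using each segment's start offset when computing the head of the stream. It is not the Pravega implementation: the class name, both method names, and the startOffsetOf lookup are hypothetical stand-ins for whatever the controller would return, and segment 58 is assumed to be untruncated purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch; none of these names come from the Pravega code base.
final class HeadOfStreamSketch {

    // Behaviour described in the issue: every segment is assumed to start at offset 0.
    static Map<String, Long> headAssumingZero(List<String> segments) {
        Map<String, Long> positions = new LinkedHashMap<>();
        for (String segment : segments) {
            positions.put(segment, 0L);
        }
        return positions;
    }

    // Suggested behaviour: ask for each segment's start offset, which is
    // non-zero once the segment has been truncated.
    static Map<String, Long> headRespectingTruncation(List<String> segments,
                                                      ToLongFunction<String> startOffsetOf) {
        Map<String, Long> positions = new LinkedHashMap<>();
        for (String segment : segments) {
            positions.put(segment, startOffsetOf.applyAsLong(segment));
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("hulk/smallScale/53", "hulk/smallScale/58");
        // Start offset of segment 53 taken from the log above; 58 is assumed untruncated.
        Map<String, Long> truncatedAt = Map.of("hulk/smallScale/53", 1189328L);

        // {hulk/smallScale/53=0, hulk/smallScale/58=0}
        System.out.println(headAssumingZero(segments));
        // {hulk/smallScale/53=1189328, hulk/smallScale/58=0}
        System.out.println(headRespectingTruncation(
                segments, s -> truncatedAt.getOrDefault(s, 0L)));
    }
}
```

With the second variant a reader would start segment 53 at 1189328 and never request the truncated range.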
Issue Analytics
- Created 5 years ago
- Comments:16 (16 by maintainers)
Top GitHub Comments
There is no need to provide offset information to the reader. Invoking readNextEvent() again after receiving the TruncatedDataException would result in a read of the next available data.
The below flow describes the steps that need to be performed when resetReaderGroup is invoked.
- All existing readers will throw a ReinitializationException on performing existingReader.readNextEvent() post a resetReaderGroup operation. On this exception, close the existing reader with existingReader.close() and create a new reader.
- Create a new Reader.
- reader.readNextEvent(timeout) throws TruncatedDataException if the reader observes that the Segment it is reading from is truncated.
- Invoking reader.readNextEvent(timeout) again would read from the next available data, i.e. if no action needs to be taken on receiving TruncatedDataException, then this exception can be ignored and reader.readNextEvent() can be invoked again to fetch the next available data.
Throwing the exception should be consistent, otherwise the semantics is broken.
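The reset flow from the first comment can be sketched as a read loop. This is only an illustration under stated assumptions: Reader, ScriptedReader, and both exception classes below are self-contained stand-ins named after the comment's wording, not the Pravega client types, and a null return is assumed to mean that no event arrived within the timeout.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; stand-in types only, not the Pravega client API.
final class ResetFlowSketch {

    static class ReinitializationException extends Exception {}
    static class TruncatedDataException extends RuntimeException {}

    interface Reader {
        String readNextEvent(long timeoutMillis) throws ReinitializationException;
        void close();
    }

    // Test double: replays a script whose steps are event strings or exceptions.
    static final class ScriptedReader implements Reader {
        private final Deque<Object> script;
        boolean closed;

        ScriptedReader(Object... steps) {
            script = new ArrayDeque<>(List.of(steps));
        }

        public String readNextEvent(long timeoutMillis) throws ReinitializationException {
            Object step = script.poll(); // null once the script is exhausted
            if (step instanceof ReinitializationException) {
                throw (ReinitializationException) step;
            }
            if (step instanceof TruncatedDataException) {
                throw (TruncatedDataException) step;
            }
            return (String) step;
        }

        public void close() {
            closed = true;
        }
    }

    // Reads until no event is returned, following the steps from the comment.
    static List<String> drain(Reader existingReader, Supplier<Reader> newReader) {
        List<String> events = new ArrayList<>();
        Reader reader = existingReader;
        while (true) {
            try {
                String event = reader.readNextEvent(1000);
                if (event == null) {
                    break; // assumed meaning: nothing available within the timeout
                }
                events.add(event);
            } catch (ReinitializationException e) {
                // The reader group was reset: close this reader and create a new one.
                reader.close();
                reader = newReader.get();
            } catch (TruncatedDataException e) {
                // Nothing to do: the next readNextEvent call continues
                // from the next available data.
            }
        }
        reader.close();
        return events;
    }

    public static void main(String[] args) {
        ScriptedReader existing = new ScriptedReader(new ReinitializationException());
        ScriptedReader fresh = new ScriptedReader(new TruncatedDataException(), "e1", "e2");
        System.out.println(drain(existing, () -> fresh)); // prints [e1, e2]
    }
}
```

Whether silently skipping past the truncated range is acceptable depends on the application, which is the point the following comments debate.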
That’s a possibility, but not the only one. In fact, that’s the reason why systems like Kafka have configuration parameters like auto.offset.reset.
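For comparison, Kafka's auto.offset.reset lets the consumer choose between earliest, latest, and none when the requested offset is not available. A rough sketch of what such a policy could look like for a truncated segment follows. Everything here is hypothetical: the ResetPolicy enum, the resolve method, and the end offset of 2000000 are illustrative only, and nothing in this thread indicates that Pravega exposes such a setting.

```java
// Hypothetical sketch of a Kafka-style reset policy; not a Pravega API.
final class OffsetResetSketch {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    // Returns the offset to read from, given the range [startOffset, endOffset]
    // that is still available in the segment.
    static long resolve(long requested, long startOffset, long endOffset, ResetPolicy policy) {
        if (requested >= startOffset && requested <= endOffset) {
            return requested; // still available, no reset needed
        }
        switch (policy) {
            case EARLIEST:
                return startOffset; // continue from the oldest retained data
            case LATEST:
                return endOffset;   // skip ahead to the tail
            default:
                // NONE: surface the problem to the application.
                throw new IllegalStateException("Offset " + requested
                        + " is outside [" + startOffset + ", " + endOffset + "]");
        }
    }

    public static void main(String[] args) {
        // Requested offset 0 and start offset 1189328 come from the log above;
        // the end offset is an assumed value.
        System.out.println(resolve(0L, 1189328L, 2000000L, ResetPolicy.EARLIEST)); // prints 1189328
    }
}
```

Under NONE the caller always sees the failure, which corresponds to the position that the exception should be thrown consistently.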