Support for retrieving incomplete string/bytes from an InputStream when EOF is reached
I have an InputStream which may have some additional characters at the end (an incomplete JSON object string), and I want to retrieve them after reading a series of JSON values.
To elaborate: I have an InputStream for reading a large file, but for throttling reasons the InputStream signals EOF after a predefined number of bytes has been transferred. In other words, the large file can only be read through InputStreams that each deliver 100MB of data. When the 100MB limit is reached, the stream ends, and you need to create a new InputStream for the next “block” / “page”.
Because of this model, a JSON object may be split across two blocks, so a stream can end partway through it. I need to be able to handle this while reading.
I referred to #1304, which showed me how to read multiple values from an input stream.
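Here’s a small example:

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class Scratch {
    public static void main(String[] args) throws Exception {
        // Four complete JSON values followed by a truncated one: {"x
        var json = " {\"a\":{\"b\":[{\"c\":1}\n]}} { \"a\" : 1}\n{ \"b\" : 1} {} {\"x";
        var input = new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8));
        // A plain ObjectMapper stands in for the author's Util.getObjectMapper()
        var reader = new ObjectMapper().readerFor(Object.class);
        MappingIterator<Object> iterator = reader.readValues(input);
        Iterable<Object> iterable = () -> iterator;
        try {
            for (var x : iterable) {
                System.out.println(x);
            }
        } catch (Exception e) {
            e.printStackTrace(System.out);
        }
        // Check what is left in the original stream after the failure
        System.out.println("Left: " + new String(input.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```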
Here’s the output of the above:
```
{a={b=[{c=1}]}}
{a=1}
{b=1}
{}
java.lang.RuntimeException: Unexpected end-of-input in field name
 at [Source: (ByteArrayInputStream); line: 3, column: 18]
    at com.fasterxml.jackson.databind.MappingIterator._handleIOException(MappingIterator.java:417)
    at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:203)
    at Scratch.main(scratch.java:24)
Caused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input in field name
 at [Source: (ByteArrayInputStream); line: 3, column: 18]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:662)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.parseEscapedName(UTF8StreamJsonParser.java:2020)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.slowParseName(UTF8StreamJsonParser.java:1925)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseName(UTF8StreamJsonParser.java:1709)
    at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:766)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:707)
    at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:280)
    at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:199)
    ... 1 more
Left:
```
As can be seen, the entire stream has been consumed. Does Jackson keep this content in a buffer that I can retrieve it from?
I’m trying to retrieve the unconsumed part of the InputStream and prepend it to the next one (using something like SequenceInputStream).
Is there any way to do this?
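One possible workaround, sketched here under stated assumptions rather than as a built-in Jackson feature: instead of asking Jackson for the leftover bytes, remember where the last complete value ended and cut the raw page at that point. Jackson's `JsonParser.releaseBuffered(OutputStream)` only returns bytes that were read ahead but not yet consumed, so it cannot give back the partial `{"x` once the parser has hit EOF inside it. The sketch relies on the existing Jackson methods `MappingIterator.getParser()`, `JsonParser.getCurrentLocation()` (newer 2.x releases also offer `currentLocation()`) and `JsonLocation.getByteOffset()`; `BlockReader` and `readCompleteValues` are hypothetical names. It assumes that a page fits in memory and that top-level values are JSON objects, as in the example (a top-level number split at the boundary could be read as a shorter, complete number).

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BlockReader {
    /**
     * Hypothetical helper: adds every complete JSON value in {@code page} to {@code out} and
     * returns the bytes after the end of the last complete value (empty if the page ended
     * cleanly). Assumes the whole page fits in memory.
     */
    static byte[] readCompleteValues(ObjectReader reader, InputStream page, List<Object> out)
            throws IOException {
        byte[] bytes = page.readAllBytes();
        long endOfLastValue = 0; // byte offset just past the last fully parsed value
        try (MappingIterator<Object> it = reader.readValues(new ByteArrayInputStream(bytes))) {
            while (it.hasNextValue()) {
                out.add(it.nextValue());
                // Offset of the next unread byte, i.e. right after the value's closing '}'
                endOfLastValue = it.getParser().getCurrentLocation().getByteOffset();
            }
        } catch (JsonProcessingException e) {
            // Assumed to mean "value cut off at the page boundary". A genuine syntax error
            // also ends up here; it is carried forward and fails again with the next page.
        }
        return Arrays.copyOfRange(bytes, (int) endOfLastValue, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        ObjectReader reader = new ObjectMapper().readerFor(Object.class);
        List<Object> values = new ArrayList<>();
        // Two pages of one logical stream; the object {"x":2} is split between them
        InputStream page1 = new ByteArrayInputStream(
                " {\"a\":1}\n{\"b\":[1,2]} {\"x".getBytes(StandardCharsets.UTF_8));
        InputStream page2 = new ByteArrayInputStream("\":2} {}".getBytes(StandardCharsets.UTF_8));

        byte[] tail = readCompleteValues(reader, page1, values); // tail holds the 4 bytes: space, {, ", x
        // Prefix the unparsed tail to the next page, as proposed in the question
        InputStream next = new SequenceInputStream(new ByteArrayInputStream(tail), page2);
        byte[] finalTail = readCompleteValues(reader, next, values);

        System.out.println(values);           // [{a=1}, {b=[1, 2]}, {x=2}, {}]
        System.out.println(finalTail.length); // 0
    }
}
```

Passing `SequenceInputStream(tail, nextPage)` to the next call carries the split object over, as the question proposes. Because a genuine syntax error is caught and carried forward the same way, a non-empty tail after the final page means the data is actually malformed or truncated.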
Top GitHub Comments
While there is ongoing work, I think this specific issue turned out to be a bit of a misunderstanding between what can be retrieved (content the parser has not yet consumed) and what was expected (all content for which no token has been produced yet, including any partial token content that was already consumed and is indicated in the exception message).
Thanks, your concern is valid. Any code path that bypasses the counting operations (even unintentionally) will immediately break the entire logic. This approach depends heavily on the reliability of those counts.
I will thoroughly test this with our actual use case to make sure it works. In parallel, I’ll keep looking for possibly more reliable alternatives.
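The thread does not show how the counting is actually implemented, so the following is only a minimal, JDK-only sketch of one way to make such counts hard to bypass, assuming the counter wraps each page's `InputStream`: every read path goes through one of two counting methods, `skip` reads instead of delegating, and mark/reset is refused so the count can never rewind. It keeps only the bytes after the last committed offset, so it does not need to hold a whole 100MB page. `TailRetainingInputStream`, `commit`, `uncommitted` and `prefixNextPage` are hypothetical names, not part of Jackson or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

/**
 * Hypothetical wrapper: counts every byte handed to the caller and keeps the bytes read
 * since the last committed offset. All reads go through the two counting read methods,
 * skip() reads instead of delegating, and mark/reset is refused, so the count cannot
 * be bypassed or rewound.
 */
class TailRetainingInputStream extends FilterInputStream {
    private final ByteArrayOutputStream retained = new ByteArrayOutputStream();
    private long position;  // total bytes returned to the caller
    private long committed; // absolute offset of the first retained byte

    TailRetainingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            retained.write(b);
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) delegates here, so it is counted as well
        int n = in.read(buf, off, len);
        if (n > 0) {
            retained.write(buf, off, n);
            position += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Read the skipped bytes so they are counted and retained like any others
        byte[] scratch = new byte[8192];
        long skipped = 0;
        int r;
        while (skipped < n && (r = read(scratch, 0, (int) Math.min(scratch.length, n - skipped))) > 0) {
            skipped += r;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported: it would desynchronize the count");
    }

    long position() {
        return position;
    }

    /** Drops retained bytes before {@code offset}, e.g. the end of the last complete value. */
    void commit(long offset) {
        if (offset < committed || offset > position) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        byte[] all = retained.toByteArray();
        retained.reset();
        retained.write(all, (int) (offset - committed), (int) (position - offset));
        committed = offset;
    }

    /** Bytes read after the last committed offset: the partial value to carry over. */
    byte[] uncommitted() {
        return retained.toByteArray();
    }

    /** Builds the next page's input with the carried-over tail in front of it. */
    static InputStream prefixNextPage(byte[] tail, InputStream nextPage) {
        return new SequenceInputStream(new ByteArrayInputStream(tail), nextPage);
    }
}
```

In use, the JSON parser would read from the wrapper, and after each complete value you would call `commit(offset)` with the parser's byte offset (for example `getCurrentLocation().getByteOffset()`), since both count bytes from the start of the same stream. After a failure, `uncommitted()` is the tail to hand to `prefixNextPage`.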