Invalid Avro file produced using SequenceWriter
While documentation on writing Avro to a file is sparse, I have managed to piece some things together, but I am still getting an error.
- https://groups.google.com/g/jackson-user/c/yOXaJvAMzfg
- https://github.com/FasterXML/jackson-dataformats-binary/issues/35
- https://github.com/FasterXML/jackson-dataformats-binary/issues/15
Here is some sample code:
final var avroFactory = AvroFactory.builderWithApacheDecoder()
    .enable(AvroGenerator.Feature.AVRO_FILE_OUTPUT)
    .build();
final var generator = new AvroSchemaGenerator().enableLogicalTypes();
final var mapper = AvroMapper.builder(avroFactory)
    .addModule(new AvroJavaTimeModule())
    .build();
// generate the Avro schema for Thing
mapper.acceptJsonFormatVisitor(Thing.class, generator);
final var avroSchema = generator.getGeneratedSchema();
final var file = Files.createTempFile("something", ".avro").toFile();
final var out = new ByteArrayOutputStream();
final var writer = mapper.writer(avroSchema).writeValues(out);
// in a loop
writer.write(thing);
// after loop
writer.close();
// copy the buffered bytes to the temp file
try (FileOutputStream outputStream = new FileOutputStream(file)) {
    out.writeTo(outputStream);
}
When checking the resultant file using avro-tools, I get the following error:
avro-tools tojson something.avro
22/09/08 18:36:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:224)
at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:97)
at org.apache.avro.tool.Main.run(Main.java:67)
at org.apache.avro.tool.Main.main(Main.java:56)
Caused by: java.io.IOException: Invalid sync!
at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:319)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:213)
... 3 more
According to some searching, the Invalid sync! error occurs when the file hasn’t been stitched together properly, but it’s unclear to me what I need to do in code to get that to happen. I’ve looked through most of the Avro tests in this repo and I cannot find one that actually writes to a file and then deserializes from that file.
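As a diagnostic aid for the Invalid sync! error, here is a minimal sketch that walks a file using the layout the Avro specification gives for object container files: the 4-byte magic (ASCII "Obj" followed by 1), a metadata map, a 16-byte sync marker, and then data blocks that each end with that same marker. It uses only the JDK. The names AvroSyncCheck, countValidBlocks and isValid are made up for this illustration and are not part of Jackson or Apache Avro. It checks the framing only and does not decode records or apply codecs.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a Jackson or Avro API): checks container-file framing only.
final class AvroSyncCheck {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Returns the number of data blocks whose trailing sync marker matches the header's.
    static int countValidBlocks(InputStream raw) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(raw);
        DataInputStream in = new DataInputStream(pb);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not an Avro container file (bad magic bytes)");
        }
        // Header metadata is an Avro map: blocks of key/value entries, ended by a zero count.
        for (long n = readLong(in); n != 0; n = readLong(in)) {
            if (n < 0) {      // a negative count is followed by the block's byte size
                n = -n;
                readLong(in);
            }
            for (long i = 0; i < n; i++) {
                skip(in, readLong(in)); // key (string)
                skip(in, readLong(in)); // value (bytes)
            }
        }
        byte[] sync = new byte[16];
        in.readFully(sync);
        byte[] marker = new byte[16];
        int blocks = 0;
        while (true) {
            int first = pb.read();
            if (first < 0) {
                return blocks;          // clean end of file
            }
            pb.unread(first);
            readLong(in);               // object count in this block (unused here)
            skip(in, readLong(in));     // serialized objects
            in.readFully(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IOException("Invalid sync after block " + blocks);
            }
            blocks++;
        }
    }

    // Convenience wrapper: true if the bytes parse as a well-framed container file.
    static boolean isValid(byte[] bytes) {
        try {
            countValidBlocks(new ByteArrayInputStream(bytes));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Avro long: variable-length, zigzag-encoded.
    private static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new EOFException("Truncated varint");
            if (shift > 63) throw new IOException("Varint too long");
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    private static void skip(DataInputStream in, long n) throws IOException {
        if (n < 0) throw new IOException("Negative length");
        byte[] buf = new byte[8192];
        while (n > 0) {
            int k = (int) Math.min(n, buf.length);
            in.readFully(buf, 0, k);
            n -= k;
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Blocks with valid sync: " + countValidBlocks(in));
        }
    }
}
```

Run against something.avro, this would report which block first fails the sync comparison, which may help narrow down whether the header or a later block is at fault.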
I am not sure whether I have stumbled into an actual bug here, but I am happy to try to write a test case if this code does seem correct, since that would imply it’s a bug.
Thanks in advance.
Edit:
I’ve also tried the following:
final var file = Files.createTempFile("something", ".avro").toFile();
final SequenceWriter writer = mapper.writer(avroSchema).writeValues(file);
In which case I get the following error at that line:
java.lang.UnsupportedOperationException: Generator of type com.fasterxml.jackson.core.json.UTF8JsonGenerator does not support schema of type 'avro'
at com.fasterxml.jackson.core.JsonGenerator.setSchema(JsonGenerator.java:592)
at com.fasterxml.jackson.databind.ObjectWriter$GeneratorSettings.initialize(ObjectWriter.java:1393)
at com.fasterxml.jackson.databind.ObjectWriter._configureGenerator(ObjectWriter.java:1258)
at com.fasterxml.jackson.databind.ObjectWriter.createGenerator(ObjectWriter.java:717)
at com.fasterxml.jackson.databind.ObjectWriter.writeValues(ObjectWriter.java:753)
Top GitHub Comments
Oh. The part that possibly (likely?) will not work is the use of writeValues() (and the SequenceWriter it creates): I suspect you cannot simply append root-level values in Avro, unlike in some other formats. So you may need to instead create a container (List) with a matching root-level Avro type to describe the full type. But then again, Avro is designed for data streams, so I am not 100% sure (it has been a while since I worked actively on this format module).
I’ll try and add a test case this weekend.
Does the code I provided at least seem like it should work? I am curious if we can minimize the reproduction even further.