Invalid Avro file produced using SequenceWriter

See original GitHub issue

Documentation on writing Avro to a file is sparse. I have managed to piece a few things together, but I am still getting an error.

Here is some sample code:

final var avroFactory = AvroFactory.builderWithApacheDecoder().enable(AvroGenerator.Feature.AVRO_FILE_OUTPUT).build();

final var generator = new AvroSchemaGenerator().enableLogicalTypes();

final var mapper = AvroMapper.builder(avroFactory).addModule(new AvroJavaTimeModule()).build();
mapper.acceptJsonFormatVisitor(Thing.class, generator);

final var avroSchema = generator.getGeneratedSchema();

final var file = Files.createTempFile("something", ".avro").toFile();

final var out = new ByteArrayOutputStream();
final var writer = mapper.writer(avroSchema).writeValues(out);

// in a loop
writer.write(thing);

// after loop
writer.close();

try (FileOutputStream outputStream = new FileOutputStream(file)) {
  out.writeTo(outputStream);
}

When checking the resultant file using avro-tools, I get the following error:

avro-tools tojson something.avro

22/09/08 18:36:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:224)
	at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:97)
	at org.apache.avro.tool.Main.run(Main.java:67)
	at org.apache.avro.tool.Main.main(Main.java:56)
Caused by: java.io.IOException: Invalid sync!
	at org.apache.avro.file.DataFileStream.nextRawBlock(DataFileStream.java:319)
	at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:213)
	... 3 more

According to some searching, the Invalid sync! error occurs when the file hasn’t been stitched together properly, but it’s unclear to me what I need to do in code to get that to happen. I’ve looked through most of the Avro tests in this repo and I cannot find one that actually writes to a file and then de-serializes from that file.
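To narrow down where the file goes wrong, the sync markers can be checked by hand. The following is a minimal sketch, assuming the object container file layout from the Avro specification: a 4-byte magic (`Obj` followed by the byte 1), a metadata map, a 16-byte sync marker, and then data blocks that each end with the same 16-byte marker. The class and method names (`AvroSyncCheck`, `firstBadBlock`) are hypothetical, and this is not the code avro-tools or Jackson use; it only mirrors the check that produces the Invalid sync! message.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical diagnostic helper, not part of Avro or Jackson.
public class AvroSyncCheck {

    // Reads one Avro long (variable-length, zig-zag encoded).
    // Returns null if the stream ends cleanly before the first byte.
    static Long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                if (shift == 0) return null;
                throw new EOFException("truncated varint");
            }
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
    }

    static long requireLong(InputStream in) throws IOException {
        Long v = readLong(in);
        if (v == null) throw new EOFException("unexpected end of file");
        return v;
    }

    static void skipFully(InputStream in, long n) throws IOException {
        if (n < 0) throw new IOException("negative length " + n);
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may legitimately return 0, so fall back to a single read
                if (in.read() < 0) throw new EOFException("truncated data");
                skipped = 1;
            }
            n -= skipped;
        }
    }

    // Returns -1 if every block ends with the header's sync marker,
    // otherwise the 0-based index of the first block that does not.
    public static int firstBadBlock(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (!Arrays.equals(magic, new byte[] {'O', 'b', 'j', 1})) {
            throw new IOException("not an Avro container file (bad magic)");
        }
        // Skip the metadata map: blocks of key/value pairs, ended by a zero count.
        while (true) {
            long count = requireLong(in);
            if (count == 0) break;
            if (count < 0) {
                count = -count;
                requireLong(in); // byte size of this map block, unused here
            }
            for (long i = 0; i < count; i++) {
                skipFully(in, requireLong(in)); // key (string)
                skipFully(in, requireLong(in)); // value (bytes)
            }
        }
        byte[] sync = new byte[16];
        in.readFully(sync);
        byte[] marker = new byte[16];
        for (int block = 0; ; block++) {
            Long objectCount = readLong(in);
            if (objectCount == null) return -1; // clean end of file
            skipFully(in, requireLong(in));     // serialized objects
            in.readFully(marker);
            if (!Arrays.equals(sync, marker)) return block;
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            int bad = firstBadBlock(in);
            System.out.println(bad < 0 ? "all sync markers match"
                                       : "sync mismatch at end of block " + bad);
        }
    }
}
```

Running it with the path to something.avro as the only argument should show whether the header is readable and which block first fails the sync check, which may help tell a truncated file from one whose blocks were written without the container framing.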

I am not sure whether I have stumbled into an actual bug here. If this code looks correct, that would suggest it is a bug, and I am happy to try to write a test case.

Thanks in advance.

Edit:

I’ve also tried the following:

final var file = Files.createTempFile("something", ".avro").toFile();
final SequenceWriter writer = mapper.writer(avroSchema).writeValues(file);

In which case I get the following error at that line:

java.lang.UnsupportedOperationException: Generator of type com.fasterxml.jackson.core.json.UTF8JsonGenerator does not support schema of type 'avro'

	at com.fasterxml.jackson.core.JsonGenerator.setSchema(JsonGenerator.java:592)
	at com.fasterxml.jackson.databind.ObjectWriter$GeneratorSettings.initialize(ObjectWriter.java:1393)
	at com.fasterxml.jackson.databind.ObjectWriter._configureGenerator(ObjectWriter.java:1258)
	at com.fasterxml.jackson.databind.ObjectWriter.createGenerator(ObjectWriter.java:717)
	at com.fasterxml.jackson.databind.ObjectWriter.writeValues(ObjectWriter.java:753)

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
cowtowncoder commented, Sep 9, 2022

Oh. The part that possibly (likely?) will not work is the use of writeValues() (and SequenceWriter it creates) – I suspect you cannot simply append root-level values in Avro, unlike in some other formats. So you may need to instead create a container (List) with matching root-level Avro type to describe the full type. But then again… Avro is designed for data streams so I am not 100% sure (it has been a while since I worked actively on this format module).

0 reactions
willsoto commented, Sep 9, 2022

I’ll try and add a test case this weekend.

Does the code I provided at least seem like it should work? I am curious if we can minimize the reproduction even further.
