Flink Iceberg Usage
We use Avro schemas as our unified ETL schema management solution. When trying to write data into Iceberg using Flink, I found that Flink has many terms for representing data types, such as TypeInformation, LogicalType, RowType, TableSchema, DataType… I can't figure out the relationship between them or how to convert from one to another.
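For orientation, here is a minimal sketch of how those terms typically relate to one another; it assumes Flink 1.12+ (where InternalTypeInfo is available) and is not something confirmed in this thread:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.logical.RowType;

public class FlinkTypeTermsSketch {
  // TypeInformation: DataStream API type descriptor, used for serialization.
  // DataType / LogicalType: Table API types; a DataType wraps a LogicalType plus a conversion class.
  // RowType: the LogicalType of a row of named, typed fields.
  // TableSchema: the column names and DataTypes of a table.
  static TypeInformation<RowData> rowDataTypeInfo(TableSchema tableSchema) {
    DataType rowDataType = tableSchema.toRowDataType();        // TableSchema -> DataType
    RowType rowType = (RowType) rowDataType.getLogicalType();  // DataType -> LogicalType (here a RowType)
    return InternalTypeInfo.of(rowType);                       // RowType -> TypeInformation<RowData>
  }
}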
Specifically, my question is: how can I write a DataStream<GenericRecord> to an Iceberg table using the Flink Iceberg API? I think an Avro Schema should carry enough information to describe the record schema.
Should I use the APIs below? If yes, how can I adapt them to a DataStream<GenericRecord>?
public static <T> Builder builderFor(DataStream<T> input,
                                     MapFunction<T, RowData> mapper,
                                     TypeInformation<RowData> outputType)

public static Builder forRow(DataStream<Row> input, TableSchema tableSchema)
PS: It’s the GenericRecord from Avro, not the one from Iceberg.
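As an illustration only (an assumption, not a confirmed answer from the thread): one way to feed a DataStream<GenericRecord> into builderFor is to derive a RowType from the Avro schema string and map each record to RowData with the flink-avro converters. This assumes Flink 1.12+ (AvroSchemaConverter.convertToDataType, AvroToRowDataConverters, InternalTypeInfo) and Iceberg's TableLoader; older Iceberg releases may name the terminal call build() instead of append().

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.formats.avro.AvroToRowDataConverters;
import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.RowType;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class GenericRecordSinkSketch {

  /** Converts Avro GenericRecord to Flink's internal RowData via flink-avro's converters. */
  static class AvroToRowData extends RichMapFunction<GenericRecord, RowData> {
    private final RowType rowType;  // LogicalType is Serializable, so it can ship to the workers
    private transient AvroToRowDataConverters.AvroToRowDataConverter converter;

    AvroToRowData(RowType rowType) {
      this.rowType = rowType;
    }

    @Override
    public void open(Configuration parameters) {
      converter = AvroToRowDataConverters.createRowConverter(rowType);
    }

    @Override
    public RowData map(GenericRecord record) {
      return (RowData) converter.convert(record);
    }
  }

  static void write(DataStream<GenericRecord> records, String avscJson, TableLoader tableLoader) {
    // Derive the Flink RowType from the Avro schema JSON (.avsc content).
    RowType rowType = (RowType) AvroSchemaConverter.convertToDataType(avscJson).getLogicalType();

    FlinkSink.builderFor(records, new AvroToRowData(rowType), InternalTypeInfo.of(rowType))
        .tableLoader(tableLoader)
        .append();  // older Iceberg releases: .build()
  }
}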
Issue Analytics
- Created: 3 years ago
- Comments: 7 (7 by maintainers)
I think I found the solution:
@openinx Thanks for following up on this issue. I haven't tested nested fields yet, but I found another issue, about logical types. Our ETL is built on CDH-6.3.1 with Avro 1.8.2, which can't generate Java 8 time types because of AVRO-2079, and without the converter Flink can't handle joda time properly. We really need that converter.

All of our ETL jobs' input and output data structures are represented as Avro GenericRecord, because Avro has .avsc files to define schemas in JSON and provides avro-maven-plugin to generate entity classes automatically.

PS: We only use Avro to manage schemas, but we store data in ORC, and we are trying to migrate the storage format to Iceberg. If Iceberg provided schema management tooling like Avro's, maybe we could manage schemas with Iceberg as well.
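To make the joda-time point concrete (a hand-rolled example, not code from this thread): with Avro 1.8.x and AVRO-2079, a timestamp-millis field can surface as org.joda.time.DateTime instead of java.time, so a custom mapper has to convert it explicitly before handing the row to Iceberg. A sketch for a hypothetical single ts column:

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.TimestampData;
import org.joda.time.DateTime;

public class JodaTimestampSketch {
  // Hypothetical schema: a single field "ts" of type long with logicalType timestamp-millis.
  static RowData toRowData(GenericRecord record) {
    GenericRowData row = new GenericRowData(1);
    Object ts = record.get("ts");
    if (ts instanceof DateTime) {
      // Avro 1.8.x logical-type conversions yield joda time (AVRO-2079), not java.time.
      row.setField(0, TimestampData.fromEpochMillis(((DateTime) ts).getMillis()));
    } else if (ts instanceof Long) {
      // Without a registered conversion, the raw epoch-millis long comes through instead.
      row.setField(0, TimestampData.fromEpochMillis((Long) ts));
    }
    return row;
  }
}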