
[FEATURE] An integrated solution for pulsar Schema and Flink

See original GitHub issue

Is your feature request related to a problem? Please describe. Because the Pulsar connector does not follow the Flink format standard, it has to maintain its own schema conversions, data-structure conversions, and so on. This part of the code is very difficult for me to maintain, and it is hard for newcomers to contribute to it. FlinkPulsarRowSink and FlinkPulsarRowSource implement these features.

Another important reason is that the Sink lacks support for Flink formats, which may delay merging the connector back into the community.

Describe the solution you’d like

  • Plan 2: Flink's Avro and JSON formats are standard formats that can be integrated with Pulsar easily. Let users freely choose a Flink format such as Avro or JSON, and generate the corresponding Pulsar Schema information from the chosen format.
    When Pulsar supports new serialization methods, we can easily add them or reuse an existing Flink format.
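To make Plan 2 concrete, here is a minimal sketch of deriving Pulsar schema metadata from the Flink format a user selects. The class and mapping are hypothetical illustrations, not the real connector API; actual schema derivation in the connector would go through the Flink and Pulsar type systems.

```java
import java.util.Map;

// Hypothetical sketch (not the real connector API): derive Pulsar schema
// metadata from the Flink format the user selects, instead of maintaining
// a bespoke type-conversion layer per format.
public class FormatToSchema {
    // Illustrative mapping from Flink format identifiers to Pulsar schema types.
    static final Map<String, String> FORMAT_TO_SCHEMA_TYPE = Map.of(
        "avro", "AVRO",
        "json", "JSON",
        "csv",  "BYTES",   // no native Pulsar schema; fall back to raw bytes
        "raw",  "BYTES"
    );

    static String schemaTypeFor(String flinkFormat) {
        String type = FORMAT_TO_SCHEMA_TYPE.get(flinkFormat);
        if (type == null) {
            throw new IllegalArgumentException("Unsupported format: " + flinkFormat);
        }
        return type;
    }
}
```

With a table like this, adding support for a newly introduced Flink format is one entry rather than a new conversion implementation.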

Describe alternatives you’ve considered

  • Plan 1: Create a dedicated Pulsar format and give users the option to use it. We would need to maintain the conversion between Flink data types and Pulsar data types, which is not easy.

Additional context

Pulsar has its own Schema, which supports serialization and deserialization of POJO data. Flink also has a Format abstraction that supports serialization and deserialization of data.

When using a Flink format, the data written to or read from Pulsar is raw bytes. We can use AutoProduceBytesSchema to produce the messages; it lets us attach the real Schema information to the bytes and verifies that the byte payload conforms to that schema.
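The AutoProduceBytesSchema idea can be simulated in a few lines. The sketch below is a self-contained stand-in, not the real Pulsar client API (which requires a running broker): the producer accepts pre-serialized bytes and validates them against the topic's declared schema before accepting the send.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

// Self-contained simulation of the AUTO_PRODUCE_BYTES idea (hypothetical
// stand-in classes, not org.apache.pulsar.client.api): the producer takes
// pre-serialized bytes and checks them against the topic schema.
public class AutoProduceDemo {
    static class BytesProducer {
        private final Predicate<byte[]> schemaCheck; // stands in for the topic schema

        BytesProducer(Predicate<byte[]> schemaCheck) {
            this.schemaCheck = schemaCheck;
        }

        // Reject bytes that do not conform to the declared schema,
        // mirroring the validation auto-produce performs on send.
        boolean send(byte[] payload) {
            return schemaCheck.test(payload);
        }
    }

    public static void main(String[] args) {
        // Crude "schema": the payload must look like a JSON object.
        BytesProducer producer = new BytesProducer(b -> {
            String s = new String(b, StandardCharsets.UTF_8).trim();
            return s.startsWith("{") && s.endsWith("}");
        });
        System.out.println(producer.send("{\"id\":1}".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(producer.send("not-json".getBytes(StandardCharsets.UTF_8)));   // false
    }
}
```

This is the property the connector relies on: the Flink format owns serialization, while the Pulsar side only needs to validate and carry the schema metadata.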

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
jianyun8023 commented, Nov 23, 2020

Referring to FLIP-107 and the Kafka implementation, we will implement an adaptive serialization and deserialization interface that covers serializing external inputs and generating the Pulsar Schema. The responsibilities of the Sink interface will be further simplified, and redundant implementation classes will be removed.

The responsibilities of the components in the current Pulsar connector are not clear: there are many implementations such as FlinkPulsarRowSource, ReaderThread, RowReaderThread, and PulsarFetcher. The Flink Row type has been replaced by the RowData type. The irrational implementation of the Pulsar serialization and deserialization functions is the cause of the confusion.

0 reactions
jianyun8023 commented, Nov 24, 2020

Final Solution: Wrap the Flink format instance as an implementation of the Pulsar Schema interface. This satisfies the Flink format specification and meets Pulsar’s expectations.
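The final solution above is an adapter. Here is a minimal sketch of the pattern under simplified assumptions: the two functional interfaces below stand in for Flink's SerializationSchema/DeserializationSchema and the adapter plays the role of a Pulsar Schema with its encode/decode pair; none of the real org.apache.flink or org.apache.pulsar types are used.

```java
import java.util.function.Function;

// Sketch of wrapping a Flink-style format as a Pulsar-style Schema.
// The types here are simplified stand-ins for the real interfaces.
public class FormatSchemaAdapter<T> {
    private final Function<T, byte[]> serialize;   // Flink SerializationSchema role
    private final Function<byte[], T> deserialize; // Flink DeserializationSchema role

    public FormatSchemaAdapter(Function<T, byte[]> serialize,
                               Function<byte[], T> deserialize) {
        this.serialize = serialize;
        this.deserialize = deserialize;
    }

    // Counterparts of Pulsar's Schema#encode / Schema#decode:
    // delegate both directions to the wrapped Flink format.
    public byte[] encode(T value) {
        return serialize.apply(value);
    }

    public T decode(byte[] bytes) {
        return deserialize.apply(bytes);
    }
}
```

Because the adapter delegates both directions to the wrapped format, the Flink format specification is satisfied while the producer and consumer see an ordinary Pulsar Schema.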


Top Results From Across the Web

When Flink & Pulsar Come Together - Apache Flink
Pulsar can integrate with Apache Flink in different ways. Some potential integrations include providing support for streaming workloads with the ...

Batch and Stream Integration of Flink and Pulsar
This article introduces Apache Pulsar, a next-gen cloud-native message streaming platform, and discusses how it enables batch and stream ...

Stateful Streams with Apache Pulsar and Apache Flink
We will use Apache Pulsar as our streaming storage layer. Apache Pulsar and Apache Flink have a strong integration together and enable a...

Pulsar Flink Connector (Deprecated)
Pulsar Flink Connector is an integration of Apache Pulsar and Apache Flink (data processing engine), which allows Flink to read data from Pulsar...

Integrating Apache Pulsar and Apache Flink for Powerful Data ...
Written form: https://blog.rockthejvm.com/pulsar-flink/ GitHub: https://github.com/polyzos/pulsar-flink-stateful-streams Apache Flink course: ...
