documentation: unclear how to use Pulsar IO
I couldn’t figure out from the documentation how to use Pulsar IO connectors.
https://pulsar.apache.org/docs/en/io-managing/#configuring-connectors: here you say to use YAML config files and link to the section on running connectors, but that section never mentions YAML configs.
Also, I couldn’t find any information on how to configure SerDe for Pulsar IO. For example, if my application uses protobuf schemas to create messages for a Pulsar topic, how can the available Pulsar IO sink connectors read these messages, deserialize them, and convert them to the format the sink expects?
The only approach I could think of is handling the required ETL with Pulsar Functions: the application writes protobuf messages, a Pulsar Function reads them, converts them to the format the sink expects, and writes them to another topic, and only then does Pulsar IO read them.
But having to write transformed messages to a new topic for each (topic, sink) pair seems like overkill.
P.S. I would like to use external protobuf schemas. In our company, we store all schemas from all services in a separate repository. For a given service, we compile the needed schemas for the required language and place the resulting stubs into a separate repo, which the target service then uses (via git submodules). So it would be nice to be able to provide compiled protobuf stubs to the Pulsar IO SerDe logic. From what I understand, this repo provides this functionality for Kafka Connect, but I would love to use Pulsar instead of Kafka.
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (5 by maintainers)
Top GitHub Comments
Thank you for your feedback. This helps us improve the documentation. We will try to clarify the documentation based on it.
Pulsar IO is more about getting data in and out. Most of the connectors just transfer bytes. Some of the connectors, like CDC and JDBC, will attempt to deserialize the events using the Pulsar generic schema.
Currently Pulsar IO doesn’t provide the ability to run functions along with the connectors. There are issues created for looking into that space.
You can construct a schema instance using Schema.PROTOBUF(<generated-protobuf-class>.class). Then you can use the schema instance in your applications to publish the protobuf messages to a Pulsar topic.
Assume your generated protobuf class is ProtobufClass. Then you can write functions that take it as input. When you submit such a function, you can specify --schema-type PROTOBUF. If you want to use your own SerDe, you can specify --custom-serde-inputs.
Yes, that’s correct. You just need to modify the existing one or add a new one based on the existing one.
Yes, you can.
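For anyone landing here later, a rough sketch of the kind of conversion function discussed above might look like the following. It relies on the fact that Pulsar Functions accepts plain java.util.function.Function implementations. Everything else is an assumption for illustration: the ProtobufClass below is a hand-written stand-in for the protoc-generated class (a real function would import the generated class instead), its id and amount fields are hypothetical, and JSON is only a guess at what a sink might expect.

```java
import java.util.function.Function;

public class ProtobufToSinkFormat {

    // Hypothetical stand-in for the protoc-generated ProtobufClass.
    // In a real function this class would come from the compiled protobuf stubs.
    public static final class ProtobufClass {
        private final String id;
        private final long amount;

        public ProtobufClass(String id, long amount) {
            this.id = id;
            this.amount = amount;
        }

        public String getId() { return id; }
        public long getAmount() { return amount; }
    }

    // Converts one input message to a JSON string for the sink's input topic.
    // Note: no JSON escaping is done on the id, so this is illustration only.
    public static final class ToJsonFunction implements Function<ProtobufClass, String> {
        @Override
        public String apply(ProtobufClass input) {
            return "{\"id\":\"" + input.getId() + "\",\"amount\":" + input.getAmount() + "}";
        }
    }

    public static void main(String[] args) {
        // Local check of the conversion logic, outside of Pulsar.
        ProtobufClass message = new ProtobufClass("order-1", 42L);
        System.out.println(new ToJsonFunction().apply(message));
    }
}
```

When submitting the real version of such a function, the --schema-type PROTOBUF option mentioned above would presumably tell Pulsar how to decode the input topic. The output topic of the function is then what the sink connector reads from.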