documentation: unclear how to use Pulsar IO

I couldn’t figure out from the documentation how to use Pulsar IO connectors ((

https://pulsar.apache.org/docs/en/io-managing/#configuring-connectors: here you say to use YAML config files and link to the section on running connectors, but that section never mentions YAML configs.

Also, I couldn’t find any information on how to configure SerDe for Pulsar IO. For example, if my application uses protobuf schemas to create messages for a Pulsar topic, how can the available Pulsar IO sink connectors read these messages, deserialize them, and convert them into the format the sink expects?
The only approach I could think of is to do the required ETL with Pulsar Functions: the application writes protobuf messages, a Pulsar Function reads them, converts them into the format the sink expects, and writes them to another topic, and only then does Pulsar IO read these messages.
But having to write transformed messages to a new topic for each (topic, sink) pair seems like overkill.
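A minimal sketch of the per-message transform described above, in plain Java so it runs without the Pulsar or protobuf jars. SensorReading is a hypothetical stand-in for a protobuf-generated class and ReadingToJson is a hypothetical helper; in a real Pulsar Function this conversion would sit inside process, and its output would go to the intermediate topic the sink reads.

```java
// Hypothetical stand-in for a protobuf-generated message class.
final class SensorReading {
    final String id;
    final double value;

    SensorReading(String id, double value) {
        this.id = id;
        this.value = value;
    }
}

// Hypothetical helper: converts one decoded message into the JSON a sink might expect.
final class ReadingToJson {
    // Minimal escaping: handles only backslashes and double quotes, not control characters.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String toJson(SensorReading r) {
        return "{\"id\":\"" + escape(r.id) + "\",\"value\":" + r.value + "}";
    }
}
```

For example, ReadingToJson.toJson(new SensorReading("s-1", 21.5)) returns {"id":"s-1","value":21.5}. Keeping the conversion a pure method makes it easy to test, but it does not remove the extra topic per (topic, sink) pair.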

P.S. I would like to use external protobuf schemas. In our company, we store the schemas of all our services in a separate repository. For a given service, we compile the needed schemas for the required language and place the resulting stubs into a separate repo, which the target service then consumes (via git submodules). So it would be nice to be able to provide compiled protobuf stubs to the Pulsar IO SerDe logic. From what I understand, this repo provides this functionality for Kafka Connect, but I would love to use Pulsar instead of Kafka.

Issue Analytics

Created 4 years ago
Comments: 7 (5 by maintainers)

Top GitHub Comments

2 reactions

sijie commented, May 5, 2019

I couldn’t figure out from the documentation how to use Pulsar IO connectors ((

Thank you for your feedback. It helps us improve the documentation, and we will try to clarify the documentation accordingly.

I couldn’t find any information on how to configure SerDe for Pulsar IO?

Pulsar IO is more about getting data in and out. Most of the connectors just transfer bytes. Some connectors, like CDC and JDBC, will attempt to deserialize the events using Pulsar’s generic schema.

Currently, Pulsar IO doesn’t provide the ability to run functions along with the connectors. There are open issues to look into that.

I would like to use external protobuf schemas:

You can construct a schema instance using Schema.PROTOBUF(<generated-protobuf-class>.class), then use that schema instance in your applications to publish protobuf messages to a Pulsar topic.

Assuming your generated protobuf class is ProtobufClass, you can write functions as follows:

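```java
import org.apache.pulsar.functions.api.*; // Function, Context
public class TestFunction implements Function<ProtobufClass, Void> {
    @Override public Void process(ProtobufClass input, Context context) { return null; } // handle the decoded message here
}
```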

When you submit such a function, you can specify --schema-type PROTOBUF. If you want to use your own SerDe, you can specify --custom-serde-inputs.
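A custom SerDe passed with --custom-serde-inputs implements Pulsar’s SerDe<T> interface, which has deserialize(byte[]) and serialize(T). The sketch below declares a local interface of the same shape so it compiles without the Pulsar jars; Temperature and TemperatureSerDe are hypothetical, and the DataOutputStream encoding merely stands in for protobuf. With a real generated class you would call ProtobufClass.parseFrom(input) and message.toByteArray() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as Pulsar's SerDe<T>, so this compiles without Pulsar.
interface SerDe<T> {
    T deserialize(byte[] input);
    byte[] serialize(T input);
}

// Hypothetical message type (a protobuf-generated class in a real setup).
final class Temperature {
    final String sensor;
    final double celsius;

    Temperature(String sensor, double celsius) {
        this.sensor = sensor;
        this.celsius = celsius;
    }
}

// Hypothetical SerDe; the DataOutputStream encoding stands in for protobuf's wire format.
final class TemperatureSerDe implements SerDe<Temperature> {
    @Override
    public byte[] serialize(Temperature t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(t.sensor);     // length-prefixed modified UTF-8
            out.writeDouble(t.celsius); // 8 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public Temperature deserialize(byte[] input) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(input))) {
            return new Temperature(in.readUTF(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round trip through serialize and deserialize should return an equal Temperature, which is the main property to check before wiring a SerDe into a function.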

1 reaction

sijie commented, May 26, 2019

is to pull data from a Pulsar topic (which consists of protobuf messages) and push it into ElasticSearch. If I understood you correctly, the built-in ElasticSearch connector won’t be able to deserialize protobuf messages, and I should write my own connector for that purpose (probably just modify the existing one).

Yes, that’s correct. You just need to modify the existing one or add a new one based on it.
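A minimal sketch, under stated assumptions, of the one change a modified Elasticsearch sink needs: decode the raw payload before turning it into a document. DecodingSink is hypothetical; the in-memory list stands in for the Elasticsearch client, and in an actual Pulsar sink this logic would live in write(Record<T>). With protobuf, the decoder would wrap ProtobufClass.parseFrom(bytes), whose checked exception has to be rethrown as an unchecked one to fit java.util.function.Function.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the decode step a modified Elasticsearch sink could add.
// In a real Pulsar sink this logic would sit in write(Record<...>), and the list
// below would be replaced by calls to the Elasticsearch client.
final class DecodingSink<T> {
    private final Function<byte[], T> decoder;    // e.g. wraps ProtobufClass.parseFrom(bytes)
    private final Function<T, String> toDocument; // renders the decoded message as a JSON document
    private final List<String> indexed = new ArrayList<>(); // stand-in for the index

    DecodingSink(Function<byte[], T> decoder, Function<T, String> toDocument) {
        this.decoder = decoder;
        this.toDocument = toDocument;
    }

    // Decode the raw payload first, then convert it into a document and "index" it.
    void write(byte[] payload) {
        indexed.add(toDocument.apply(decoder.apply(payload)));
    }

    List<String> indexedDocuments() {
        return Collections.unmodifiableList(indexed);
    }
}
```

Passing the decoder in keeps the Elasticsearch-specific code unchanged, so the same sink could serve another message type by supplying a different decoder.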

is it correct that I can reuse the Kafka Connect implementation for that purpose?

Yes, you can.
