
Can you read from multiple source Kafka clusters and write to multiple Kafka clusters?

See original GitHub issue

Hello,

We are doing a POC of uReplicator for use across multiple regions/sites. Here are our configs:

------- Controller Configs ------- 
cat start-controller-with-args.sh 
#!/bin/bash

export hoa_srcCluster='ZK-CLUSTER-1:2181/srcCluster'
export asd_desCluster='ZK-CLUSTER-2/destCluster'
export hoa_zk='ZK-CLUSTER-1:2181'

args="-helixClusterName testMirrorMaker"
args="${args} -destKafkaZkPath ${asd_desCluster}"
args="${args} -srcKafkaZkPath ${hoa_srcCluster}"
args="${args} -zookeeper ${hoa_srcCluster}"
args="${args} -port 10000 -mode auto"
args="${args} -enableAutoWhitelist true"
args="${args} -autoRebalanceDelayInSeconds 120 -backUpToGit false"
args="${args} -localBackupFilePath ~/backup_"
echo " /bin/bash ./start-controller.sh startMirrorMakerController ${args}"
/bin/bash ./start-controller.sh startMirrorMakerController ${args}

------- Worker configs ------- 
cat /app/uReplicator/config/consumer.properties  | grep -vE "^#"
zookeeper.connect=ZK-CLUSTER-1:2181/srcCluster
zookeeper.connection.timeout.ms=30000
zookeeper.session.timeout.ms=30000
group.id=kloak-mirrormaker-test
consumer.id=kloakmms01-sjc1
partition.assignment.strategy=roundrobin
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=8388608
queued.max.message.chunks=5
auto.offset.reset=smallest


cat /app/uReplicator/config/producer.properties  | grep -vE '^#'
bootstrap.servers=KAFKA-CLUSTER-2:9092
client.id=kloak-mirrormaker-test
producer.type=async
compression.type=none
serializer.class=kafka.serializer.DefaultEncoder
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
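For comparison, a reverse (Cluster2 → Cluster1) pipeline would need its own consumer/producer property pair. The fragment below is a hypothetical sketch mirroring the configs above; the ZooKeeper path, bootstrap server, and `-reverse` suffixes are illustrative, not values from the original setup:

```properties
# consumer.properties for the reverse pipeline (Cluster2 as source) -- illustrative
zookeeper.connect=ZK-CLUSTER-2:2181/srcCluster
group.id=kloak-mirrormaker-test-reverse
auto.offset.reset=smallest

# producer.properties for the reverse pipeline (Cluster1 as destination) -- illustrative
bootstrap.servers=KAFKA-CLUSTER-1:9092
client.id=kloak-mirrormaker-test-reverse
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
```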

We would like to aggregate data across multiple Kafka clusters/topics. Can we read from two clusters (Cluster1 and Cluster2) and write to both clusters, into different topics? Or would you suggest reading from one cluster (Cluster1) and writing to two clusters (Cluster1 and Cluster2) in different topics?

If this is possible, how?

What is the best practice/solution for this scenario?
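Since each uReplicator pipeline handles exactly one source and one destination (per the maintainer answers below), a bidirectional Cluster1 ↔ Cluster2 setup would mean two controller deployments. The sketch below only composes and echoes the two argument lists, in the same style as the wrapper script above; the Helix cluster names, ports, and ZooKeeper paths are illustrative assumptions, not confirmed values:

```shell
#!/bin/bash
# Hypothetical sketch: one controller per direction, each with its own
# Helix cluster name and port. Values are illustrative placeholders.

# Pipeline A: Cluster1 -> Cluster2
argsA="-helixClusterName testMirrorMakerC1toC2"
argsA="${argsA} -srcKafkaZkPath ZK-CLUSTER-1:2181/srcCluster"
argsA="${argsA} -destKafkaZkPath ZK-CLUSTER-2:2181/destCluster"
argsA="${argsA} -zookeeper ZK-CLUSTER-1:2181 -port 10000 -mode auto"

# Pipeline B: Cluster2 -> Cluster1 (reverse direction, distinct name and port)
argsB="-helixClusterName testMirrorMakerC2toC1"
argsB="${argsB} -srcKafkaZkPath ZK-CLUSTER-2:2181/srcCluster"
argsB="${argsB} -destKafkaZkPath ZK-CLUSTER-1:2181/destCluster"
argsB="${argsB} -zookeeper ZK-CLUSTER-2:2181 -port 10001 -mode auto"

# Dry run: print the two commands that would be passed to start-controller.sh
echo "controller A: ${argsA}"
echo "controller B: ${argsB}"
```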

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
xhl1988 commented, Oct 30, 2018

Sorry for the confusion. Let’s see if this is clearer: one uReplicator cluster can only handle one source and one destination cluster. In your case, since you only have two clusters, you won’t get much benefit from the Federated-uReplicator branch. On the master branch, you need to set up two uReplicator clusters yourself. On the Federated-uReplicator branch, you still need to set up two federation clusters (one in each region, because you want to run workers in each region); the federation layer will then automatically create one uReplicator cluster in each region for you when you whitelist the topic.

However, in the following case the Federated-uReplicator branch will gain you a lot: say you have cluster1 and cluster2 in US east, and cluster3 and cluster4 in US west. On the master branch, you need to manually create 12 clusters yourself. On the Federated-uReplicator branch, you only need 4 federation clusters (one per cluster); all 12 clusters from the master-branch setup will be created automatically for you.
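The counts above follow from one pipeline per ordered (source, destination) pair; a quick sketch of that arithmetic (the function name is mine, not part of uReplicator):

```python
# All-to-all replication among n Kafka clusters needs one uReplicator
# pipeline per ordered (src, dst) pair, i.e. n * (n - 1) pipelines.
def pipelines_needed(n_clusters: int) -> int:
    return n_clusters * (n_clusters - 1)

print(pipelines_needed(2))  # 2  -- the two-cluster POC in this issue
print(pipelines_needed(4))  # 12 -- the 4-cluster example above
```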

1 reaction
xhl1988 commented, Oct 29, 2018

Federated-uReplicator adds a federation layer on top of the master branch. It automatically sets up the cluster1->cluster2 and cluster2->cluster1 pipelines for you; however, each pipeline can still only have one source and one destination cluster. In the Federated-uReplicator branch, you set up one manager cluster, one controller cluster, and one worker cluster, and the manager handles both the cluster1->cluster2 and cluster2->cluster1 replication. On the master branch, you need to set up two controller clusters and two worker clusters: one handles cluster1->cluster2 and the other handles cluster2->cluster1.


