
Can you read from multiple source Kafka clusters and write to multiple Kafka clusters?

See original GitHub issue

Hello,

We are doing a POC of uReplicator for use across multiple regions/sites. Here are our configs:

------- Controller Configs ------- 
cat start-controller-with-args.sh 
#!/bin/bash

export hoa_srcCluster='ZK-CLUSTER-1:2181/srcCluster'
export asd_desCluster='ZK-CLUSTER-2/destCluster'
export hoa_zk='ZK-CLUSTER-1:2181'

args="-helixClusterName testMirrorMaker"
args="${args} -destKafkaZkPath ${asd_desCluster}"
args="${args} -srcKafkaZkPath ${hoa_srcCluster}"
args="${args} -zookeeper ${hoa_srcCluster}"
args="${args} -port 10000 -mode auto"
args="${args} -enableAutoWhitelist true"
args="${args} -autoRebalanceDelayInSeconds 120 -backUpToGit false"
args="${args} -localBackupFilePath ~/backup_"
echo " /bin/bash ./start-controller.sh startMirrorMakerController ${args}"
/bin/bash ./start-controller.sh startMirrorMakerController ${args}

------- Worker configs ------- 
cat /app/uReplicator/config/consumer.properties  | grep -vE "^#"
zookeeper.connect=ZK-CLUSTER-1:2181/srcCluster
zookeeper.connection.timeout.ms=30000
zookeeper.session.timeout.ms=30000
group.id=kloak-mirrormaker-test
consumer.id=kloakmms01-sjc1
partition.assignment.strategy=roundrobin
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=8388608
queued.max.message.chunks=5
auto.offset.reset=smallest


cat /app/uReplicator/config/producer.properties  | grep -vE '^#'
bootstrap.servers=KAFKA-CLUSTER-2:9092
client.id=kloak-mirrormaker-test
producer.type=async
compression.type=none
serializer.class=kafka.serializer.DefaultEncoder
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
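For comparison, a reverse (Cluster2 → Cluster1) pipeline would need its own consumer/producer property pair. The fragment below is a hypothetical sketch mirroring the configs above; the ZooKeeper path, bootstrap server, and `-reverse` suffixes are illustrative, not values from the original setup:

```properties
# consumer.properties for the reverse pipeline (Cluster2 as source) -- illustrative
zookeeper.connect=ZK-CLUSTER-2:2181/srcCluster
group.id=kloak-mirrormaker-test-reverse
auto.offset.reset=smallest

# producer.properties for the reverse pipeline (Cluster1 as destination) -- illustrative
bootstrap.servers=KAFKA-CLUSTER-1:9092
client.id=kloak-mirrormaker-test-reverse
key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
```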

We would like to aggregate data across multiple Kafka clusters/topics. Can we read from two clusters (Cluster1 and Cluster2) and write to both clusters, into different topics? Or would you suggest reading from one cluster (Cluster1) and writing to two clusters (Cluster1 and Cluster2) in different topics?

If this is possible, how?

What is the best practice/solution for this scenario?
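Since each uReplicator pipeline handles exactly one source and one destination (per the maintainer answers below), a bidirectional Cluster1 ↔ Cluster2 setup would mean two controller deployments. The sketch below only composes and echoes the two argument lists, in the same style as the wrapper script above; the Helix cluster names, ports, and ZooKeeper paths are illustrative assumptions, not confirmed values:

```shell
#!/bin/bash
# Hypothetical sketch: one controller per direction, each with its own
# Helix cluster name and port. Values are illustrative placeholders.

# Pipeline A: Cluster1 -> Cluster2
argsA="-helixClusterName testMirrorMakerC1toC2"
argsA="${argsA} -srcKafkaZkPath ZK-CLUSTER-1:2181/srcCluster"
argsA="${argsA} -destKafkaZkPath ZK-CLUSTER-2:2181/destCluster"
argsA="${argsA} -zookeeper ZK-CLUSTER-1:2181 -port 10000 -mode auto"

# Pipeline B: Cluster2 -> Cluster1 (reverse direction, distinct name and port)
argsB="-helixClusterName testMirrorMakerC2toC1"
argsB="${argsB} -srcKafkaZkPath ZK-CLUSTER-2:2181/srcCluster"
argsB="${argsB} -destKafkaZkPath ZK-CLUSTER-1:2181/destCluster"
argsB="${argsB} -zookeeper ZK-CLUSTER-2:2181 -port 10001 -mode auto"

# Dry run: print the two commands that would be passed to start-controller.sh
echo "controller A: ${argsA}"
echo "controller B: ${argsB}"
```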

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
xhl1988 commented, Oct 30, 2018

Sorry for the confusion. Let’s see if this is clearer: one uReplicator cluster can only handle one source and one destination cluster. In your case, since you only have two clusters, you won’t get much benefit from the Federated-uReplicator branch. On the master branch, you need to set up two uReplicator clusters yourself. On the Federated-uReplicator branch, you still need to set up two federation clusters (one in each region, because you want to run workers in each region); the federation layer will then automatically create one uReplicator cluster in each region for you when you whitelist the topic.

However, in the following case the Federated-uReplicator branch will gain you a lot: say you have cluster1 and cluster2 in US east, and cluster3 and cluster4 in US west. On the master branch, you need to manually create 12 clusters yourself. On the Federated-uReplicator branch, you only need 4 federation clusters (one per cluster); all 12 clusters from the master-branch setup will be created automatically for you.
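The counts above follow from one pipeline per ordered (source, destination) pair; a quick sketch of that arithmetic (the function name is mine, not part of uReplicator):

```python
# All-to-all replication among n Kafka clusters needs one uReplicator
# pipeline per ordered (src, dst) pair, i.e. n * (n - 1) pipelines.
def pipelines_needed(n_clusters: int) -> int:
    return n_clusters * (n_clusters - 1)

print(pipelines_needed(2))  # 2  -- the two-cluster POC in this issue
print(pipelines_needed(4))  # 12 -- the 4-cluster example above
```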

1 reaction
xhl1988 commented, Oct 29, 2018

Federated-uReplicator adds a federation layer on top of the master branch. It automatically sets up the cluster1->cluster2 and cluster2->cluster1 pipelines for you; however, each pipeline can still only have one source and one destination cluster. In the Federated-uReplicator branch, you set up one manager cluster, one controller cluster, and one worker cluster, and the manager handles both the cluster1->cluster2 and cluster2->cluster1 replication. On the master branch, you need to set up two controller clusters and two worker clusters: one handles cluster1->cluster2 and the other handles cluster2->cluster1.


