Elasticsearch sink UPSERT performance

Hi guys,

When benchmarking the Elasticsearch sink we've seen a huge difference in performance between UPSERT and INSERT: a 10x difference. I understand that UPSERT is inherently much slower than a regular INSERT, but I was surprised to find that the gap is 10x. So I'm kind of curious whether the bottleneck is in the connector or in Elasticsearch.

I have 5 million JSON messages in Kafka that, when UPSERTed, should total 1 million documents (5 messages form one complete Elasticsearch document). Doing a regular INSERT I was able to average 5k documents per second, but doing UPSERT I could only get 500 documents per second.
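To make the setup concrete, here is a minimal sketch of how several partial messages sharing one primary key collapse into a single document. It assumes a shallow merge keyed on `id`; the function name `apply_upserts` and the `field_N` names are hypothetical, and this is not the connector's actual code.

```python
def apply_upserts(messages, pk="id"):
    """Shallow-merge partial messages into one document per primary key."""
    docs = {}
    for msg in messages:
        # Create the document on first sight of the key, then merge fields in.
        docs.setdefault(msg[pk], {}).update(msg)
    return docs


# 10 partial messages, 5 per id, each carrying one distinct field.
messages = [{"id": i % 2, f"field_{i // 2}": i} for i in range(10)]
docs = apply_upserts(messages)

print(len(messages), "messages ->", len(docs), "documents")
```

In this toy run every document ends up with its `id` plus five fields, which mirrors the 5-messages-per-document ratio described above.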

Versions: Confluent Kafka 2.11.0-0.11.0.1, Stream Reactor 0.30, Elasticsearch 5.6.2

Using regular connect-distributed with the schema turned off. My connector configurations:

{
  "name": "elastic-sink-ztest",
  "config": {
    "connector.class": "com.datamountaineer.streamreactor.connect.elastic5.ElasticSinkConnector",
    "tasks.max": "1",
    "topics": "ztest",
    "connect.elastic.kcql": "UPSERT INTO ztest SELECT * from ztest PK id WITHDOCTYPE=event",
    "connect.elastic.cluster.name": "elastic",
    "connect.elastic.url": "10.10.10.1:9300",
    "connect.progress.enabled": true
  }
}
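For comparison outside the connector, an upsert can be expressed against the Elasticsearch bulk API as an `update` action whose body sets `doc_as_upsert`. The sketch below only builds the newline-delimited request body; it reuses the index `ztest`, type `event`, and key `id` from the KCQL above. Whether the connector sends exactly this request is an assumption not confirmed here, and `bulk_upsert_body` is a hypothetical helper name.

```python
import json


def bulk_upsert_body(index, doc_type, docs, pk="id"):
    """Build an NDJSON bulk body with one update/doc_as_upsert pair per doc."""
    lines = []
    for doc in docs:
        # Action line: which document to update (created if missing).
        action = {"update": {"_index": index, "_type": doc_type, "_id": str(doc[pk])}}
        # Source line: partial document merged into the existing one.
        source = {"doc": doc, "doc_as_upsert": True}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = bulk_upsert_body("ztest", "event", [{"id": 1, "a": "x"}, {"id": 1, "b": "y"}])
print(body)
```

Each update must look up the existing document before merging and reindexing it, which is one reason an upsert is expected to cost more than a plain index action.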

Top GitHub Comments

a3ammar commented, Dec 20, 2017

Hi @Antwnis sadly I couldn’t get esrally to do doc_as_upsert, and I don’t have enough time to figure it out or write my own script.

It’s definitely a counting error: since 5 messages form one document, 500 documents/second means indexing 2,500 messages per second.
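The arithmetic behind that correction can be sketched as follows. It assumes the INSERT figure of 5k per second counts one document per Kafka message, which the thread does not state explicitly.

```python
MESSAGES_PER_DOCUMENT = 5        # from the issue: 5 messages form one document

upsert_docs_per_sec = 500        # reported UPSERT rate, in merged documents
insert_msgs_per_sec = 5000       # reported INSERT rate (assumed: one doc per message)

# Convert the UPSERT rate into the same unit as the INSERT rate.
upsert_msgs_per_sec = upsert_docs_per_sec * MESSAGES_PER_DOCUMENT

naive_gap = insert_msgs_per_sec / upsert_docs_per_sec      # mixes units
adjusted_gap = insert_msgs_per_sec / upsert_msgs_per_sec   # messages vs messages

print(f"UPSERT handles {upsert_msgs_per_sec} messages/second")
print(f"naive gap: {naive_gap:.0f}x, adjusted gap: {adjusted_gap:.0f}x")
```

Under that assumption the gap, measured in messages, is closer to 2x than to 10x.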

Right now my priorities have changed, but I’ll be revisiting this in the near future, and if I have more interesting findings that might be connector related I’ll reopen this issue or make a new one.

Happy hacking!

Antwnis commented, Dec 19, 2017

Hi @a3ammar, did you manage to get to the bottom of this and work out whether this is a bottleneck or is due to the way the count is done?
