
Having to run 12x Elastic Rally instances on the `elastic/logs` track to bottleneck the CPU on the hot data tier


While I don’t have anything super useful to add here in terms of replacements, I would just like to throw my anecdotal hat into this ring with respect to the elastic/logs track, which I was trying to run against our new NVMe-backed hot data tier on on-prem hardware within an ECE cluster. Scaling from targeting 1 shard to 2 shards and beyond didn’t improve the overall indexing throughput. I specifically increased the corpus size to around 60 days of data to ensure I had plenty of events to index. My goal was to understand the behaviour of the new cluster with respect to hot spotting and shard and replica counts. Unfortunately, Elastic Rally initially gave me the wrong idea.

It wasn’t until I ran multiple copies of Elastic Rally with identical settings concurrently from the same host that I was able to start approaching any of the hardware limits in the cluster. In the end, I had to run 12x Elastic Rally instances on the elastic/logs track to bottleneck the CPU on the hot data tier. I executed all 12 instances from a single server (backed by NVMe, 128 GB of RAM, 32c/64t, 10 Gb network). This raised the actual indexing rate from 60-70,000 docs/s to 550-600,000 docs/s. The reality was that the server sending the logs wasn’t the limiting factor, nor were the hot data tier nodes; the bottleneck was Elastic Rally itself, which couldn’t provide documents fast enough to index.
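
A run like this can be scripted; below is a minimal sketch, assuming the standard `esrally race` flags (`--track`, `--target-hosts`, `--pipeline=benchmark-only`) and that the elastic/logs track resolves from the default track repository. The target host and log file names are placeholders, not the setup described above.

```python
"""Launch several concurrent esrally races against the same existing cluster.

Minimal sketch only: the flags are the standard `esrally race` ones; the target
host below is a placeholder and the track name assumes the default track
repository layout.
"""
import subprocess
import sys

NUM_INSTANCES = 12
TARGET_HOSTS = "hot-node-1:9200"   # placeholder coordinating endpoint
TRACK = "elastic/logs"             # assumption: track name as exposed by the default track repository


def main() -> int:
    procs, logs = [], []
    for i in range(NUM_INSTANCES):
        log = open(f"rally-instance-{i}.log", "w")
        cmd = [
            "esrally", "race",
            f"--track={TRACK}",
            f"--target-hosts={TARGET_HOSTS}",
            "--pipeline=benchmark-only",   # benchmark an existing cluster, don't provision one
        ]
        procs.append(subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT))
        logs.append(log)
    rc = max(p.wait() for p in procs)      # non-zero if any instance failed
    for log in logs:
        log.close()
    return rc


if __name__ == "__main__":
    sys.exit(main())
```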

My suspicion was that, similar to the Go stdlib encoding/json package, JSON handling is not particularly optimised in Python. This issue seems to validate that theory; I just wanted to provide a real-world example of how Elastic Rally’s own performance can produce results that are easily misconstrued by naive users such as myself.

_Originally posted by @berglh in https://github.com/elastic/rally/issues/1046#issuecomment-1225252763_
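
As a side note on the JSON-handling suspicion above: the linked issue #1046 is about evaluating pysimdjson as a faster parser. Below is a rough, self-contained comparison of stdlib json against pysimdjson, assuming pysimdjson’s Parser API (`pip install pysimdjson`); the sample document is made up and only meant to resemble a log event.

```python
"""Rough comparison of stdlib json vs pysimdjson for parsing a log-like document.

Assumes `pip install pysimdjson`; the sample document is invented and the output
is only indicative of relative parsing speed on one machine.
"""
import json
import timeit

import simdjson  # pysimdjson

DOC = json.dumps({
    "@timestamp": "2022-08-29T10:15:00.000Z",
    "log": {"level": "info", "file": {"path": "/var/log/app.log"}},
    "message": "GET /api/v1/items 200 13ms",
    "host": {"name": "web-01"},
}).encode("utf-8")

parser = simdjson.Parser()


def parse_stdlib() -> None:
    json.loads(DOC)


def parse_simdjson() -> None:
    # Re-using one Parser is pysimdjson's fast path; the returned document
    # proxy is discarded immediately, so re-parsing is safe here.
    parser.parse(DOC)


if __name__ == "__main__":
    n = 200_000
    for name, fn in (("json.loads", parse_stdlib), ("simdjson.Parser.parse", parse_simdjson)):
        secs = timeit.timeit(fn, number=n)
        print(f"{name:>22}: {n / secs:,.0f} docs/s")
```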

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
pquentin commented, Aug 29, 2022

> I think the main thing that made me not look for documentation was the message on the repository main page. It gave me the impression I shouldn’t be there unless I want to create my own tracks.

I opened https://github.com/elastic/rally-tracks/pull/309, please tell me what you think!

> Perhaps linking to each track/subtrack README from the primary repository README would be useful for clarity, at a minimum to show that there is good documentation there.

Not sure listing would help as much, I think the directory layout makes it clear that all subdirectories are tracks. And the list would quickly get stale.

> Optionally, if there was some clear information on configuring the tracks on the main website that mentioned, for each track in the rally-tracks repository, that you can find all the parameters explained in exquisite detail in the track subdirectory README, that would also help.

I opened https://github.com/elastic/rally/pull/1568, please tell me what you think!

> This would indicate to me that it’s just a matter of finding the sweet spot. It still got me wondering, though: where is the latency on each individual client if even nproc * 2 indexing clients on the shipping host are unable to saturate the CPU of the host system (assuming storage/RAM/network isn’t the bottleneck, which in my case we know it isn’t)? There still seems to be some inefficiency we’re fighting in the log shipper; it’s possible this is already well known to you, it was simply a curiosity to me.

By default, indexing goes “as fast as possible”, but each client still waits for a request to complete before sending another one. This is why you need more indexing clients to saturate the CPU: the work the load driver does is I/O-bound. Does that make sense? I’m not sure I’ve understood your point.

Now, if you want to know the latency for each client, you can configure a metrics store and look at the metrics.
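
To make the I/O-bound point concrete, here is a toy asyncio sketch (not Rally code; the 20 ms per-request latency is invented) showing how indexing throughput grows with the number of concurrent clients when each client waits for its previous bulk request to complete:

```python
"""Toy illustration of why more concurrent clients raise indexing throughput.

Not Rally code: the per-request latency is an invented stand-in for the network
round trip plus server-side processing of one bulk request.
"""
import asyncio
import time

REQUEST_LATENCY_S = 0.020   # pretend each bulk request takes 20 ms end to end
DOCS_PER_BULK = 1_000
REQUESTS_PER_CLIENT = 50


async def client() -> int:
    indexed = 0
    for _ in range(REQUESTS_PER_CLIENT):
        # Each client waits for the response before sending the next bulk request.
        await asyncio.sleep(REQUEST_LATENCY_S)
        indexed += DOCS_PER_BULK
    return indexed


async def run(num_clients: int) -> None:
    start = time.perf_counter()
    totals = await asyncio.gather(*(client() for _ in range(num_clients)))
    elapsed = time.perf_counter() - start
    print(f"{num_clients:>3} clients: {sum(totals) / elapsed:>12,.0f} docs/s")


if __name__ == "__main__":
    for n in (1, 8, 96):
        asyncio.run(run(n))
```

With these invented numbers, one client tops out around 50,000 docs/s no matter how fast the cluster is, while 96 clients approach 4.8 million docs/s in the simulation; in practice the curve flattens once the load driver’s CPU or the cluster becomes the real bottleneck.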

> My current hypothesis is that the primary shards may have been scheduled on the same physical host, on two different data instances, and not distributed across the physical hosts optimally.

Makes sense!
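
One lightweight way to check that hypothesis is the _cat/shards API. A minimal sketch follows; the endpoint and index pattern are assumptions to adapt to the cluster and to whatever indices or data streams the track writes:

```python
"""List which node each primary shard landed on, via the _cat/shards API.

The endpoint and index pattern are assumptions; adjust them for your cluster
and for the indices/data streams written by the elastic/logs track.
"""
import requests

ES_URL = "http://localhost:9200"   # placeholder coordinating node
INDEX_PATTERN = "logs-*"           # assumption: pattern covering the benchmark indices

resp = requests.get(
    f"{ES_URL}/_cat/shards/{INDEX_PATTERN}",
    params={"format": "json", "h": "index,shard,prirep,node"},
    timeout=10,
)
resp.raise_for_status()

# Keep only primaries ("p") and print where each one is allocated.
primaries = [row for row in resp.json() if row["prirep"] == "p"]
for row in sorted(primaries, key=lambda r: (r["index"], int(r["shard"]))):
    print(f'{row["index"]:<40} shard {row["shard"]} -> {row["node"]}')
```

If several primaries report node names that sit on the same physical host, that would support the hot-spotting explanation.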

Anyway, I’m going to close this issue now as there’s nothing actionable for Rally left here. Thanks!

0 reactions
berglh commented, Aug 30, 2022

Thanks for your help @pquentin
