Expected size discrepancy

Hello,

Just stumbled upon what looks like a bug in Rally! The tool looks great and I would really like to use it, so hopefully someone can help me understand what’s going on here. Apologies if this is not a bug! I looked in the repo and on the Discuss forum but could not find anything related. The fact that it happens everywhere, with different tracks, and that the downloaded size (see below) matches the size listed on the bucket makes me think it might be a bug.

Thanks for your help!

Rally version (get with esrally --version): esrally 2.1.0

Invoked command: esrally race --track=so --target-hosts=<node1_ip>:9200,<node2_ip>:9200,<node3_ip>:9200 --pipeline=benchmark-only (replaced node ips by placeholders)

Configuration file (located in ~/.rally/rally.ini):

[meta]
config.version = 17

[system]
env.name = local

[node]
root.dir = /home/<username>/.rally/benchmarks
src.root.dir = /home/<username>/.rally/benchmarks/src

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch

[benchmarks]
local.dataset.cache = /home/<username>/.rally/benchmarks/data

[reporting]
datastore.type = in-memory
datastore.host =
datastore.port =
datastore.secure = False
datastore.user =
datastore.password =


[tracks]
default.url = https://github.com/elastic/rally-tracks

[teams]
default.url = https://github.com/elastic/rally-teams

[defaults]
preserve_benchmark_candidate = false

[distributions]
release.cache = true

JVM version: My understanding is that this is not required since I’m running with --pipeline=benchmark-only

OS version: Ubuntu 20.04.2 LTS

Description of the problem including expected versus actual behavior:

All track data downloads, whichever track I choose, always fail because of a size discrepancy. Example with track so:

[INFO] Downloading track data (8.9 GB total size)                                 [100.0%]
[ERROR] Cannot race. Error in track preparator
	Download of [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2] is corrupt. Downloaded [9600716233] bytes but [9599137228] bytes are expected. Please retry.

I’ve tried different tracks, different servers, and different networks, and I retried a few times; nothing changed. What makes me think this might be a bug is that at http://benchmarks.elasticsearch.org.s3.amazonaws.com/, the listed file size for corpora/so/posts.json.bz2 is indeed 9600716233. So it looks like the file is not corrupted, but esrally expects it to have a different size for some reason?
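
To make the mismatch concrete, here is a minimal Python sketch (not part of Rally) that computes the difference between the two sizes quoted in the error message and shows one way to read the size the server itself reports. The helper names `remote_size` and `size_discrepancy` are hypothetical, and the HEAD request assumes the server returns a `Content-Length` header.

```python
import urllib.request


def remote_size(url: str) -> int:
    """Return the size in bytes that the server reports for url.

    Assumption: the server answers HEAD requests with a Content-Length header.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])


def size_discrepancy(actual_bytes: int, expected_bytes: int) -> int:
    """Return how many bytes the actual size exceeds the expected size by."""
    return actual_bytes - expected_bytes


# Values copied from the error message above.
downloaded = 9600716233
expected = 9599137228
diff = size_discrepancy(downloaded, expected)
print(f"difference: {diff} bytes")
```

The two sizes differ by 1,579,005 bytes, roughly 1.5 MiB. Calling `remote_size` on the corpus URL would be one way to confirm which of the two numbers the bucket currently serves.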

Steps to reproduce:

1. Get a node with ES on it (the bug might also occur with Rally’s integrated provisioning, but I can’t install Java on these VMs to test that)
2. Run esrally race --track=so --target-hosts=<node1_ip>:9200 --pipeline=benchmark-only
3. Observe how the download seemingly fails

Provide logs (if relevant):

2021-04-14 14:37:05,860 ActorAddr-(T|:41297)/PID:9089 esrally.track.loader INFO Preparing track [so]
2021-04-14 14:37:05,862 ActorAddr-(T|:41297)/PID:9089 esrally.track.loader INFO Resolved data root directory for document corpus [so] in track [so] to [['/home/<username>/.rally/benchmarks/data/so']].
2021-04-14 14:37:05,863 ActorAddr-(T|:41297)/PID:9089 esrally.track.loader INFO Downloading data from [http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/so/posts.json.bz2] (9154 MB) to [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2].
2021-04-14 14:41:51,166 ActorAddr-(T|:41297)/PID:9089 esrally.actor ERROR Track preparator has detected a benchmark failure. Notifying master...
Traceback (most recent call last):

  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 415, in prepare_track
    tp.on_prepare_track(t, data_root_dir)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 88, in on_prepare_track
    if not t.on_prepare_track(track, data_root_dir):

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 437, in on_prepare_track
    prep.prepare_document_set(document_set, data_root[0])

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 585, in prepare_document_set
    self.downloader.download(document_set.base_url, target_path, expected_size)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/track/loader.py", line 500, in download
    net.download(data_url, target_path, size_in_bytes, progress_indicator=progress)

  File "/home/<username>/.local/lib/python3.8/site-packages/esrally/utils/net.py", line 223, in download
    raise exceptions.DataError("Download of [%s] is corrupt. Downloaded [%d] bytes but [%d] bytes are expected. Please retry." %

esrally.exceptions.DataError: Download of [/home/<username>/.rally/benchmarks/data/so/posts.json.bz2] is corrupt. Downloaded [9600716233] bytes but [9599137228] bytes are expected. Please retry.

2021-04-14 14:41:51,169 ActorAddr-(T|:42963)/PID:9088 esrally.actor ERROR Main driver received a fatal exception from a load generator. Shutting down.
2021-04-14 14:41:51,169 ActorAddr-(T|:42963)/PID:9088 esrally.metrics INFO Closing metrics store.
2021-04-14 14:41:51,170 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO Received a benchmark failure from [ActorAddr-(T|:42963)] and will forward it now.
2021-04-14 14:41:51,172 -not-actor-/PID:9052 esrally.racecontrol ERROR A benchmark failure has occurred
2021-04-14 14:41:51,172 -not-actor-/PID:9052 esrally.racecontrol INFO Telling benchmark actor to exit.
2021-04-14 14:41:51,173 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO BenchmarkActor received unknown message [ActorExitRequest] (ignoring).
2021-04-14 14:41:51,174 ActorAddr-(T|:42963)/PID:9088 esrally.actor INFO Main driver received ActorExitRequest and will terminate all load generators.
2021-04-14 14:41:51,174 ActorAddr-(T|:35223)/PID:9087 esrally.actor INFO MechanicActor#receiveMessage unrecognized(msg = [<class 'thespian.actors.ActorExitRequest'>] sender = [ActorAddr-(T|:44985)])
2021-04-14 14:41:51,175 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:35223)] (ignoring).
2021-04-14 14:41:51,176 ActorAddr-(T|:42963)/PID:9088 esrally.actor INFO A track preparator has exited.
2021-04-14 14:41:51,177 ActorAddr-(T|:44985)/PID:9068 esrally.actor INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:42963)] (ignoring).
2021-04-14 14:41:54,176 -not-actor-/PID:9052 esrally.rally INFO Attempting to shutdown internal actor system.
2021-04-14 14:41:54,178 -not-actor-/PID:9067 root INFO ActorSystem Logging Shutdown
2021-04-14 14:41:54,199 -not-actor-/PID:9066 root INFO ---- Actor System shutdown
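
The traceback ends in a check that compares the number of downloaded bytes with an expected size. The following is a rough Python sketch of that kind of check, written for illustration only: it is not Rally's actual code, `verify_download` is a hypothetical helper, and `DataError` here is a local stand-in class rather than the one from `esrally.exceptions`.

```python
import os
import tempfile


class DataError(Exception):
    """Local stand-in error type for this sketch; not imported from esrally."""


def verify_download(path: str, expected_size: int) -> int:
    """Raise DataError if the file at path does not have expected_size bytes."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise DataError(
            "Download of [%s] is corrupt. Downloaded [%d] bytes but [%d] bytes "
            "are expected. Please retry." % (path, actual_size, expected_size)
        )
    return actual_size


def demo() -> str:
    """Write a 10-byte file and check it against a deliberately stale size."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"0123456789")
        path = handle.name
    try:
        try:
            verify_download(path, 8)  # stale expectation: file is really 10 bytes
        except DataError as error:
            return str(error)
        return "no error"
    finally:
        os.remove(path)


message = demo()
print(message)
```

A check like this reports a correct file as corrupt whenever the expected size is out of date, which matches the symptom described in this issue.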


Top GitHub Comments

dliappis commented, Apr 15, 2021

Thanks so much for the verification, @alsyia. https://github.com/elastic/rally-tracks/pull/167 has been merged; closing.

alsyia commented, Apr 15, 2021

@dliappis Very clear explanation, thanks!

I added hotfix.url = https://github.com/dliappis/rally-tracks in the [tracks] section of rally.ini, cleaned up the existing data, and ran

esrally race --track=so --target-hosts=es-<host_1>:9200,<host_2>:9200,<host_3>:9200 --pipeline=benchmark-only --track-repository=hotfix --track-revision=update-size-in-5

It works! 🎉

Thank you so much for your responsiveness 😃 I guess we can close this ticket since I have a workaround and I see you have a PR for the fix, but I’ll leave it to you; maybe you want to keep it open for tracking!
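
For readers who want to script the rally.ini change from the workaround, here is a small Python sketch using the standard `configparser` module. It works on an in-memory sample rather than the real file, `add_track_repository` is a hypothetical helper name, and the sample contains only the `[tracks]` section shown in the issue. Editing the file by hand works just as well.

```python
import configparser
import io

# Sample mirroring the [tracks] section of the rally.ini shown in the issue.
SAMPLE_INI = """\
[tracks]
default.url = https://github.com/elastic/rally-tracks
"""


def add_track_repository(ini_text: str, name: str, url: str) -> str:
    """Return ini_text with '<name>.url = <url>' added to the [tracks] section."""
    # interpolation=None keeps any '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("tracks"):
        parser.add_section("tracks")
    parser.set("tracks", f"{name}.url", url)
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()


updated = add_track_repository(
    SAMPLE_INI, "hotfix", "https://github.com/dliappis/rally-tracks"
)
print(updated)
```

The repository name passed here (`hotfix`) is the same name given to `--track-repository` in the command above.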
