25% performance regression in merges

Our weekly multi-node benchmarking (working on making this publicly visible) shows a performance regression in simple dataframe merges, which I can pinpoint to #6975. (This was briefly reverted in #6994 and then reintroduced in #7007).

More specifically, #6975 changes the decision making in _select_keys_for_gather:

https://github.com/dask/distributed/blob/2b23840e33078bda3f60e082b2542502595df1dc/distributed/worker_state_machine.py#L1654-L1665

Prior to this change the logic was

https://github.com/dask/distributed/blob/b133009cee88fd48c8a345cffde0a8e9163426a6/distributed/worker_state_machine.py#L1620-L1630

Note the difference in whether we fetch the top priority task. If I remove the part of the decision making logic that looks at self.incoming_transfer_bytes:

 if ( 
     to_gather 
     and total_nbytes + ts.get_nbytes() > bytes_left_to_fetch
 ): 

Then performance goes back to where it was previously.
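
To make the difference concrete, here is a toy model of the selection loop. It is a sketch, not the code from distributed: the function name, its parameters, and the `check_incoming` switch are hypothetical, and the "with check" branch reflects one reading of the linked post-#6975 logic (traffic already in flight blocks an oversized top-priority task).

```python
from collections import deque


def select_keys_for_gather(sizes, bytes_left_to_fetch, incoming_bytes, check_incoming):
    """Toy model of the key-selection loop (hypothetical, simplified).

    sizes: task sizes in bytes, highest priority first.
    check_incoming: when True, bytes already in flight also prevent fetching
    a task that does not fit the budget (assumed post-#6975 behaviour).
    """
    available = deque(sizes)
    to_gather = []
    total_nbytes = 0
    while available:
        nbytes = available[0]
        # "busy" means the size budget is enforced for this task.
        busy = bool(to_gather) or (check_incoming and incoming_bytes > 0)
        if busy and total_nbytes + nbytes > bytes_left_to_fetch:
            break
        to_gather.append(available.popleft())
        total_nbytes += nbytes
    return to_gather, total_nbytes


# One 60 MB task at the head of the queue, 10 MB already in flight.
with_check = select_keys_for_gather([60_000_000, 5_000_000], 50_000_000, 10_000_000, True)
without_check = select_keys_for_gather([60_000_000, 5_000_000], 50_000_000, 10_000_000, False)
```

In this model, the variant with the in-flight check selects nothing while other data is arriving, whereas the variant without it always takes the top-priority task, which matches the behavioural difference described above.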

I’m not sure of the correct way to square this circle. I don’t understand how the change in _select_keys_for_gather interacts with the intention of the PR to throttle data transfer.

cc @hendrikmakait (as author of #6975)

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
wence- commented, Sep 26, 2022

If you have time to investigate further, would you mind exploring the effect of increasing distributed.worker.memory.transfer on the runtime of the impacted workloads?

Setting export DASK_DISTRIBUTED__WORKER__MEMORY__TRANSFER=1 (which I think is the maximum value) doesn’t improve things. In fact, it appears that setting this value doesn’t really have an effect at all for this workload (I get effectively the same throughput with export DASK_DISTRIBUTED__WORKER__MEMORY__TRANSFER=0.00000001).
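
For reference, Dask maps `DASK_`-prefixed environment variables to config keys by dropping the prefix, lowercasing, and treating double underscores as nesting. The helper below is a hypothetical illustration of that naming rule only (it is not part of dask and does not parse values).

```python
def env_var_to_config_key(name):
    """Translate a DASK_* environment variable name to a dotted config key.

    Illustrative helper, not a dask API: strips the DASK_ prefix,
    lowercases the rest, and turns each double underscore into a dot.
    """
    prefix = "DASK_"
    if not name.startswith(prefix):
        raise ValueError(f"not a Dask environment variable: {name!r}")
    return ".".join(name[len(prefix):].lower().split("__"))


key = env_var_to_config_key("DASK_DISTRIBUTED__WORKER__MEMORY__TRANSFER")
```

So the variable exported above targets the same `distributed.worker.memory.transfer` setting mentioned in the question.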

Inspecting the values of self.transfer_incoming_bytes_limit, self.transfer_incoming_bytes, and self.transfer_message_target_bytes, it appears that the limit on bytes_left_to_fetch is always coming from self.transfer_message_target_bytes (which is hard-coded at 50MB).
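
A minimal sketch of why the configured limit can be irrelevant, assuming the budget is the smaller of the remaining incoming allowance and the per-message target (the function and parameter names here are hypothetical stand-ins for the attributes named above):

```python
def bytes_left_to_fetch(incoming_limit, incoming_bytes, message_target_bytes=50_000_000):
    """Assumed budget for the next gather message, in bytes.

    incoming_limit: stand-in for transfer_incoming_bytes_limit (None = no limit).
    incoming_bytes: stand-in for transfer_incoming_bytes (data already in flight).
    message_target_bytes: stand-in for transfer_message_target_bytes (50 MB).
    """
    if incoming_limit is None:
        return message_target_bytes
    return min(incoming_limit - incoming_bytes, message_target_bytes)


# With a large limit (or no limit at all), the 50 MB target is what binds.
large_limit = bytes_left_to_fetch(10 * 2**30, 0)
no_limit = bytes_left_to_fetch(None, 123_456)
# Only when the remaining allowance drops below 50 MB does the limit matter.
tight_limit = bytes_left_to_fetch(60_000_000, 20_000_000)
```

Under this assumption, raising the incoming limit cannot increase the budget beyond the per-message target.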

These benchmarks are running on a high-performance network (between 12 and 45 GiB/s of unidirectional bandwidth, depending on the worker pairing), so the default of capping the “small” messages fetched from a single worker at 50MB in total is getting in the way (I can send multiple GiBs of data in less than a second).
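
As a back-of-the-envelope check, the snippet below estimates how long a single 50 MB message occupies the link at the two bandwidths quoted above. It ignores latency, serialization, and protocol overhead, so treat it as a lower bound rather than a measurement.

```python
GIB = 2**30  # bytes per GiB


def transfer_seconds(nbytes, bandwidth_gib_per_s):
    """Ideal wire time for nbytes at the given bandwidth (no latency or overhead)."""
    return nbytes / (bandwidth_gib_per_s * GIB)


message_bytes = 50_000_000  # the 50 MB per-message target
slow = transfer_seconds(message_bytes, 12)  # slowest pairing quoted above
fast = transfer_seconds(message_bytes, 45)  # fastest pairing quoted above
print(f"12 GiB/s: {slow * 1e3:.2f} ms, 45 GiB/s: {fast * 1e3:.2f} ms")
```

Both figures come out at a few milliseconds or less, so with only one such message in flight per worker pair, fixed per-message costs can easily dominate the transfer itself.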

I think what is happening is that previously there might have been two messages in flight between any given pair of workers at any one time, whereas now the changed logic means we limit to a single message.

So I think that #6975 fixed the logic for limiting with respect to transfer_message_target_bytes, but this turns out to be bad in some settings. One way to fix this is to add configuration for transfer_message_target_bytes, I suppose.

0 reactions
hendrikmakait commented, Sep 28, 2022
