
Benchmarking methodology used is not quite correct

See original GitHub issue

Hi @1ytic,

Thank you for this work on optimizing the warp RNN-T operation, which is becoming increasingly useful for many speech recognition acoustic models. We have studied your implementation and have the following observations:

  1. When we run the benchmark script in your repo, we get the run times below, where "new" refers to this new repo and "baseline" is what we currently have in the RNN-T reference model. We used B=32, T=U=200, V=29, which is a typical case in our dataset. From the output of the benchmark script, the new loss function does appear to run faster:

new: 1.76 ms, baseline: 6.10 ms

  2. However, in the benchmark script provided in the repo, the run time was measured as:
t = timer()
costs = loss(xs, ys, xn, yn)  # launch is asynchronous; this returns before the GPU kernel finishes
elapsed_time += timer() - t

This way of measuring has a problem: CUDA kernels are launched asynchronously, so the CPU can run ahead of the GPU and stop the timer before the kernel has actually completed. After adding synchronization as below (a self-contained version of this measurement and a CUDA-event variant are sketched after this list),

torch.cuda.synchronize()  # sync before starting the timer
t = timer()
costs = loss(xs, ys, xn, yn)
torch.cuda.synchronize()  # sync before stopping the timer
elapsed_time += timer() - t

the run time we get is:

new: 15.82 ms, baseline: 6.12 ms

  3. This is similar to the run time we got from the GPU profiler nsys:

new: 14.38 ms, baseline: 4.75 ms
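
For reference, here is a minimal, self-contained sketch of the synchronized measurement with the dimensions above. The tensor shapes follow the usual RNN-T convention and the loss placeholder is only illustrative; this is not the repo's actual benchmark script, which builds the real inputs and calls the real loss.

import torch
from timeit import default_timer as timer

B, T, U, V = 32, 200, 200, 29                 # batch, frames, target length, vocabulary
device = torch.device("cuda")

# Assumed input layout: (B, T, U + 1, V) joint log-probabilities, int32 labels and lengths.
xs = torch.randn(B, T, U + 1, V, device=device).log_softmax(dim=-1)
ys = torch.randint(1, V, (B, U), dtype=torch.int32, device=device)
xn = torch.full((B,), T, dtype=torch.int32, device=device)
yn = torch.full((B,), U, dtype=torch.int32, device=device)

def loss(xs, ys, xn, yn):
    # Placeholder: replace with the RNN-T loss under test (the new kernel or the baseline).
    return xs.sum()

iters, elapsed_time = 100, 0.0
for _ in range(iters):
    torch.cuda.synchronize()                  # make sure previously queued work is done
    t = timer()
    costs = loss(xs, ys, xn, yn)
    torch.cuda.synchronize()                  # wait for the kernel before stopping the timer
    elapsed_time += timer() - t

print(f"avg: {elapsed_time / iters * 1e3:.2f} ms")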
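
An equivalent check can be done with CUDA events, which time the work on the device itself instead of mixing host and device clocks. This sketch reuses the xs, ys, xn, yn and loss names from the snippet above.

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()                                # recorded on the current CUDA stream
costs = loss(xs, ys, xn, yn)
end.record()

torch.cuda.synchronize()                      # wait until both events have completed
print(f"{start.elapsed_time(end):.2f} ms")    # elapsed_time() reports milliseconds

Either variant should roughly line up with what a profiler reports; point 3's numbers came from running the benchmark under nsys (for example, nsys profile python <benchmark-script>), which measures the kernels directly.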

In summary, it does not look like the alternative loss function is running faster than what we have. The speedup claimed in the repo is likely an artifact of the flawed benchmarking methodology.

Can you share your thought process on this?

Thanks, Ashish

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
dophist commented, May 7, 2021

It was tested on a 2080 Ti.

0 reactions
1ytic commented, Aug 23, 2021

I added one more explanation (2944982) of the performance issue with the NVIDIA profiler.

