
Benchmarking methodology used is not quite correct

See original GitHub issue

Hi @1ytic,

Thank you for this work on optimizing the warp RNN-T operation, which is becoming increasingly useful for many speech recognition acoustic models. We have studied your implementation and have the following observations:

  1. When we run the benchmark script in your repo, we get the run times below, where "new" refers to this new repo and "baseline" is what we currently have in the RNN-T reference model. We used B=32, T=U=200, V=29, which is a typical case in our dataset. From the output of the benchmark script, the new loss function does appear to run faster:

new: 1.76 ms, baseline: 6.10 ms

  2. However, in the benchmark script provided in the repo, the run time was measured as:
t = timer()
costs = loss(xs, ys, xn, yn)  # launch is asynchronous; this returns before the GPU kernel finishes
elapsed_time += timer() - t

This way of measuring has a problem: CUDA kernels are launched asynchronously, so the CPU can run ahead of the GPU and stop the timer before the kernel has actually completed. After adding synchronization as below (a self-contained version of this measurement and a CUDA-event variant are sketched after this list),

torch.cuda.synchronize()  # sync before starting the timer
t = timer()
costs = loss(xs, ys, xn, yn)
torch.cuda.synchronize()  # sync before stopping the timer
elapsed_time += timer() - t

the run time we get is:

new: 15.82 ms, baseline: 6.12 ms

  3. This is similar to the run time we got from the GPU profiler nsys:

new: 14.38 ms, baseline: 4.75 ms
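
For reference, here is a minimal, self-contained sketch of the synchronized measurement with the dimensions above. The tensor shapes follow the usual RNN-T convention and the loss placeholder is only illustrative; this is not the repo's actual benchmark script, which builds the real inputs and calls the real loss.

import torch
from timeit import default_timer as timer

B, T, U, V = 32, 200, 200, 29                 # batch, frames, target length, vocabulary
device = torch.device("cuda")

# Assumed input layout: (B, T, U + 1, V) joint log-probabilities, int32 labels and lengths.
xs = torch.randn(B, T, U + 1, V, device=device).log_softmax(dim=-1)
ys = torch.randint(1, V, (B, U), dtype=torch.int32, device=device)
xn = torch.full((B,), T, dtype=torch.int32, device=device)
yn = torch.full((B,), U, dtype=torch.int32, device=device)

def loss(xs, ys, xn, yn):
    # Placeholder: replace with the RNN-T loss under test (the new kernel or the baseline).
    return xs.sum()

iters, elapsed_time = 100, 0.0
for _ in range(iters):
    torch.cuda.synchronize()                  # make sure previously queued work is done
    t = timer()
    costs = loss(xs, ys, xn, yn)
    torch.cuda.synchronize()                  # wait for the kernel before stopping the timer
    elapsed_time += timer() - t

print(f"avg: {elapsed_time / iters * 1e3:.2f} ms")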
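
An equivalent check can be done with CUDA events, which time the work on the device itself instead of mixing host and device clocks. This sketch reuses the xs, ys, xn, yn and loss names from the snippet above.

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()                                # recorded on the current CUDA stream
costs = loss(xs, ys, xn, yn)
end.record()

torch.cuda.synchronize()                      # wait until both events have completed
print(f"{start.elapsed_time(end):.2f} ms")    # elapsed_time() reports milliseconds

Either variant should roughly line up with what a profiler reports; point 3's numbers came from running the benchmark under nsys (for example, nsys profile python <benchmark-script>), which measures the kernels directly.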

In summary, it does not look like the alternative loss function is running faster than what we have. The speedup claimed in the repo is likely an artifact of the flawed benchmarking methodology.

Can you share your thought process on this?

Thanks, Ashish

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
dophist commented, May 7, 2021

It was tested on a 2080 Ti.

0 reactions
1ytic commented, Aug 23, 2021

I added one more explanation (2944982) of the performance issue with the NVIDIA profiler.

