
Timer inaccurate due to asynchronous CUDA


Update:

  • Removing the separation and logging train_iter and load_data together removes the issue.

The reported “Data Load” and “Train Iter” times cannot be trusted because CUDA is not synchronized. It turns out the dataloader is actually pretty fast.
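
To see the effect in isolation, here is a minimal standalone sketch (not taken from the codebase): the matrix multiply is only launched inside the timed region, so the unsynchronized timer mostly measures kernel-launch overhead rather than the actual GPU work.

  import time
  import torch

  x = torch.randn(4096, 4096, device="cuda")

  torch.cuda.synchronize()                # flush prior GPU work
  start = time.perf_counter()
  y = x @ x                               # launch only; control returns immediately
  naive = time.perf_counter() - start

  torch.cuda.synchronize()
  start = time.perf_counter()
  y = x @ x
  torch.cuda.synchronize()                # wait for the kernel to actually finish
  synced = time.perf_counter() - start

  print(f"naive: {naive * 1e3:.3f} ms, synchronized: {synced * 1e3:.3f} ms")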

I’m unsure what to do with this info: either we remove this timing breakdown, or we sprinkle in some synchronizations.

For correct timing, the following synchronization calls are necessary. Unfortunately, adding them comes at a performance cost.

  # TimeWriter and EventName come from the project's event-timing writer utilities.
  torch.cuda.synchronize()  # flush pending GPU work so it is not billed to data loading
  with TimeWriter(writer, EventName.ITER_LOAD_TIME, step=step):
      ray_indices, batch = next(iter_dataloader_train)
      torch.cuda.synchronize()  # include any GPU work launched while loading
  torch.cuda.synchronize()
  with TimeWriter(writer, EventName.ITER_TRAIN_TIME, step=step) as t:
      loss_dict = self.train_iteration(ray_indices, batch, step)
      torch.cuda.synchronize()  # wait for the train step's kernels before the timer stops
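
A lighter-weight alternative (a general PyTorch technique, not something this issue settled on) is to time with CUDA events, which record timestamps on the GPU stream itself and only block when the elapsed time is read; a minimal sketch:

  import torch

  start_evt = torch.cuda.Event(enable_timing=True)
  end_evt = torch.cuda.Event(enable_timing=True)

  x = torch.randn(4096, 4096, device="cuda")

  start_evt.record()
  y = x @ x                     # stands in for the data-loading or train step being timed
  end_evt.record()

  end_evt.synchronize()         # wait only for this event, not the whole device
  print(f"GPU time: {start_evt.elapsed_time(end_evt):.3f} ms")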

Correct timings (with synchronization): [image]

Without sync: [image]

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
liruilong940607 commented on Jul 22, 2022

If there are modules in the data loading pipeline that need to live in the PyTorch computation graph, for example if the Camera is a torch.nn.Module that needs to receive gradients and update itself, then it makes sense to keep the CUDA operations in the data loading pipeline. But in that case I would rather push the camera into the “network” part instead of the “dataloader” part. In any case I would prefer to keep the dataloader part on the CPU, because it is multi-threadable.
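
A rough sketch of what pushing the camera into the “network” part could look like (the names here are hypothetical, not the project’s actual API): the learnable pose correction becomes an nn.Module evaluated inside the model’s forward pass on the GPU, while the dataloader only hands over indices and pixels.

  import torch
  from torch import nn

  class LearnableCameraRefinement(nn.Module):
      """Hypothetical module: learnable per-camera offsets applied to ray origins."""

      def __init__(self, num_cameras: int):
          super().__init__()
          self.origin_offsets = nn.Parameter(torch.zeros(num_cameras, 3))

      def forward(self, camera_ids: torch.Tensor, ray_origins: torch.Tensor) -> torch.Tensor:
          # Runs on the GPU inside the training graph, so it receives gradients
          # like any other network parameter; the dataloader stays CUDA-free.
          return ray_origins + self.origin_offsets[camera_ids]

The dataloader then remains a plain CPU pipeline, and the camera correction is optimized jointly with the rest of the model.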

1 reaction
liruilong940607 commented on Jul 22, 2022

It’s kinda weird to me that there are CUDA operations in the data loader. I think PyTorch is designed in a way that encourages you to process data using the CPU and multiple threads. For example, torch.utils.data.DataLoader does not support a dataset that uses CUDA operations to load data, because it is not multi-threadable. The benefit of using multi-threaded CPU loading is that, in theory, you can fully parallelize the data loading with the network and get effectively zero data-loading time. Using CUDA operations in the dataloader will always put some burden on the pipeline, though in our case it is light enough.
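
As a generic illustration of that pattern (plain PyTorch, not the project’s actual dataloader): keep __getitem__ on the CPU so DataLoader can parallelize it across workers, and only move each batch to the GPU inside the training loop.

  import torch
  from torch.utils.data import DataLoader, Dataset

  class CpuRayDataset(Dataset):
      """Hypothetical dataset: everything in __getitem__ stays on the CPU."""

      def __init__(self, num_rays: int = 100_000):
          self.ray_indices = torch.arange(num_rays)
          self.pixels = torch.rand(num_rays, 3)       # CPU tensors only

      def __len__(self):
          return len(self.ray_indices)

      def __getitem__(self, idx):
          return self.ray_indices[idx], self.pixels[idx]

  if __name__ == "__main__":  # guard needed when workers use the spawn start method
      loader = DataLoader(
          CpuRayDataset(),
          batch_size=4096,
          num_workers=4,       # CPU workers prefetch batches in parallel with GPU compute
          pin_memory=True,     # enables fast asynchronous host-to-device copies
      )
      for ray_indices, pixels in loader:
          pixels = pixels.cuda(non_blocking=True)     # only this transfer touches CUDA
          # ... run the network / training step on the GPU here ...
          pass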

