Timer inaccurate due to asynchronous CUDA
See original GitHub issue.

Update:
- Remove the separation and log train_iter and load_data together; this removes the issue.
The reported “Data Load” and “Train Iter” times cannot be trusted because CUDA is not synchronized. It turns out the dataloader is actually pretty fast.
Unsure what to do with this info. Either we remove this timing breakdown or sprinkle in some synchronizations.
For correct timing, the following synchronize calls are necessary. Unfortunately, adding them does come at a performance cost.
# Synchronize so the load timer does not absorb GPU work queued by the previous iteration,
# and again inside the block so the timer only stops after any GPU work in the loader finishes.
torch.cuda.synchronize()
with TimeWriter(writer, EventName.ITER_LOAD_TIME, step=step):
    ray_indices, batch = next(iter_dataloader_train)
    torch.cuda.synchronize()

# Same pattern for the train step: wait for outstanding kernels before starting the timer,
# and wait for this iteration's kernels to finish before stopping it.
torch.cuda.synchronize()
with TimeWriter(writer, EventName.ITER_TRAIN_TIME, step=step) as t:
    loss_dict = self.train_iteration(ray_indices, batch, step)
    torch.cuda.synchronize()
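To see how misleading the unsynchronized numbers can be, here is a minimal, self-contained sketch (the operation and matrix size are arbitrary and not nerfstudio code): because kernel launches return to the host immediately, the unsynchronized timer mostly measures launch overhead rather than the actual GPU work.

import time
import torch

x = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()  # make sure setup work is done before timing

# Without sync: the timer stops before the matmul has actually finished.
start = time.perf_counter()
y = x @ x
unsync_ms = (time.perf_counter() - start) * 1e3

# With sync: wait for the kernel to complete before stopping the timer.
start = time.perf_counter()
y = x @ x
torch.cuda.synchronize()
sync_ms = (time.perf_counter() - start) * 1e3

print(f"unsynchronized: {unsync_ms:.3f} ms, synchronized: {sync_ms:.3f} ms")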
Correct Timings / Without Sync: the timing screenshots attached to the original issue are not reproduced here.
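If we decide to keep the breakdown, one option that might reduce the cost of the extra synchronizations is CUDA events. The sketch below is only an idea, not something TimeWriter currently supports; it reuses the variable names from the snippet above, and torch.cuda.Event.elapsed_time() returns milliseconds between two recorded events, so the host only has to block once per iteration before reading the results.

# Sketch only: event-based timing as a lower-overhead alternative.
load_start = torch.cuda.Event(enable_timing=True)
load_end = torch.cuda.Event(enable_timing=True)
train_end = torch.cuda.Event(enable_timing=True)

load_start.record()
ray_indices, batch = next(iter_dataloader_train)
load_end.record()
loss_dict = self.train_iteration(ray_indices, batch, step)
train_end.record()

torch.cuda.synchronize()  # single host sync before reading the elapsed times
load_ms = load_start.elapsed_time(load_end)    # time between events as seen by the CUDA stream
train_ms = load_end.elapsed_time(train_end)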
Issue Analytics
- Created a year ago
- Comments: 5 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
If there are some modules in the data loading pipeline that need to live in the PyTorch computation graph, for example if the Camera is a torch.nn.Module that needs to receive gradients and update itself, then it makes sense to keep the CUDA operations in the data loading pipeline. But in that case I would rather push the camera into the “network” part instead of the “dataloader” part. In any case I would prefer to keep the dataloader part on the CPU because it is multi-thread-able.

It’s kinda weird to me that there are CUDA operations in the data loader. I think PyTorch is designed in a way that encourages you to process data using the CPU and multiple threads. For example, torch.utils.data.DataLoader does not support a dataset that uses CUDA operations to load data, because that is not multi-thread-able. The benefit of using multi-threaded CPU data loading is that, in theory, you can fully overlap loading with the network so you get zero timing for data loading. Using CUDA operations in the dataloader will always put some burden on the pipeline, though in our case it is light enough.
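For reference, here is a minimal sketch of the pattern described above (the dataset class, tensor shapes, and batch size are made up for illustration and are not nerfstudio's actual code): keep the dataset on the CPU so DataLoader can parallelize loading across worker processes, and move each batch to the GPU inside the training loop with pinned memory and non-blocking copies so the transfer can overlap with compute.

import torch
from torch.utils.data import DataLoader, Dataset

class RayDataset(Dataset):
    # Illustrative CPU-only dataset: no CUDA operations in __getitem__.
    def __init__(self, rays: torch.Tensor, pixels: torch.Tensor):
        self.rays, self.pixels = rays, pixels  # tensors stay on the CPU

    def __len__(self):
        return self.rays.shape[0]

    def __getitem__(self, idx):
        return self.rays[idx], self.pixels[idx]

loader = DataLoader(
    RayDataset(torch.randn(10_000, 6), torch.rand(10_000, 3)),
    batch_size=4096,
    num_workers=4,    # loading runs in worker processes, in parallel with training
    pin_memory=True,  # page-locked buffers enable asynchronous host-to-device copies
)

for rays, pixels in loader:
    rays = rays.cuda(non_blocking=True)      # overlaps the copy with GPU compute
    pixels = pixels.cuda(non_blocking=True)
    # ... run the training iteration on the GPU here ...

With num_workers > 0 the script entry point should be guarded by if __name__ == "__main__": on platforms that spawn worker processes.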