Issue Description

Hi, nice work!

The inference code runs successfully, but I ran into errors during training.

Issue 1:

    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'BEVF_FasterRCNN' object has no attribute 'kd'

I found that self.kd and self.kd_feat_loss were not defined, so I added self.kd = False in the __init__() function.

            if self.kd:
                losses_pts['kd_feat_loss'] = self.kd_feat_loss
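
The workaround described above can be sketched as follows. This is a minimal, framework-free illustration rather than the repository's code: `DetectorSketch`, its constructor arguments, and `collect_losses` are hypothetical names, and only `kd`, `kd_feat_loss`, and `losses_pts` come from the issue.

```python
class DetectorSketch:
    """Hypothetical stand-in for a detector such as BEVF_FasterRCNN."""

    def __init__(self, kd=False, kd_feat_loss=None):
        # Define both attributes up front so that later reads never fail
        # with a missing-attribute error, even when the flag is off.
        self.kd = kd
        self.kd_feat_loss = kd_feat_loss

    def collect_losses(self, losses_pts):
        # Mirror the guarded branch from the issue: the extra loss is
        # added only when the flag is set.
        losses = dict(losses_pts)
        if self.kd:
            losses['kd_feat_loss'] = self.kd_feat_loss
        return losses


if __name__ == "__main__":
    # With the default kd=False the input losses pass through unchanged.
    print(DetectorSketch().collect_losses({'loss_cls': 0.5}))
```

Giving both attributes a default in the constructor means the `if self.kd:` check is always safe to evaluate, whichever config is loaded.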

Issue 2:

    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 4; 31.75 GiB total capacity; 17.30 GiB already allocated; 3.86 GiB free; 26.34 GiB reserved in total by PyTorch)

I am training on 8x V100 (32 GB) GPUs, and the error above occurred. Adding torch.cuda.empty_cache() to the training script did not help; training still ran out of memory. I then reduced samples_per_gpu from 4 to 2 for training.

Issue Analytics

  • State: closed
  • Created 7 months ago
  • Comments: 9

Top GitHub Comments

1 reaction
tingtingliangvs commented, Jun 9, 2022

Hi, the batch size should be 2x8 when the LiDAR stream is not frozen. I have fixed the bug, and bevf_pp_2x8_1x_nusc.py should now work. The problem arose because I forgot to update this setting when I unified the config names and format.

Here are some tips for tuning BEVFusion:

  1. When the LiDAR stream is frozen during training, the initial learning rate is set to 0.001 and the batch size is 4x8.
  2. When the LiDAR stream and the camera stream are both being trained, the initial learning rate is set to 0.0001 and the batch size is 2x8.

In my experience, the learning rate, not the batch size, is the key to BEVFusion training.
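
The two regimes above can be captured in a small helper. This is only a sketch: `training_settings` is a hypothetical function, and it assumes that "4x8" and "2x8" mean samples per GPU times number of GPUs, which matches the 8-GPU setup in the issue but is not stated explicitly in the thread.

```python
def training_settings(lidar_frozen, num_gpus=8):
    """Return the learning rate and batch size quoted for each regime."""
    if lidar_frozen:
        # LiDAR stream frozen: larger batch, higher initial learning rate.
        samples_per_gpu, lr = 4, 0.001
    else:
        # LiDAR and camera streams both trained: smaller batch, lower rate.
        samples_per_gpu, lr = 2, 0.0001
    return {
        'samples_per_gpu': samples_per_gpu,
        'num_gpus': num_gpus,
        # Assumed reading of "AxB": A samples per GPU on B GPUs.
        'total_batch_size': samples_per_gpu * num_gpus,
        'lr': lr,
    }


if __name__ == "__main__":
    print(training_settings(lidar_frozen=True))
    print(training_settings(lidar_frozen=False))
```

Under that reading, dropping from 4 to 2 samples per GPU on 8 GPUs halves the total batch from 32 to 16, and the learning rate is lowered along with it.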

0 reactions
Birdylx commented, Nov 14, 2022

@Treemann Hi, I’m trying to reproduce the LiDAR stream results. I used the command the authors provide but got lower mAP and NDS. Were you able to reproduce the LiDAR stream results?
