Training Error

Hi, nice work!

I could run the inference code successfully, but I encountered errors during training.

Issue 1:

    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
    torch.nn.modules.module.ModuleAttributeError: 'BEVF_FasterRCNN' object has no attribute 'kd'

I found that self.kd and self.kd_feat_loss are not defined, so I added self.kd = False in the __init__() function. The attribute is read here:

            if self.kd:
                losses_pts['kd_feat_loss'] = self.kd_feat_loss
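A minimal sketch of the workaround described above, written against a plain-Python stand-in rather than the real BEVF_FasterRCNN class. The class name `ToyDetector`, its constructor arguments, and the placeholder loss values are all hypothetical; the only idea taken from the post is that the flag gets a default in `__init__()` so the `if self.kd:` check never reads an undefined attribute.

```python
class ToyDetector:
    """Hypothetical stand-in for a detector; not the real BEVF_FasterRCNN."""

    def __init__(self, kd=False, kd_feat_loss=None):
        # Define both attributes up front so later reads cannot fail.
        self.kd = kd
        self.kd_feat_loss = kd_feat_loss

    def collect_losses(self):
        losses_pts = {"loss_cls": 0.5}  # placeholder value for illustration
        # Only add the distillation term when it is enabled and available.
        if self.kd and self.kd_feat_loss is not None:
            losses_pts["kd_feat_loss"] = self.kd_feat_loss
        return losses_pts


default_losses = ToyDetector().collect_losses()
kd_losses = ToyDetector(kd=True, kd_feat_loss=0.1).collect_losses()
```

With the default `kd=False`, the distillation term is simply skipped, which matches the effect of adding self.kd = False by hand.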

Issue 2:

    Variable._execution_engine.run_backward(
    RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 4; 31.75 GiB total capacity; 17.30 GiB already allocated; 3.86 GiB free; 26.34 GiB reserved in total by PyTorch)

I used 8x V100 (32 GB) GPUs and hit the error above. Adding torch.cuda.empty_cache() to the training script did not help; training still ran out of memory. I then reduced the parameter sample_per_gpu from 4 to 2 for training.
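The figures in the error message can be related with a little arithmetic. The sketch below parses the message quoted above and derives two numbers from it. The helper name `parse_oom` is made up for illustration, and the regular expressions assume exactly this message layout, which differs between PyTorch versions.

```python
import re

# Error text copied from the report above.
message = (
    "CUDA out of memory. Tried to allocate 3.94 GiB (GPU 4; 31.75 GiB total "
    "capacity; 17.30 GiB already allocated; 3.86 GiB free; 26.34 GiB reserved "
    "in total by PyTorch)"
)


def parse_oom(text):
    """Pull the GiB figures out of an OOM message in the layout shown above."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(rx, text).group(1)) for key, rx in patterns.items()}


stats = parse_oom(message)
# Memory held by PyTorch's caching allocator but not backing any live tensor.
cached_unused = round(stats["reserved"] - stats["allocated"], 2)
# How far the request overshoots what the device reports as free.
shortfall = round(stats["requested"] - stats["free"], 2)
print(f"cached but unused: {cached_unused} GiB, shortfall: {shortfall} GiB")
```

The request is slightly larger than the reported free memory, while a larger amount is reserved by PyTorch without backing live tensors. Whether that reserved memory is too fragmented to serve the request is a guess that the message alone does not confirm; lowering the per-GPU batch size usually reduces peak memory either way.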

Top GitHub Comments

tingtingliangvs commented, Jun 9, 2022

Hi, the batch size should be 2x8 when the LiDAR stream is not frozen. I fixed the bug, and bevf_pp_2x8_1x_nusc.py should work now. The problem was that I forgot to make this change when I unified the config names and format.

Here are some tricks in BEVFusion tuning:

When the LiDAR stream is frozen during training, the initial learning rate is set to 0.001 and the batch size is 4x8.
When the LiDAR stream and the camera stream are both being trained, the initial learning rate is set to 0.0001 and the batch size is 2x8.

In my experience, the learning rate, not the batch size, is the key to BEVFusion training.
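The two settings listed above can be captured in a small helper. This is a sketch under assumptions, not the repository's config code: it reads "4x8" and "2x8" as samples per GPU times 8 GPUs (consistent with the 8x V100 setup in the question), and the function name `bevfusion_schedule` and the dictionary keys are hypothetical.

```python
NUM_GPUS = 8  # assumed meaning of the "x8" in the batch sizes quoted above


def bevfusion_schedule(lidar_frozen):
    """Return the learning rate and batch settings quoted in the comment above."""
    if lidar_frozen:
        lr, per_gpu = 1e-3, 4  # LiDAR stream frozen: lr 0.001, batch 4x8
    else:
        lr, per_gpu = 1e-4, 2  # both streams trained: lr 0.0001, batch 2x8
    return {
        "lr": lr,
        "samples_per_gpu": per_gpu,  # hypothetical key name for illustration
        "total_batch_size": per_gpu * NUM_GPUS,
    }


frozen = bevfusion_schedule(lidar_frozen=True)
joint = bevfusion_schedule(lidar_frozen=False)
```

Under that reading, joint training uses half the per-GPU batch and a tenfold smaller learning rate compared with the frozen-LiDAR setting.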

Birdylx commented, Nov 14, 2022

@Treemann Hi, I’m trying to reproduce the LiDAR stream results. I used the command the authors provide, but I get lower mAP and NDS. Were you able to reproduce the LiDAR stream results?