Training Error
Hi, nice work!
I could run the inference code successfully, but I ran into errors during training.
Issue1:
raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'BEVF_FasterRCNN' object has no attribute 'kd'
I found that self.kd and self.kd_feat_loss are not defined, so I added self.kd = False in the __init__() function.
if self.kd:
losses_pts['kd_feat_loss'] = self.kd_feat_loss
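A minimal sketch of the workaround described above, using a plain Python stand-in rather than the real BEVF_FasterRCNN class. The names DetectorSketch and collect_losses are hypothetical and only illustrate the idea of defining both attributes in the constructor so the guard never reads an undefined attribute.

```python
class DetectorSketch:
    """Hypothetical stand-in for the detector; not the real BEVF_FasterRCNN."""

    def __init__(self, kd=False):
        # Define both attributes up front so later reads never fail.
        self.kd = kd
        # Assumption: the real loss value would be set elsewhere when kd is on.
        self.kd_feat_loss = None

    def collect_losses(self):
        losses_pts = {}
        # Same guard as in the issue: report the KD loss only when kd is on.
        if self.kd:
            losses_pts['kd_feat_loss'] = self.kd_feat_loss
        return losses_pts


default_losses = DetectorSketch().collect_losses()
```

With kd left at False the guard is skipped, so the missing-attribute error cannot occur and no KD term is added to the losses.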
Issue2:
Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 3.94 GiB (GPU 4; 31.75 GiB total capacity; 17.30 GiB already allocated; 3.86 GiB free; 26.34 GiB reserved in total by PyTorch)
I used 8x V100 (32G) and the error above occurred. I tried adding torch.cuda.empty_cache()
to the training script, but still got OOM. Then I tried changing the parameter samples_per_gpu from 4 to 2
for training.
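To make the effect of that change concrete, here is a rough sketch of the arithmetic. It assumes an MMDetection-style config in which the per-GPU sample count lives under data['samples_per_gpu']; the helper effective_batch_size is hypothetical and not part of the repository.

```python
def effective_batch_size(samples_per_gpu, num_gpus):
    # Total samples per optimizer step in plain data-parallel training.
    return samples_per_gpu * num_gpus


# Assumed MMDetection-style override: halve the per-GPU sample count.
data = dict(samples_per_gpu=2)

before = effective_batch_size(4, 8)                        # original setting
after = effective_batch_size(data['samples_per_gpu'], 8)   # reduced setting
print(before, after)
```

Halving the per-GPU count roughly halves the activation memory each GPU holds per step, which is why it is a common first response to an OOM during the backward pass.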
Issue Analytics
- State:
- Created a year ago
- Comments:9
Hi, the batch size should be 2x8 when the LiDAR stream is not frozen. I fixed the bug, and bevf_pp_2x8_1x_nusc.py should work now. The problem arose because I forgot to update this config when I unified the config names and format.
Here are some tricks for BEVFusion tuning:
In my experience, the learning rate, not the batch size, is the key factor in BEVFusion training.
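If the batch size does change, one common heuristic for picking a starting learning rate is the linear scaling rule. The sketch below is only that heuristic, not something stated in this thread or taken from the BEVFusion configs; the function name and the example values are hypothetical.

```python
import math


def scaled_lr(base_lr, base_batch_size, new_batch_size):
    # Linear scaling heuristic: keep lr / batch_size constant.
    return base_lr * new_batch_size / base_batch_size


# Hypothetical values: a config tuned for 4x8 samples, now run at 2x8.
new_lr = scaled_lr(base_lr=0.001, base_batch_size=32, new_batch_size=16)
print(new_lr)
```

Treat the result as a starting point for a learning-rate search rather than a final value.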
@Treemann Hi, I'm trying to reproduce the LiDAR stream results. I used the command the authors provide, but got lower mAP and NDS. Were you able to reproduce the LiDAR stream results?