Validation for non-distributed training
Hi!
I was going through the code and found that the validate argument of the _non_dist_train function in mmdet/apis/train.py is not being used. I guess it needs to be incorporated into the function. Please let me know if that's the case.
Thanks!
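A rough sketch of what honoring the flag might look like: when validate is True, _non_dist_train could register an evaluation hook on the runner, mirroring what the distributed code path already does with its eval hooks. The hook name SingleGPUEvalHook below is hypothetical (it is not part of mmdet), and the commented build_dataloader call is only illustrative; the actual wiring depends on the mmdet version.

```python
import torch
from mmcv.runner import Hook


class SingleGPUEvalHook(Hook):  # hypothetical name, not part of mmdet
    """Evaluate on the validation set every `interval` epochs (sketch)."""

    def __init__(self, dataloader, interval=1):
        self.dataloader = dataloader
        self.interval = interval

    def after_train_epoch(self, runner):
        if not self.every_n_epochs(runner, self.interval):
            return
        runner.model.eval()
        results = []
        with torch.no_grad():
            for data in self.dataloader:
                # single-GPU forward pass in test mode
                result = runner.model(return_loss=False, rescale=True, **data)
                results.append(result)
        # ... compute metrics from `results` against the val dataset and log them ...


# Inside _non_dist_train (illustrative only):
#
# if validate:
#     val_loader = build_dataloader(val_dataset, imgs_per_gpu=1,
#                                   workers_per_gpu=cfg.data.workers_per_gpu,
#                                   dist=False, shuffle=False)
#     runner.register_hook(SingleGPUEvalHook(val_loader, interval=1))
```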
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Just in case it helps anyone.
In mmdet/core/evaluation/eval_hooks.py, not only did I comment out the
dist.barrier()
lines, I also replaced
result = runner.model(return_loss=False, rescale=True, **data_gpu)
with
result = runner.model.module(return_loss=False, rescale=True, **data_gpu)
And then I was able to train with 4 GPUs and validate. Best,
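For context, the edit described above roughly amounts to the following inside the eval hook's after_train_epoch method (a sketch modeled on the mmdet 1.x eval_hooks.py; the exact surrounding code varies by version):

```python
import torch
from mmcv.parallel import collate, scatter


def after_train_epoch(self, runner):
    runner.model.eval()
    results = [None for _ in range(len(self.dataset))]
    for idx in range(runner.rank, len(self.dataset), runner.world_size):
        data = self.dataset[idx]
        data_gpu = scatter(
            collate([data], samples_per_gpu=1),
            [torch.cuda.current_device()])[0]
        with torch.no_grad():
            # original line, which goes through the parallel wrapper:
            # result = runner.model(return_loss=False, rescale=True, **data_gpu)
            # replacement from the comment above, calling the wrapped detector:
            result = runner.model.module(
                return_loss=False, rescale=True, **data_gpu)
        results[idx] = result
    # dist.barrier()  # commented out, as described, because it hangs when the
    #                 # processes were not launched with torch.distributed
    self.evaluate(runner, results)
```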
Hey! I was wondering if I could contribute by making a few tweaks so that the code works for distributed/non-distributed training with validation, along with something similar to the
tools/test.py
file. Though I'm not sure whether that is the best way, please let me know if I should go ahead and submit a pull request. Thanks!