Cannot reproduce training performance
Hi Gyeongsik,
I am working on reproducing the numbers reported in the paper.
- Train datasets: H36M, MuCo, COCO
- Test dataset: 3DPW
I am using PyTorch 1.8, Python 3.8, and CUDA 10.
I did two runs. Here is the performance of snapshot12.pth (the last checkpoint of the lixel stage) on the 3DPW dataset:
- Train batch size per GPU = 16, number of GPUs = 4 (the default config)
  - MPJPE from lixel mesh: 96.23 mm
  - PA MPJPE from lixel mesh: 60.68 mm
- Train batch size per GPU = 24, number of GPUs = 8 (bigger batch config)
  - MPJPE from lixel mesh: 96.37 mm
  - PA MPJPE from lixel mesh: 61.51 mm
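The two configurations differ in effective batch size (per-GPU batch size × number of GPUs, assuming plain data parallelism): 64 vs 192. With the same data per epoch, the bigger run therefore takes 3× fewer optimizer steps per epoch. A minimal sketch of that arithmetic, plus the common linear-scaling heuristic for adjusting the learning rate; the heuristic and the `BASE_LR` value are assumptions for illustration, not settings taken from the repository:

```python
# Hypothetical helpers comparing the two batch configurations from the runs above.

def effective_batch_size(per_gpu: int, num_gpus: int) -> int:
    """Total samples per optimizer step, assuming plain data parallelism."""
    return per_gpu * num_gpus


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear-scaling heuristic (an assumption): scale the LR by the batch-size ratio."""
    return base_lr * new_batch / base_batch


default_bs = effective_batch_size(16, 4)  # default config
bigger_bs = effective_batch_size(24, 8)   # bigger batch config
print(default_bs, bigger_bs)  # 64 192

# BASE_LR is hypothetical; substitute the value actually set in the config.
BASE_LR = 1e-4
print(linearly_scaled_lr(BASE_LR, default_bs, bigger_bs))
```

Whether the heuristic helps here is untested; it only shows which knob differs between the runs if the learning rate was left at its default.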
I also trained the bigger batch config (run 2) for the param stage. Here is the performance of snapshot17.pth and snapshot15.pth (the best checkpoint) on the 3DPW dataset:
- snapshot17.pth, param stage
  - MPJPE from lixel mesh: 95.85 mm
  - PA MPJPE from lixel mesh: 61.21 mm
  - MPJPE from param mesh: 98.11 mm
  - PA MPJPE from param mesh: 61.64 mm
- snapshot15.pth, param stage
  - MPJPE from lixel mesh: 95.65 mm
  - PA MPJPE from lixel mesh: 60.97 mm
  - MPJPE from param mesh: 97.22 mm
  - PA MPJPE from param mesh: 60.82 mm
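For reference, a minimal NumPy sketch of the two metrics reported above. It assumes joints are given as (J, 3) arrays in millimetres that are already root-aligned; this is a generic implementation of MPJPE and Procrustes-aligned MPJPE, not the repository's evaluation code, which may differ in joint set and alignment details:

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_pred, gt - mu_gt
    u, s, vt = np.linalg.svd(x.T @ y)
    if np.linalg.det(u @ vt) < 0:
        # Flip the axis with the smallest singular value to avoid a reflection.
        u[:, -1] *= -1
        s[-1] *= -1
    rot = u @ vt
    scale = s.sum() / (x ** 2).sum()
    return scale * x @ rot + mu_gt


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Made-up data: a rotated, scaled and shifted copy of the ground truth.
rng = np.random.default_rng(0)
gt = rng.normal(size=(14, 3)) * 100.0
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rz.T + np.array([10.0, 0.0, 0.0])
print(mpjpe(pred, gt))     # large: rotation, scale and offset all count
print(pa_mpjpe(pred, gt))  # close to 0: Procrustes removes all three
```

This is why PA MPJPE is consistently lower than MPJPE in the numbers above: global rotation and scale errors are factored out before measuring.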
I am still waiting on the param stage of the default config and will edit this post when it finishes. However, the reported lixel MPJPE is 93.2 mm, and it looks unlikely that my runs will converge there. Any suggestions? Should I train longer?
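To make the size of the gap concrete, the lixel-mesh MPJPE values from the runs above can be compared against the reported 93.2 mm. A small sketch; the run labels are just descriptive names:

```python
# Lixel-mesh MPJPE (mm) on 3DPW, copied from the runs above.
REPORTED_LIXEL_MPJPE = 93.2

runs = {
    "default config, snapshot12 (lixel stage)": 96.23,
    "bigger batch, snapshot12 (lixel stage)": 96.37,
    "bigger batch, snapshot17 (param stage)": 95.85,
    "bigger batch, snapshot15 (param stage)": 95.65,
}

# Gap to the reported number, rounded to 0.01 mm.
gaps = {name: round(value - REPORTED_LIXEL_MPJPE, 2) for name, value in runs.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name}: +{gap:.2f} mm")
```

Every run is roughly 2.5 to 3.2 mm above the reported value, and the param stage narrows the lixel-mesh gap only slightly.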
Thank you, I would greatly appreciate your help.
Issue Analytics
- Created 2 years ago
- Comments: 14 (7 by maintainers)
Sorry, I changed common/base.py. It should work now.
I am working on reproducing the results for 3DPW with the following setup:
- Train datasets: H36M, COCO
- Test dataset: 3DPW
- lr_dec_epoch = [10,12], end_epoch = 13, lr = 1e-4
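A minimal sketch of the step schedule these settings describe, assuming 0-indexed epochs and that each decay takes effect at the start of the listed epoch. The decay factor is not given in the post, so `LR_DEC_FACTOR` below is a hypothetical placeholder:

```python
# Values from the config above; LR_DEC_FACTOR is a hypothetical placeholder.
LR = 1e-4
LR_DEC_EPOCH = [10, 12]
END_EPOCH = 13
LR_DEC_FACTOR = 10  # assumption: check the actual decay factor in the config


def lr_at_epoch(epoch: int) -> float:
    """Step schedule: divide the LR by LR_DEC_FACTOR at each milestone reached."""
    num_decays = sum(1 for milestone in LR_DEC_EPOCH if epoch >= milestone)
    return LR / (LR_DEC_FACTOR ** num_decays)


for epoch in range(END_EPOCH):
    print(epoch, lr_at_epoch(epoch))
```

Under these assumptions, epochs 0 to 9 run at 1e-4, epochs 10 and 11 at 1e-5, and epoch 12 at 1e-6. In plain PyTorch the same shape can be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 12], gamma=1 / LR_DEC_FACTOR)`.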
The performance is as follows and does not reach the numbers in the paper:
- MPJPE from lixel mesh: 99.05 mm
- PA MPJPE from lixel mesh: 62.68 mm
Are the training settings the same when I use more data, such as MuCo, or should I use a different training setting?