Different evaluation results are obtained on different GPUs
Hi, thanks for your amazing work!
I recently ran into a problem: I ran the evaluation on the KITTI dataset on different GPUs and got different results from each run.
The GPUs I used are:
- Tesla P100-16GB (provided by Colab)
- RTX 3090-24GB (desktop)
- RTX 3060-8GB (laptop)
The versions of the dependency packages are:
torch 1.11.0+cu113 pypi_0 pypi
torchaudio 0.11.0+cu113 pypi_0 pypi
torchvision 0.12.0+cu113 pypi_0 pypi
opencv-python 4.1.2.30 pypi_0 pypi
python 3.7.13 h12debd9_0
scipy 1.4.1 pypi_0 pypi
scikit-image 0.18.3 pypi_0 pypi
pykitti 0.3.1 pypi_0 pypi
kornia 0.6.4 pypi_0 pypi
The final metric results are as follows:
## RTX 3090
'loss': 0.0, 'metrics': [0.15538795390363738, 2.7975851661019338, 5.625108310379362, 0.18630796428342675, 0.8499496491823949, 0.9419053045479213, 0.9715246053337913], 'metrics_correct': [0.1553879539036371, 2.797585166101927, 5.625108310379343, 0.18630796428342658, 0.849949649182393, 0.941905304547917, 0.9715246053337869], 'valid_batches': 4317.0, 'loss_loss': 0.0, 'metrics_info': ['abs_rel_sparse_metric', 'sq_rel_sparse_metric', 'rmse_sparse_metric', 'rmse_log_sparse_metric', 'a1_sparse_metric', 'a2_sparse_metric', 'a3_sparse_metric']
## RTX 3060
'loss': 0.0, 'metrics': [0.15538795390363738, 2.7975851661019338, 5.625108310379362, 0.18630796428342675, 0.8499496491823949, 0.9419053045479213, 0.9715246053337913], 'metrics_correct': [0.1553879539036371, 2.797585166101927, 5.625108310379343, 0.18630796428342658, 0.849949649182393, 0.941905304547917, 0.9715246053337869], 'valid_batches': 4317.0, 'loss_loss': 0.0, 'metrics_info': ['abs_rel_sparse_metric', 'sq_rel_sparse_metric', 'rmse_sparse_metric', 'rmse_log_sparse_metric', 'a1_sparse_metric', 'a2_sparse_metric', 'a3_sparse_metric']
## P100 (provided by colab)
'loss': 0.0, 'metrics': [0.050102306426327375, 0.29000417841479736, 2.2656219501116497, 0.08224765892285756, 0.9723125736912782, 0.9907187367026828, 0.9957081838727743], 'metrics_correct': [0.0501023064263273, 0.29000417841479575, 2.2656219501116426, 0.08224765892285742, 0.9723125736912774, 0.9907187367026803, 0.9957081838727698], 'valid_batches': 4317.0, 'loss_loss': 0.0, 'metrics_info': ['abs_rel_sparse_metric', 'sq_rel_sparse_metric', 'rmse_sparse_metric', 'rmse_log_sparse_metric', 'a1_sparse_metric', 'a2_sparse_metric', 'a3_sparse_metric']
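For reference, the names in `metrics_info` are the standard monocular-depth evaluation quantities on KITTI. Below is a minimal sketch of their usual definitions (assuming the common Eigen-style formulas; the repository's own implementation may apply additional masking or scaling):

```python
import numpy as np

# Sketch of the usual KITTI depth metrics (assumed definitions, not the
# repository's exact code). gt and pred are 1-D arrays of valid ground-truth
# and predicted depths in meters.
def depth_metrics(gt, pred):
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()        # a1_sparse_metric
    a2 = (thresh < 1.25 ** 2).mean()   # a2_sparse_metric
    a3 = (thresh < 1.25 ** 3).mean()   # a3_sparse_metric
    abs_rel = np.mean(np.abs(gt - pred) / gt)              # abs_rel_sparse_metric
    sq_rel = np.mean((gt - pred) ** 2 / gt)                # sq_rel_sparse_metric
    rmse = np.sqrt(np.mean((gt - pred) ** 2))              # rmse_sparse_metric
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))  # rmse_log_sparse_metric
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```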
Full evaluation logs are attached below: colab_p100_eval_log.txt, RTX3090_eval_log.txt, RTX3060_eval_log.txt
I found that the 3090 and 3060 produce exactly the same result, but it differs greatly from the P100 result, which is very close to the result reported in the paper.
I did not change any code between these runs, and all of them use the same dataset.
Have you encountered this problem? Could you please give me some advice?
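As a side note when comparing runs like this, here is a small sketch (not part of the original report, using only standard PyTorch attributes) that prints the runtime details which can legitimately differ between machines:

```python
import torch

# Print the runtime details that can differ between otherwise identical setups,
# so the evaluation logs from each GPU can be compared side by side.
print("torch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0))
print("TF32 matmul allowed:", torch.backends.cuda.matmul.allow_tf32)
print("cuDNN TF32 allowed:", torch.backends.cudnn.allow_tf32)
```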
## Top GitHub Comments
This result is what makes it so difficult for me…
Personally, I don’t think this is due to a PyTorch version issue; after all, the P100 also uses PyTorch 1.11.0 (all GPUs were kept in the same environment during the experiment) and gives a good result.
Also, since the RTX 30 series only supports CUDA 11 and newer, and PyTorch 1.5 requires CUDA 10, there is no way to use such an old PyTorch version on RTX 30 series GPUs. It makes me feel helpless…
Anyway, thank you very much for your reply! Please let me know if you can think of other possible reasons. Your help is much appreciated!
Best
Hi,
I also encountered the same issue. This is due to the Ampere architecture used in the NVIDIA 30xx series. You can correct the results by simply setting `torch.backends.cuda.matmul.allow_tf32 = False`. You can also refer to this for further information. Hope it is helpful to you 😃
Best
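For completeness, a minimal sketch of how that flag (plus the related cuDNN flag) can be set before running the evaluation on an Ampere GPU; where exactly it goes in the evaluation script is up to you:

```python
import torch

# Ampere GPUs (RTX 30xx) use TF32 for float32 matmuls by default, which lowers
# precision and can change evaluation metrics relative to older GPUs like the P100.
torch.backends.cuda.matmul.allow_tf32 = False  # force full-FP32 matmuls
torch.backends.cudnn.allow_tf32 = False        # same for cuDNN convolutions

# ... run the evaluation as usual after setting these flags ...
```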