Different evaluation results are obtained on different GPUs
Hi, thanks for your amazing work!
I recently ran into a problem: I ran the evaluation on the KITTI dataset on different GPUs and got different results from each run.
The GPUs I used are:
- Tesla P100-16GB (provided by Colab)
- RTX 3090-24GB (desktop)
- RTX 3060-8GB (laptop)
The versions of the dependency packages are:
torch 1.11.0+cu113 pypi_0 pypi
torchaudio 0.11.0+cu113 pypi_0 pypi
torchvision 0.12.0+cu113 pypi_0 pypi
opencv-python 4.1.2.30 pypi_0 pypi
python 3.7.13 h12debd9_0
scipy 1.4.1 pypi_0 pypi
scikit-image 0.18.3 pypi_0 pypi
pykitti 0.3.1 pypi_0 pypi
kornia 0.6.4 pypi_0 pypi
The final metric results are as follows:
## RTX 3090
'loss': 0.0, 'metrics': [0.15538795390363738, 2.7975851661019338, 5.625108310379362, 0.18630796428342675, 0.8499496491823949, 0.9419053045479213, 0.9715246053337913], 'metrics_correct': [0.1553879539036371, 2.797585166101927, 5.625108310379343, 0.18630796428342658, 0.849949649182393, 0.941905304547917, 0.9715246053337869], 'valid_batches': 4317.0, 'loss_loss': 0.0, 'metrics_info': ['abs_rel_sparse_metric', 'sq_rel_sparse_metric', 'rmse_sparse_metric', 'rmse_log_sparse_metric', 'a1_sparse_metric', 'a2_sparse_metric', 'a3_sparse_metric']
## RTX 3060
'loss': 0.0, 'metrics': [0.15538795390363738, 2.7975851661019338, 5.625108310379362, 0.18630796428342675, 0.8499496491823949, 0.9419053045479213, 0.9715246053337913], 'metrics_correct': [0.1553879539036371, 2.797585166101927, 5.625108310379343, 0.18630796428342658, 0.849949649182393, 0.941905304547917, 0.9715246053337869], 'valid_batches': 4317.0, 'loss_loss': 0.0, 'metrics_info': ['abs_rel_sparse_metric', 'sq_rel_sparse_metric', 'rmse_sparse_metric', 'rmse_log_sparse_metric', 'a1_sparse_metric', 'a2_sparse_metric', 'a3_sparse_metric']
## P100 (provided by colab)
'loss': 0.0, 'metrics': [0.050102306426327375, 0.29000417841479736, 2.2656219501116497, 0.08224765892285756, 0.9723125736912782, 0.9907187367026828, 0.9957081838727743], 'metrics_correct': [0.0501023064263273, 0.29000417841479575, 2.2656219501116426, 0.08224765892285742, 0.9723125736912774, 0.9907187367026803, 0.9957081838727698], 'valid_batches': 4317.0, 'loss_loss': 0.0, 'metrics_info': ['abs_rel_sparse_metric', 'sq_rel_sparse_metric', 'rmse_sparse_metric', 'rmse_log_sparse_metric', 'a1_sparse_metric', 'a2_sparse_metric', 'a3_sparse_metric']
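For reference, the names in `metrics_info` are the standard monocular-depth evaluation quantities on KITTI. Below is a minimal sketch of their usual definitions (assuming the common Eigen-style formulas; the repository's own implementation may apply additional masking or scaling):

```python
import numpy as np

# Sketch of the usual KITTI depth metrics (assumed definitions, not the
# repository's exact code). gt and pred are 1-D arrays of valid ground-truth
# and predicted depths in meters.
def depth_metrics(gt, pred):
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()        # a1_sparse_metric
    a2 = (thresh < 1.25 ** 2).mean()   # a2_sparse_metric
    a3 = (thresh < 1.25 ** 3).mean()   # a3_sparse_metric
    abs_rel = np.mean(np.abs(gt - pred) / gt)              # abs_rel_sparse_metric
    sq_rel = np.mean((gt - pred) ** 2 / gt)                # sq_rel_sparse_metric
    rmse = np.sqrt(np.mean((gt - pred) ** 2))              # rmse_sparse_metric
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))  # rmse_log_sparse_metric
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```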
Full evaluation logs are attached below: colab_p100_eval_log.txt, RTX3090_eval_log.txt, RTX3060_eval_log.txt
I found that the 3090 and 3060 produce exactly the same result, but it differs greatly from the P100 result, which is very close to the result reported in the paper.
I did not change any code between these runs, and all of them use the same dataset.
Have you encountered this problem? Could you please give me some advice?
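As a side note when comparing runs like this, here is a small sketch (not part of the original report, using only standard PyTorch attributes) that prints the runtime details which can legitimately differ between machines:

```python
import torch

# Print the runtime details that can differ between otherwise identical setups,
# so the evaluation logs from each GPU can be compared side by side.
print("torch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0))
print("TF32 matmul allowed:", torch.backends.cuda.matmul.allow_tf32)
print("cuDNN TF32 allowed:", torch.backends.cudnn.allow_tf32)
```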
## Top GitHub Comments
This result is what makes it so difficult for me…
Personally, I don’t think this is due to a PyTorch version issue; after all, the P100 also uses PyTorch 1.11.0 (all GPUs were kept in the same environment during the experiment) and gives a good result.
Also, since the RTX 30 series only supports CUDA 11 and newer, and PyTorch 1.5 requires CUDA 10, there is no way to use such an old PyTorch version on RTX 30 series GPUs. It makes me feel helpless…
Anyway, thank you very much for your reply! Please let me know if you can think of other possible reasons. Your help is much appreciated!
Best
Hi,
I also encountered the same issue. This is due to the Ampere architecture used in the NVIDIA 30xx series. You can correct the results by simply setting `torch.backends.cuda.matmul.allow_tf32 = False`. You can also refer to this for further information. Hope it is helpful to you 😃
Best
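For completeness, a minimal sketch of how that flag (plus the related cuDNN flag) can be set before running the evaluation on an Ampere GPU; where exactly it goes in the evaluation script is up to you:

```python
import torch

# Ampere GPUs (RTX 30xx) use TF32 for float32 matmuls by default, which lowers
# precision and can change evaluation metrics relative to older GPUs like the P100.
torch.backends.cuda.matmul.allow_tf32 = False  # force full-FP32 matmuls
torch.backends.cudnn.allow_tf32 = False        # same for cuDNN convolutions

# ... run the evaluation as usual after setting these flags ...
```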