Weird behavior of pretrained models with CUDA
Python: 3.6.9
CUDA: 11.2
compressai: 1.1.3
torch: 1.8.1
While replicating the RD curves from the README on the Kodak dataset (downloaded with `wget http://r0k.us/graphics/kodak/kodak/kodim{0,1,2}{0,1,2,3,4,5,6,7,8,9}.png` into `./kodak/`), I observed weird behavior of the pretrained models with CUDA.
The models appear to work well without CUDA
metric = mse

```
python -m compressai.utils.eval_model pretrained ./kodak/ -a bmshj2018-hyperprior --metric mse --quality 5
Downloading: "https://compressai.s3.amazonaws.com/models/v1/bmshj2018-hyperprior-5-f8b614e1.pth.tar" to /home/yoshitom/.cache/torch/hub/checkpoints/bmshj2018-hyperprior-5-f8b614e1.pth.tar
100.0%
{
  "name": "bmshj2018-hyperprior",
  "description": "Inference (ans)",
  "results": {
    "psnr": [
      34.52624269077672
    ],
    "ms-ssim": [
      0.9835608204205831
    ],
    "bpp": [
      0.6686842176649305
    ],
    "encoding_time": [
      0.2404747505982717
    ],
    "decoding_time": [
      0.5095066924889883
    ]
  }
}
```
metric = ms-ssim

```
python -m compressai.utils.eval_model pretrained ./kodak/ -a bmshj2018-hyperprior --metric ms-ssim --quality 5
Downloading: "https://compressai.s3.amazonaws.com/models/v1/bmshj2018-hyperprior-ms-ssim-5-c34afc8d.pth.tar" to /home/yoshitom/.cache/torch/hub/checkpoints/bmshj2018-hyperprior-ms-ssim-5-c34afc8d.pth.tar
100.0%
{
  "name": "bmshj2018-hyperprior",
  "description": "Inference (ans)",
  "results": {
    "psnr": [
      28.992422918554542
    ],
    "ms-ssim": [
      0.9866020356615385
    ],
    "bpp": [
      0.47353786892361116
    ],
    "encoding_time": [
      0.24171670277913412
    ],
    "decoding_time": [
      0.5283569494883219
    ]
  }
}
```
PSNR and MS-SSIM are both NaN when using CUDA
metric = mse

```
python -m compressai.utils.eval_model pretrained ./kodak/ -a bmshj2018-hyperprior --metric mse --quality 5 --cuda
Downloading: "https://compressai.s3.amazonaws.com/models/v1/bmshj2018-hyperprior-5-f8b614e1.pth.tar" to /home/yoshitom/.cache/torch/hub/checkpoints/bmshj2018-hyperprior-5-f8b614e1.pth.tar
100.0%
{
  "name": "bmshj2018-hyperprior",
  "description": "Inference (ans)",
  "results": {
    "psnr": [
      NaN
    ],
    "ms-ssim": [
      NaN
    ],
    "bpp": [
      0.6686876085069443
    ],
    "encoding_time": [
      0.034142365058263145
    ],
    "decoding_time": [
      0.025616129239400227
    ]
  }
}
```
metric = ms-ssim

```
python -m compressai.utils.eval_model pretrained ./kodak/ -a bmshj2018-hyperprior --metric ms-ssim --quality 5 --cuda
Downloading: "https://compressai.s3.amazonaws.com/models/v1/bmshj2018-hyperprior-ms-ssim-5-c34afc8d.pth.tar" to /home/yoshitom/.cache/torch/hub/checkpoints/bmshj2018-hyperprior-ms-ssim-5-c34afc8d.pth.tar
100.0%
{
  "name": "bmshj2018-hyperprior",
  "description": "Inference (ans)",
  "results": {
    "psnr": [
      NaN
    ],
    "ms-ssim": [
      NaN
    ],
    "bpp": [
      0.47353786892361116
    ],
    "encoding_time": [
      0.03800355394681295
    ],
    "decoding_time": [
      0.029240707556406658
    ]
  }
}
```
I didn't check every combination of model, quality, metric, and with/without CUDA, but at least bmshj2018-hyperprior with quality=8 (in addition to quality=5) also returned NaN when using CUDA, for both the mse and ms-ssim checkpoints. There may be more checkpoints affected by the same issue; a quick sweep like the one sketched below could help narrow them down.
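This is only a minimal sketch, not the eval_model code itself; it assumes the compressai.zoo API from 1.1.x (bmshj2018_hyperprior, compress, decompress) and a Kodak image saved at ./kodak/kodim01.png:

```python
import torch
from PIL import Image
from torchvision import transforms

from compressai.zoo import bmshj2018_hyperprior

device = "cuda"
# Kodak images are 768x512, so the dimensions are already multiples of 64
# and no padding is needed before compress().
x = transforms.ToTensor()(Image.open("./kodak/kodim01.png").convert("RGB"))
x = x.unsqueeze(0).to(device)

for metric in ("mse", "ms-ssim"):
    for quality in range(1, 9):
        net = bmshj2018_hyperprior(quality=quality, metric=metric, pretrained=True)
        net = net.eval().to(device)
        with torch.no_grad():
            enc = net.compress(x)
            dec = net.decompress(enc["strings"], enc["shape"])
        has_nan = bool(torch.isnan(dec["x_hat"]).any())
        print(f"metric={metric} quality={quality} NaN in x_hat: {has_nan}")
```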
When I inspected the model output (i.e., out_dec["x_hat"]), some values in the tensor were NaN when using CUDA, and that must be what causes the NaN metrics (see the toy example below).
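For reference, the NaN metrics follow directly from NaN values in x_hat: a single NaN pixel poisons the per-image MSE, and PSNR (and likewise MS-SSIM) is computed from it. A toy illustration in plain PyTorch, unrelated to the eval_model code:

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 512, 768)    # "ground truth" image in [0, 1]
x_hat = x.clone()
x_hat[0, 0, 0, 0] = float("nan")  # one corrupted reconstruction value

mse = F.mse_loss(x_hat, x)        # mean over all pixels -> NaN
psnr = -10 * torch.log10(mse)     # PSNR for a max value of 1.0 -> NaN

print(torch.isnan(x_hat).sum().item())  # 1
print(mse.item(), psnr.item())          # nan nan
```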
Thanks for testing!
Hi @jbegaint, I fetched the master branch and tried the above configs for bmshj2018-hyperprior with and without CUDA. The results with CUDA now look the same as those without CUDA. Thank you for the fix!