In GPU mode the generated image is all black with NaN tensor values (no problem in CPU mode)
Hello, for both "text2im.ipynb" and "clip_guided.ipynb" I'm seeing that the generated image is all black. This only happens in GPU mode (Nvidia GTX 1660 Ti, 6 GB); in CPU mode the image is generated correctly. I'm on Windows 10 using Python 3.8 and
torch          1.11.0+cu115    pypi_0    pypi
torchvision    0.12.0+cu115    pypi_0    pypi
and this environment works fine for all other ML projects I’m running.
In “text2im.ipynb” I saw that tensor values become NaN in the model_fn function, when model() is called:
# Create a classifier-free guidance sampling function
def model_fn(x_t, ts, **kwargs):
    half = x_t[: len(x_t) // 2]
    combined = th.cat([half, half], dim=0)
    # -----
    # Values of 'combined' are not NaN
    model_out = model(combined, ts, **kwargs)
    # Values of 'model_out' are NaN
    # -----
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = th.split(eps, len(eps) // 2, dim=0)
    half_eps = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    eps = th.cat([half_eps, half_eps], dim=0)
    return th.cat([eps, rest], dim=1)
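To pinpoint exactly which submodule first emits NaNs, a minimal debugging sketch (not part of the notebooks; add_nan_hooks is a hypothetical helper name, and it assumes model is an ordinary torch.nn.Module) is to register forward hooks on every submodule before sampling:

import torch

def add_nan_hooks(model):
    # Register a forward hook on every submodule that prints the
    # module's name the moment its output contains NaN values.
    def make_hook(name):
        def hook(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and torch.isnan(out).any():
                print(f"NaNs in output of: {name} ({module.__class__.__name__})")
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

add_nan_hooks(model)  # call once, before running the sampler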
As I tried to track down the problem a bit further, I found that the values start going wrong in the forward function of "text2im_model.py", specifically at line 133, where module is called. There, at iteration #2 some values become NaN, and at iteration #6 all values become NaN.
Please take a look:
----------------- INSIDE FOR LOOP, iteration #: 1
----------------- INSIDE FOR LOOP, value of 'h' before module call:
tensor([[[[ 0.9609, 0.4629, -0.9834, ..., 1.6162, -0.5767, -0.4253],
[ 0.5947, -0.8301, 1.7686, ..., -2.5215, 0.2920, -0.2183],
[ 1.9561, -0.8403, 0.4053, ..., 0.4990, -2.0176, -0.2935],
...,
[ 1.8125, -0.4285, 0.1121, ..., -1.1416, -2.6562, -1.1348],
[ 0.9204, -0.4434, -0.1824, ..., 0.2864, 1.7188, -0.8999],
[ 1.8369, 0.2583, 0.4895, ..., 1.4004, 1.5371, 2.8203]],
[[ 1.7607, 0.4749, 1.9160, ..., -0.6079, -0.5513, -3.0527],
[ 0.9780, 1.3984, 1.7266, ..., 0.2903, -0.7969, -1.4316],
[-0.5293, -2.6465, -1.6699, ..., -0.2900, -1.6738, 0.6704],
...,
[ 0.0657, -0.7827, 1.1904, ..., -0.3643, 0.7754, -0.8740],
[ 1.0801, -1.1260, -0.1700, ..., 1.4443, -0.3196, -0.1392],
[-1.0645, 1.0898, -0.3838, ..., 0.3491, 0.4077, -1.4492]],
[[ 0.1176, 0.6514, 0.8452, ..., 1.3486, -2.3496, -0.1377],
[-1.6523, -0.1711, -0.1355, ..., 1.2236, 1.0068, 1.9863],
[ 0.7456, 1.1943, 0.1819, ..., -2.1719, 1.7148, 0.0917],
...,
[ 0.4253, -1.0078, 0.7847, ..., 1.1348, 0.8101, 0.7744],
[-1.1299, -0.0173, -0.5522, ..., 0.3960, 1.0762, 0.1404],
[-0.0644, -0.0656, 1.1670, ..., -0.1234, 0.6870, -0.5278]]],
...
device='cuda:0', dtype=torch.float16)
----------------- INSIDE FOR LOOP, module function that will now be called is:
TimestepEmbedSequential(
(0): Conv2d(3, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
----------------- INSIDE FOR LOOP, value of 'h' after module call:
tensor([[[[-0.3325, -0.4204, -1.3887, ..., 0.0850, -0.1570, -0.6255],
[ 0.5010, -0.4548, 0.2632, ..., -1.8027, -0.2144, -1.4512],
[ 0.1343, -1.0498, 0.4097, ..., -0.0427, -2.1836, -0.3203],
...,
[-0.2983, -0.2622, -1.0098, ..., -1.7773, -1.7871, -1.3760],
[ 0.1865, -0.8691, -0.1841, ..., -0.5342, -0.8232, -1.7949],
[ 0.4858, -0.7051, -0.7515, ..., 0.7300, 0.0771, 0.6509]],
[[-0.5107, -0.1924, 0.4790, ..., -1.6797, 1.5586, -1.1074],
[-0.8438, -1.3945, -0.8652, ..., -0.1021, -1.9297, -1.8242],
[-1.6289, 0.6030, -1.5410, ..., 1.0488, -0.4473, 0.7524],
...,
[-2.0586, 0.6978, -1.9316, ..., -1.4785, 1.0742, 0.2190],
[-1.0010, -0.6309, 0.3979, ..., 0.3286, -0.3005, 0.8218],
[-1.4961, -1.0723, -1.5293, ..., 1.8125, -0.7954, -0.2915]],
...
device='cuda:0', dtype=torch.float16)
----------------- INSIDE FOR LOOP, iteration #: 2
----------------- INSIDE FOR LOOP, value of 'h' before module call:
tensor([[[[-0.3325, -0.4204, -1.3887, ..., 0.0850, -0.1570, -0.6255],
[ 0.5010, -0.4548, 0.2632, ..., -1.8027, -0.2144, -1.4512],
[ 0.1343, -1.0498, 0.4097, ..., -0.0427, -2.1836, -0.3203],
...,
[-0.2983, -0.2622, -1.0098, ..., -1.7773, -1.7871, -1.3760],
[ 0.1865, -0.8691, -0.1841, ..., -0.5342, -0.8232, -1.7949],
[ 0.4858, -0.7051, -0.7515, ..., 0.7300, 0.0771, 0.6509]],
[[-0.5107, -0.1924, 0.4790, ..., -1.6797, 1.5586, -1.1074],
[-0.8438, -1.3945, -0.8652, ..., -0.1021, -1.9297, -1.8242],
[-1.6289, 0.6030, -1.5410, ..., 1.0488, -0.4473, 0.7524],
...,
[-2.0586, 0.6978, -1.9316, ..., -1.4785, 1.0742, 0.2190],
[-1.0010, -0.6309, 0.3979, ..., 0.3286, -0.3005, 0.8218],
[-1.4961, -1.0723, -1.5293, ..., 1.8125, -0.7954, -0.2915]],
...
device='cuda:0', dtype=torch.float16)
----------------- INSIDE FOR LOOP, module function that will now be called is:
TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 192, eps=1e-05, affine=True)
(1): Identity()
(2): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=768, out_features=384, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 192, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0.1, inplace=False)
(3): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Identity()
)
)
----------------- INSIDE FOR LOOP, value of 'h' after module call:
tensor([[[[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]],
[[-0.6113, -0.2927, 0.3787, ..., -1.7803, 1.4580, -1.2080],
[-0.9443, -1.4951, -0.9658, ..., -0.2024, -2.0293, -1.9248],
[-1.7295, 0.5024, -1.6416, ..., 0.9482, -0.5479, 0.6519],
...,
[-2.1582, 0.5972, -2.0312, ..., -1.5791, 0.9736, 0.1186],
[-1.1016, -0.7314, 0.2976, ..., 0.2283, -0.4009, 0.7212],
[-1.5967, -1.1729, -1.6299, ..., 1.7119, -0.8960, -0.3918]],
...
device='cuda:0', dtype=torch.float16)
As you can see, at this point only some values have become NaN. It remains like this until iteration #6, where, after the module call, ALL values become NaN:
----------------- INSIDE FOR LOOP, iteration #: 6
----------------- INSIDE FOR LOOP, value of 'h' before module call:
tensor([[[[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
...,
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan]],
[[-9.6973e-01, 3.6084e-01, -8.0078e-01, ..., -6.1328e-01,
-1.1406e+00, -1.0596e+00],
[-4.0210e-01, -1.0947e+00, -2.0898e-01, ..., -7.3730e-01,
-6.4258e-01, -3.1860e-01],
[-4.3530e-01, -4.1577e-01, -4.6655e-01, ..., 5.1880e-02,
1.5601e-01, -4.0283e-02],
...,
[-5.6934e-01, 2.7954e-01, -1.4346e+00, ..., -4.4751e-01,
-1.3428e-02, -2.9565e-01],
[-5.2148e-01, -6.8652e-01, -8.8770e-01, ..., -2.4341e-01,
-1.3213e+00, 2.9517e-01],
[-1.2842e+00, -6.5234e-01, -1.9214e-01, ..., -1.8779e+00,
-3.9526e-01, -3.7500e-01]],
...
device='cuda:0', dtype=torch.float16)
----------------- INSIDE FOR LOOP, module function that will now be called is:
TimestepEmbedSequential(
(0): ResBlock(
(in_layers): Sequential(
(0): GroupNorm32(32, 192, eps=1e-05, affine=True)
(1): Identity()
(2): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(h_upd): Identity()
(x_upd): Identity()
(emb_layers): Sequential(
(0): SiLU()
(1): Linear(in_features=768, out_features=768, bias=True)
)
(out_layers): Sequential(
(0): GroupNorm32(32, 384, eps=1e-05, affine=True)
(1): SiLU()
(2): Dropout(p=0.1, inplace=False)
(3): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(skip_connection): Conv2d(192, 384, kernel_size=(1, 1), stride=(1, 1))
)
(1): AttentionBlock(
(norm): GroupNorm32(32, 384, eps=1e-05, affine=True)
(qkv): Conv1d(384, 1152, kernel_size=(1,), stride=(1,))
(attention): QKVAttention()
(encoder_kv): Conv1d(512, 768, kernel_size=(1,), stride=(1,))
(proj_out): Conv1d(384, 384, kernel_size=(1,), stride=(1,))
)
)
----------------- INSIDE FOR LOOP, value of 'h' after module call:
tensor([[[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...
device='cuda:0', dtype=torch.float16)
With my limited knowledge of this field, this is all I could find. Please let me know if there is any other info I can provide.
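Since every tensor dump above is in torch.float16, one thing worth testing is whether the NaNs go away when the model is kept in fp32 on CUDA. This is only a sketch, under the assumption that the notebooks use the standard glide_text2im.model_creation setup (checkpoint loading omitted):

from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

options = model_and_diffusion_defaults()
options['use_fp16'] = False  # keep weights/activations in fp32, even on CUDA
model, diffusion = create_model_and_diffusion(**options)
model.eval()
model.to('cuda')
# NB: without fp16 the model needs roughly twice the VRAM, which may be
# tight on a 6 GB card.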
Top GitHub Comments
The same happens when trying Stable Diffusion with autocast/fp16: the output is black no matter what I do. This is with PyTorch 1.12.1, CUDA 11.6, cuDNN 8.0 on conda.
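A quick way to tell whether this is the diffusion code or the GPU's half-precision path itself is a standalone fp16 sanity check (a sketch, not from this thread): if even a plain fp16 matmul yields NaNs or zeros, no diffusion-side fix will help.

import torch

# Plain fp16 matmul on the GPU; NaNs or an all-zero result here would
# point at the card/driver fp16 path rather than at the model code.
a = torch.randn(256, 256, device='cuda', dtype=torch.float16)
b = torch.randn(256, 256, device='cuda', dtype=torch.float16)
c = a @ b
print('any NaN:', torch.isnan(c).any().item())
print('all zero:', (c == 0).all().item())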
OK, I could finally make it work by installing a version of torch and torchvision that comes with CUDA Toolkit v10.2. Specifically, I downloaded torch-1.8.0-cp38-cp38-win_amd64.whl and torchvision-0.9.0-cp38-cp38-win_amd64.whl (which include CUDA v10.2 despite not having a "cu###" suffix) from https://download.pytorch.org/whl/torch_stable.html and then installed them with pip install filename.whl. In theory newer versions of torch should work too, provided they come with CUDA 10.2, e.g.:
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 -f https://download.pytorch.org/whl/torch_stable.html
At least this was a requirement in my case…
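After installing, it is worth confirming that the CUDA 10.2 build is really the one in use (standard PyTorch introspection calls, nothing project-specific):

import torch

print(torch.__version__)               # e.g. '1.10.1+cu102'
print(torch.version.cuda)              # CUDA version the wheel was built against
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))   # e.g. 'NVIDIA GeForce GTX 1660 Ti'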