
Error when enabling xformers on Windows and calling loss.backward()


Describe the bug

I get the following error when I enable xformers on the UNet and run the backward pass:

Traceback (most recent call last):
  File "f:/diffusers-test/vae_expr.py", line 66, in <module>
    loss.backward()
  File "C:\Users\uuu\.virtualenvs\stable-diffusion\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\uuu\.virtualenvs\stable-diffusion\lib\site-packages\torch\autograd\__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "C:\Users\uuu\.virtualenvs\stable-diffusion\lib\site-packages\torch\autograd\function.py", line 253, in apply   
    return user_fn(self, *args)
  File "f:\xformers\xformers\ops\memory_efficient_attention.py", line 414, in backward
    causal=ctx.causal,
  File "C:\Users\uuu\.virtualenvs\stable-diffusion\lib\site-packages\torch\_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: p.gQ_strideM() == grad_q.stride(1) INTERNAL ASSERT FAILED at "F:\\xformers\\xformers\\components\\attention\\csrc\\cuda\\mem_eff_attention\\attention_backward_generic.cu":181, please report a bug to PyTorch.
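The failing assertion compares the stride along the sequence (M) dimension that the kernel expects for the query gradient (`p.gQ_strideM()`) with `grad_q.stride(1)`, the stride the gradient tensor actually has. The stdlib sketch below only illustrates how two tensors with the same shape can disagree on `stride(1)`; it assumes a `[batch, M, heads, head_dim]` layout and uses hypothetical sizes, and it does not claim to identify the root cause of this bug.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a tensor of this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def permuted(shape, strides, order):
    """Mimic a view-style permute: reorder shape and strides without moving data."""
    return tuple(shape[i] for i in order), tuple(strides[i] for i in order)


# Hypothetical sizes: batch, heads, sequence length (M), head dim (K).
B, H, M, K = 1, 8, 4096, 40

# Case 1: a buffer allocated directly as [B, M, H, K].
bmhk_shape = (B, M, H, K)
bmhk_strides = contiguous_strides(bmhk_shape)

# Case 2: a buffer allocated as [B, H, M, K], then viewed as [B, M, H, K].
bhmk_strides = contiguous_strides((B, H, M, K))
view_shape, view_strides = permuted((B, H, M, K), bhmk_strides, (0, 2, 1, 3))

print(bmhk_shape == view_shape)          # True: identical shapes
print(bmhk_strides[1], view_strides[1])  # 320 40: different stride along M
```

A kernel that hard-codes one of these layouts would see a mismatch like this if it receives the other one, which is the kind of condition the assertion guards against.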

Reproduction

import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

pretrained_model_name_or_path = r'F:\diffusers-weight'
# Load models and create wrapper for stable diffusion
tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_name_or_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(pretrained_model_name_or_path, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(pretrained_model_name_or_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(pretrained_model_name_or_path, subfolder="unet")
noise_scheduler = DDPMScheduler.from_config(pretrained_model_name_or_path, subfolder="scheduler")

# Freeze vae and text_encoder
vae.requires_grad_(False)
text_encoder.requires_grad_(False)


weight_dtype = torch.bfloat16
# Move the text_encoder, vae and unet to the GPU in bfloat16.
# The text_encoder and vae are only used for inference, so full-precision weights are not required for them.
# Note that the unet, which is the model being trained here, is also cast to bfloat16.
text_encoder.to('cuda', dtype=weight_dtype)
vae.to('cuda', dtype=weight_dtype)
unet.to('cuda', dtype=weight_dtype)
unet.set_use_memory_efficient_attention_xformers(True)

# Convert images to latent space
images = torch.randn(1, 3, 512, 512).to('cuda', dtype=weight_dtype)
latents = vae.encode(images).latent_dist.sample()
latents = latents * 0.18215
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
# Get the text embedding for conditioning
inputs = tokenizer('Terwt dsfs gsdgs sg"', max_length=tokenizer.model_max_length, padding="do_not_pad", truncation=True)
input_ids = [inputs["input_ids"]]
padded_tokens = tokenizer.pad({"input_ids": input_ids}, padding=True, return_tensors="pt")
input_ids = padded_tokens.input_ids.to('cuda', dtype=torch.int)
encoder_hidden_states = text_encoder(input_ids)[0]

# Predict the noise residual and compute loss
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
loss.backward()
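
The `noise_scheduler.add_noise(latents, noise, timesteps)` call above applies the DDPM forward process, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta). The stdlib sketch below reproduces that arithmetic on plain lists. It assumes a "scaled_linear" schedule with beta_start=0.00085, beta_end=0.012 and 1000 steps, values typical of Stable Diffusion v1 scheduler configs; the real values come from the scheduler config under F:\diffusers-weight, which is not shown in the report.

```python
import math


def scaled_linear_betas(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space, then squared (assumed "scaled_linear" schedule)."""
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    n = num_train_timesteps
    return [(s0 + (s1 - s0) * i / (n - 1)) ** 2 for i in range(n)]


def cumulative_alphas(betas):
    """abar_t = prod_{s <= t} (1 - beta_s)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def add_noise(x0, eps, t, abar):
    """Elementwise forward diffusion: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = math.sqrt(abar[t])
    s = math.sqrt(1.0 - abar[t])
    return [a * x + s * e for x, e in zip(x0, eps)]


betas = scaled_linear_betas()
abar = cumulative_alphas(betas)
print(add_noise([1.0, -2.0], [0.5, 0.5], 500, abar))
```

Because abar_t shrinks as t grows, later timesteps keep less of the original latent and more of the sampled noise, which is what the UNet is then trained to predict.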

System Info

  • diffusers version: 0.7.2
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.7.7
  • PyTorch version (GPU?): 1.12.0+cu113 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.24.0
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

xformers version: efdca026381a13319be082f079c60275cc871301 https://github.com/facebookresearch/xformers/commit/efdca026381a13319be082f079c60275cc871301
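
Package versions like the ones above can also be collected programmatically when filing a report. This is a minimal stdlib sketch; the package list is an assumption, and Python 3.7 (as used here) lacks `importlib.metadata`, so it falls back to the `importlib-metadata` backport. A source build of xformers may report a version string that does not identify the exact commit, so the commit link above remains useful.

```python
import platform
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs `pip install importlib-metadata`
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    print("Platform:", platform.platform())
    print("Python:", sys.version.split()[0])
    for name, ver in installed_versions(["torch", "diffusers", "transformers", "xformers"]).items():
        print(f"{name}: {ver or 'not installed'}")
```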

Issue Analytics

  • State: open
  • Created 10 months ago
  • Reactions:1
  • Comments:16 (9 by maintainers)

Top GitHub Comments

0 reactions
patrickvonplaten commented, Dec 12, 2022

Also cc @pcuenca, we should probably add those installs to the README of stable diffusion


Top Results From Across the Web

Meet error when using xformers and doing loss backward #535