Explanation of the 0.18215 factor in textual_inversion?

See original GitHub issue

https://github.com/huggingface/diffusers/blob/b2b3b1a8ab83b020ecaf32f45de3ef23644331cf/examples/textual_inversion/textual_inversion.py#L501

Hi, just a small question about the quoted script above which is bothering me: where does this 0.18215 number come from? What computation is being done? Is it from some paper? I have seen the same factor elsewhere, too, without explanation. Any guidance would be very helpful, thanks!

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

8 reactions
rromb commented, Sep 9, 2022

Hi @garrett361 @patil-suraj @CodeExplode

We introduced the scale factor in the latent diffusion paper. The goal was to handle different latent spaces (from different autoencoders, which can be scaled quite differently than images) with similar noise schedules. The scale_factor ensures that the initial latent space on which the diffusion model is operating has approximately unit variance. Hope this helps 😃
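To make the "approximately unit variance" point concrete, here is a minimal sketch using only the Python standard library. It does not touch the real autoencoder: the raw latents are synthetic Gaussian samples whose standard deviation is assumed to be 1 / 0.18215, and the helper names `scale_latents` and `unscale_latents` are hypothetical, chosen for illustration only.

```python
import random
import statistics

SCALE_FACTOR = 0.18215  # the value asked about in this issue


def scale_latents(latents, scale_factor=SCALE_FACTOR):
    """Multiply raw latents by the scale factor before they enter the diffusion model."""
    return [x * scale_factor for x in latents]


def unscale_latents(latents, scale_factor=SCALE_FACTOR):
    """Undo the scaling, e.g. before handing latents back to a decoder."""
    return [x / scale_factor for x in latents]


rng = random.Random(0)
# ASSUMPTION: synthetic stand-in for encoder outputs, with std = 1 / SCALE_FACTOR.
raw = [rng.gauss(0.0, 1.0 / SCALE_FACTOR) for _ in range(100_000)]
scaled = scale_latents(raw)
restored = unscale_latents(scaled)

raw_std = statistics.pstdev(raw)
scaled_std = statistics.pstdev(scaled)
print(f"raw std    = {raw_std:.3f}")
print(f"scaled std = {scaled_std:.3f}")
```

Under that assumption the scaled values have a standard deviation close to 1, which is the property described above: the same noise schedule can then be reused across autoencoders whose raw latents differ in scale.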

3 reactions
fepegar commented, Dec 19, 2022

In case this is useful for others, I’ve written some code to replicate the computation of that magic value. It seems to be a reasonable estimation!

from diffusers import AutoencoderKL
import torch
import torchvision
from torchvision.datasets.utils import download_and_extract_archive
from torchvision import transforms


num_workers = 4
batch_size = 12
# From https://github.com/fastai/imagenette
IMAGENETTE_URL = 'https://s3.amazonaws.com/fast-ai-imageclas/imagenette2.tgz'

# Fix the seed (the latents are sampled) and disable autograd (inference only)
torch.manual_seed(0)
torch.set_grad_enabled(False)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load only the VAE of the Stable Diffusion v1-4 checkpoint
pretrained_model_name_or_path = 'CompVis/stable-diffusion-v1-4'
vae = AutoencoderKL.from_pretrained(
    pretrained_model_name_or_path,
    subfolder='vae',
    revision=None,
)
vae.to(device)

# Resize and crop to 512x512, then map pixel values from [0, 1] to [-1, 1]
size = 512
image_transform = transforms.Compose([
    transforms.Resize(size),
    transforms.CenterCrop(size),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])

root = 'dataset'
download_and_extract_archive(IMAGENETTE_URL, root)

# ImageFolder walks the extracted archive recursively; the class labels are ignored below
dataset = torchvision.datasets.ImageFolder(root, transform=image_transform)
loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
)

# Encode every image and keep one sampled latent per image on the CPU
all_latents = []
for image_data, _ in loader:
    image_data = image_data.to(device)
    latents = vae.encode(image_data).latent_dist.sample()
    all_latents.append(latents.cpu())

# The scale factor is the reciprocal of the standard deviation over all latent values
all_latents_tensor = torch.cat(all_latents)
std = all_latents_tensor.std().item()
normalizer = 1 / std
print(f'{normalizer = }')

Output:

normalizer = 0.19503
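
As a quick check on how close this estimate is, the arithmetic below compares the two factors. It is a sketch that relies only on the two numbers quoted in this thread; the helper names are hypothetical, and the "measured" standard deviation is simply the reciprocal of the printed normalizer.

```python
PAPER_FACTOR = 0.18215      # factor asked about in the issue
ESTIMATED_FACTOR = 0.19503  # normalizer printed by the script above


def implied_std(factor):
    """Standard deviation of the raw latents implied by a factor defined as 1 / std."""
    return 1.0 / factor


def scaled_std(raw_std, factor):
    """Standard deviation of the latents after multiplying by the factor."""
    return raw_std * factor


measured_std = implied_std(ESTIMATED_FACTOR)
relative_gap = ESTIMATED_FACTOR / PAPER_FACTOR - 1.0
residual_std = scaled_std(measured_std, PAPER_FACTOR)

print(f"implied raw std               = {measured_std:.3f}")
print(f"relative gap between factors  = {relative_gap:.1%}")
print(f"std after scaling by 0.18215  = {residual_std:.3f}")
```

The two factors differ by roughly 7%, so latents from this Imagenette run scaled by 0.18215 would have a standard deviation a little below 1. The difference may come from the dataset and sampling used for the estimate, which the thread does not specify for the original value.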