Explanation of the 0.18215 factor in textual_inversion?
Hi, just a small question about the quoted script above that is bothering me: where does this 0.18215 number come from? What computation is being done? Is it from some paper? I have seen the same factor elsewhere, too, without explanation. Any guidance would be very helpful, thanks!
Issue Analytics
- Created: a year ago
- Comments: 7 (1 by maintainers)
Top Results From Across the Web
- Explanation of the 0.18215 factor in textual_inversion? — "The scale_factor ensures that the initial latent space on which the diffusion model is operating has approximately unit variance. Hope this ..."
- What does "0.18215" mean in blog Stable Diffusion with ... — "In the part Stable Diffusion with Diffusers, there is this line. What is '0.18215' and why should I do this? And the code..."
- Teach StableDiffusion new concepts via Textual Inversion — "This guide shows you how to fine-tune the StableDiffusion model shipped in KerasCV using the Textual-Inversion algorithm."
- I'm confused about how models work and textual inversion as ... — "Could you explain the embedding thing a bit? Does it work with multiple models, and do more images add more consistency?"
- HuggingFace Diffusers 0.2 : Stable Diffusion (Text - PyTorch — "Load the tokenizer and text encoder to tokenize and encode the text. ... scale and decode the image latents with vae latents =..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @garrett361 @patil-suraj @CodeExplode

We introduced the scale factor in the latent diffusion paper. The goal was to handle different latent spaces (from different autoencoders, which can be scaled quite differently from images) with similar noise schedules. The scale_factor ensures that the initial latent space on which the diffusion model is operating has approximately unit variance. Hope this helps 😃

In case this is useful for others, I've written some code to replicate the computation of that magic value. It seems to be a reasonable estimation!
Output:
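The replication code and its output were not preserved in this page. As an illustration only, here is a minimal sketch of the kind of computation described above, using synthetic Gaussian data as a stand-in for real VAE latents; the original estimate would instead encode a batch of training images with the Stable Diffusion autoencoder. Note that 1 / 0.18215 ≈ 5.49, so the synthetic latents below are drawn with that standard deviation purely to mimic the scale of real latents.

```python
import numpy as np

# Stand-in for a batch of VAE latents. In the real computation these would
# come from vae.encode(images); here we just draw Gaussian noise whose
# standard deviation (~5.49) mimics the empirical scale of SD latents.
rng = np.random.default_rng(0)
latents = rng.normal(loc=0.0, scale=5.49, size=(64, 4, 64, 64))

# The scale factor is the reciprocal of the latents' standard deviation,
# chosen so that the scaled latents have approximately unit variance.
scale_factor = 1.0 / latents.std()
scaled = latents * scale_factor

print(f"estimated scale_factor: {scale_factor:.5f}")  # close to 0.18215
print(f"scaled latent std: {scaled.std():.4f}")       # close to 1.0
```

Multiplying latents by this factor before diffusion (and dividing by it before decoding) is exactly what the `1 / 0.18215` scaling in the textual-inversion script does.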