Latents / seeds are a mess. Make it easier to replicate a generated image using a seed.
Problem:
We often generate images with a `batch_size > 1`.
However, images in the batch (after the first image) by default have a seed that is unknown to the user, so all but the first image in a batch can’t be directly replicated.
To get around this, the docs suggest that we manually feed in latents.
What’s a latent?? Like most devs, I’m arriving here with zero domain expertise.
But whatever, I figured it out (a latent seems to be an image of white noise, generated from a seed, which the diffuser looks at to begin dreaming up its image), and I did as I was told.
I decided it would make sense for the seeds in a batch to be sequential: for any given batch of images, if you specify `txt2img(prompt="astronaut riding horse", myManualSeed=42069, batch_size=6)`, the second image in the batch can be replicated with the seed `myManualSeed + 1`, and so on:
```python
import torch

def getSequentialLatents(settings: DreamSettings, pipe=txt2imgPipe):
    theDevice = "cuda"
    generator = torch.Generator(device=theDevice)
    batchWidth = settings.batchWidth
    width = settings.width
    height = settings.height
    latents = None
    thisSeed = settings.seed
    for _ in range(batchWidth):
        # Re-seed the generator so each image's seed is known: seed, seed+1, ...
        generator = generator.manual_seed(thisSeed)
        # One latent per image: Gaussian noise shaped (1, channels, h/8, w/8)
        newLatent = torch.randn(
            (1, pipe.unet.in_channels, height // 8, width // 8),
            generator=generator,
            device=theDevice,
        )
        latents = newLatent if latents is None else torch.cat((latents, newLatent))
        thisSeed += 1
    return latents
```
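To show the intended workflow, here is roughly how I call it (a sketch; the `DreamSettings` constructor arguments are illustrative, and the pipeline call follows the standard diffusers signature):

```python
# Generate a batch of 6 whose seeds are known to be 42069..42074
settings = DreamSettings(seed=42069, batchWidth=6, width=512, height=512)
images = txt2imgPipe(
    ["astronaut riding horse"] * settings.batchWidth,
    latents=getSequentialLatents(settings),
).images

# Later, replicate just the second image: its seed is 42069 + 1
single = DreamSettings(seed=42070, batchWidth=1, width=512, height=512)
replica = txt2imgPipe(
    "astronaut riding horse",
    latents=getSequentialLatents(single),
).images[0]
```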
This was a lot of pain just to know the seeds that are present in a batch! This is a basic need for generating and refining images, and as such I believe it should be handled under the hood.
Furthermore, this hacky solution doesn’t work for img2img, as `latents` can’t be specified!

```python
img2imgPipe(latents=sequentialLatents)
```
```
---------------------------------------------------------------------------
TypeError: __call__() got an unexpected keyword argument 'latents'
```
So, there is currently no easy way to know the seeds that make up your batch in img2img. If you want to perform more inference steps specifically on the second image in an img2img batch, you’re out of luck.
Proposed solution:
Make `manual_seed()` create sequential seeds for a batch by default, as I have sketched out above, and make it do this universally: for txt2img, img2img, and inpainting.
Then, if you specify a seed for a batch, you will know that the second image of the batch will be `seed + 1`, and so on. Simple and easy.
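For illustration, a hypothetical sketch of what this could look like from the caller’s side (`seed=` is not a real pipeline argument today, and `pipe` is a stand-in for any of the three pipelines):

```python
# Hypothetical API: the pipeline derives per-image seeds (seed, seed+1, ...) internally
batch = pipe("astronaut riding horse", seed=42069, num_images_per_prompt=6).images

# The second image of the batch could then be replicated on its own:
replica = pipe("astronaut riding horse", seed=42070, num_images_per_prompt=1).images[0]
```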
Top GitHub Comments
https://github.com/huggingface/diffusers/pull/1718 should solve this. Also adding a nice doc page for it.
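Assuming the list-of-generators support that PR adds, the sequential-seed pattern can be expressed directly (a sketch; the model id and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

seed, batch_size = 42069, 6
# One generator per image, seeded seed, seed+1, ..., so every image is replicable
generators = [torch.Generator("cuda").manual_seed(seed + i) for i in range(batch_size)]
images = pipe(["astronaut riding horse"] * batch_size, generator=generators).images

# Replicate just the second image later:
g = torch.Generator("cuda").manual_seed(seed + 1)
image2 = pipe("astronaut riding horse", generator=g).images[0]
```

Because `generator` is accepted across pipelines, the same pattern covers img2img and inpainting, where `latents` cannot be passed.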
My favorite idea for this so far is to use a coordinate-based noise system.
A `torch.Generator` is a one-dimensional function, and it has internal state that advances its position every time it’s called. A coordinate-based function would instead take the coordinates of the noise you want, for use like saying “give me the three-dimensional box of noise(seed=42) that starts at (0, 0, 0) and is 4 layers deep, height tall and width wide.”
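An illustrative sketch of such an interface (the `coordinate_noise` name and its parameters are made up here for illustration, not an existing API):

```python
import torch

def coordinate_noise(seed: int, start: tuple, shape: tuple) -> torch.Tensor:
    """Deterministic Gaussian noise addressed by absolute coordinates.

    Unlike torch.Generator, each value depends only on (seed, coordinates),
    never on how many values were drawn before this call.
    """
    z0, y0, x0 = start
    depth, height, width = shape
    out = torch.empty(shape)
    for dz in range(depth):
        for dy in range(height):
            for dx in range(width):
                # Derive a stable per-coordinate seed (slow, but shows the idea;
                # a real implementation would use a counter-based RNG like Philox)
                g = torch.Generator().manual_seed(
                    hash((seed, z0 + dz, y0 + dy, x0 + dx)) & 0x7FFFFFFF
                )
                out[dz, dy, dx] = torch.randn((), generator=g)
    return out

# "Give me the three-dimensional box of noise(seed=42) that starts at
#  (0, 0, 0) and is 4 layers deep, 64 tall and 64 wide":
box = coordinate_noise(42, start=(0, 0, 0), shape=(4, 64, 64))
```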
That’s an example I developed for three dimensions. It’s a slightly different use case, but it runs into the same problems we’ve been discussing: if you change `width` with a one-dimensional noise generator, then everything gets all out of place, even if you just wanted to make things 12% wider, or shift them to the left a bit, etc.
Adding another few dimensions — instead of (channel, height, width), using (step, batch_index, channel, height, width) — would extend the same idea to batches and denoising steps.
Being explicit about the dimensionality and shape of the noise makes it a lot easier to reproduce later.
The major caveats being that nothing like this ships with PyTorch today, so it would have to be implemented alongside the stock `torch.Generator`.