Latents / seeds are a mess. Make it easier to replicate a generated image using a seed.
Problem:
We often generate images with a `batch_size > 1`.
However, images in the batch (after the first image) by default have a seed that is unknown to the user, so all but the first image in a batch can’t be directly replicated.
To get around this, the docs suggest that we manually feed in latents.
What’s a latent?? Like most devs, I’m arriving here with zero domain expertise.
But whatever, I figured it out (a latent seems to be an image of white noise, generated from a seed, which the diffuser looks at to begin dreaming up its image), and I did as I was told.
I decided it would make sense for the seeds in a batch to be sequential: for any given batch of images, if you specify `txt2img(prompt="astronaut riding horse", myManualSeed=42069, batch_size=6)`, the second image in the batch can be replicated with the seed `myManualSeed + 1`, and so on:
```python
import torch

def getSequentialLatents(settings: DreamSettings, pipe=txt2imgPipe):
    theDevice = "cuda"
    generator = torch.Generator(device=theDevice)
    batchWidth = settings.batchWidth
    width = settings.width
    height = settings.height
    latents = None
    thisSeed = settings.seed
    for _ in range(batchWidth):
        # Re-seed the generator so each image's seed is known: seed, seed+1, ...
        generator = generator.manual_seed(thisSeed)
        # One latent per image: Gaussian noise shaped (1, channels, h/8, w/8)
        newLatent = torch.randn(
            (1, pipe.unet.in_channels, height // 8, width // 8),
            generator=generator,
            device=theDevice,
        )
        latents = newLatent if latents is None else torch.cat((latents, newLatent))
        thisSeed += 1
    return latents
```
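To show the intended workflow, here is roughly how I call it (a sketch; the `DreamSettings` constructor arguments are illustrative, and the pipeline call follows the standard diffusers signature):

```python
# Generate a batch of 6 whose seeds are known to be 42069..42074
settings = DreamSettings(seed=42069, batchWidth=6, width=512, height=512)
images = txt2imgPipe(
    ["astronaut riding horse"] * settings.batchWidth,
    latents=getSequentialLatents(settings),
).images

# Later, replicate just the second image: its seed is 42069 + 1
single = DreamSettings(seed=42070, batchWidth=1, width=512, height=512)
replica = txt2imgPipe(
    "astronaut riding horse",
    latents=getSequentialLatents(single),
).images[0]
```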
This was a lot of pain just to know the seeds that are present in a batch! This is a basic need for generating and refining images, and as such I believe it should be handled under the hood.
Furthermore, this hacky solution doesn’t work for img2img, as `latents` can’t be specified!

```python
img2imgPipe(latents=sequentialLatents)
```
```
---------------------------------------------------------------------------
TypeError: __call__() got an unexpected keyword argument 'latents'
```
So, there is currently no easy way to know the seeds that make up your batch in img2img. If you want to perform more inference steps specifically on the second image in an img2img batch, you’re out of luck.
Proposed solution:
Make `manual_seed()` create sequential seeds for a batch by default, as I have sketched out above, and make it do this universally: for txt2img, img2img, and inpainting.
Then, if you specify a seed for a batch, you will know that the second image of the batch will be `seed + 1`, and so on. Simple and easy.
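For illustration, a hypothetical sketch of what this could look like from the caller’s side (`seed=` is not a real pipeline argument today, and `pipe` is a stand-in for any of the three pipelines):

```python
# Hypothetical API: the pipeline derives per-image seeds (seed, seed+1, ...) internally
batch = pipe("astronaut riding horse", seed=42069, num_images_per_prompt=6).images

# The second image of the batch could then be replicated on its own:
replica = pipe("astronaut riding horse", seed=42070, num_images_per_prompt=1).images[0]
```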
Top GitHub Comments
https://github.com/huggingface/diffusers/pull/1718 should solve this. Also adding a nice doc page for it.
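Assuming the list-of-generators support that PR adds, the sequential-seed pattern can be expressed directly (a sketch; the model id and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

seed, batch_size = 42069, 6
# One generator per image, seeded seed, seed+1, ..., so every image is replicable
generators = [torch.Generator("cuda").manual_seed(seed + i) for i in range(batch_size)]
images = pipe(["astronaut riding horse"] * batch_size, generator=generators).images

# Replicate just the second image later:
g = torch.Generator("cuda").manual_seed(seed + 1)
image2 = pipe("astronaut riding horse", generator=g).images[0]
```

Because `generator` is accepted across pipelines, the same pattern covers img2img and inpainting, where `latents` cannot be passed.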
My favorite idea for this so far is to use a coordinate-based noise system.
A `torch.Generator` is a one-dimensional function, and it has internal state that advances its position every time it’s called. A coordinate-based function would instead take the coordinates of the noise you want, for use like saying “give me the three-dimensional box of noise(seed=42) that starts at (0, 0, 0) and is 4 layers deep, height tall and width wide.”
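An illustrative sketch of such an interface (the `coordinate_noise` name and its parameters are made up here for illustration, not an existing API):

```python
import torch

def coordinate_noise(seed: int, start: tuple, shape: tuple) -> torch.Tensor:
    """Deterministic Gaussian noise addressed by absolute coordinates.

    Unlike torch.Generator, each value depends only on (seed, coordinates),
    never on how many values were drawn before this call.
    """
    z0, y0, x0 = start
    depth, height, width = shape
    out = torch.empty(shape)
    for dz in range(depth):
        for dy in range(height):
            for dx in range(width):
                # Derive a stable per-coordinate seed (slow, but shows the idea;
                # a real implementation would use a counter-based RNG like Philox)
                g = torch.Generator().manual_seed(
                    hash((seed, z0 + dz, y0 + dy, x0 + dx)) & 0x7FFFFFFF
                )
                out[dz, dy, dx] = torch.randn((), generator=g)
    return out

# "Give me the three-dimensional box of noise(seed=42) that starts at
#  (0, 0, 0) and is 4 layers deep, 64 tall and 64 wide":
box = coordinate_noise(42, start=(0, 0, 0), shape=(4, 64, 64))
```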
That’s an example I developed for three dimensions. It’s a slightly different use case, but it runs into the same problems we’ve been discussing: if you change `width` with a one-dimensional noise generator, then everything gets all out of place, even if you just wanted to make things 12% wider, or shift them to the left a bit, etc.
Adding another few dimensions — instead of (channel, height, width), using (step, batch_index, channel, height, width) — would extend the same idea to batches and denoising steps.
Being explicit about the dimensionality and shape of the noise makes it a lot easier to reproduce later.
The major caveats being that nothing like this ships with PyTorch today, so it would have to be implemented alongside the stock `torch.Generator`.