scheduler leaky abstractions in pipelines
Excited by the opportunities that Stable Diffusion has made for use with consumer hardware, I’ve started working with the code. I was looking to get an idea of what I’d need to do for something like #277 and studying the StableDiffusionPipeline class.
I’m concerned by the use of isinstance(self.scheduler, LMSDiscreteScheduler). If you can’t implement a new Scheduler without meddling with the internal implementation of the Pipeline, that points to some breakdown in the API design. We don’t want if isinstance trees that grow with each new Scheduler PR.
It also introduces an ordering dependency between components that weren’t otherwise entangled: https://github.com/huggingface/diffusers/blob/e49dd03d2de347caf5f18d04657500412f57103b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L120-L124
It wasn’t obvious to me at first glance, but scheduler.set_timesteps modifies scheduler.sigmas. So now when working with that code we have to remember to do that scheduler initialization before we finish preparing the initial latents, whereas those were independent operations before.
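A minimal sketch of that ordering hazard, using a toy scheduler rather than the real diffusers class. ToyScheduler, scale_initial_latents, and the sigma values are all invented for illustration; the only behavior taken from the issue is that set_timesteps overwrites sigmas.

```python
class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not the diffusers class."""

    def __init__(self, num_train_timesteps=1000):
        self.num_train_timesteps = num_train_timesteps
        # Training-order sigmas, smallest first (toy values).
        self._train_sigmas = [(i + 1) / 100 for i in range(num_train_timesteps)]
        self.sigmas = list(self._train_sigmas)

    def set_timesteps(self, num_inference_steps):
        stride = self.num_train_timesteps // num_inference_steps
        # Side effect: sigmas now holds the inference schedule, largest first.
        self.sigmas = self._train_sigmas[::-1][::stride]


def scale_initial_latents(latents, scheduler):
    # Reads scheduler.sigmas, so the result depends on whether
    # set_timesteps has already run.
    return [x * scheduler.sigmas[0] for x in latents]


scheduler = ToyScheduler()
too_early = scale_initial_latents([1.0, -2.0], scheduler)  # uses stale sigmas
scheduler.set_timesteps(50)
correct = scale_initial_latents([1.0, -2.0], scheduler)
```

Both calls succeed, but they scale by different sigmas, so nothing flags the mistake when the two steps are swapped.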
Another worrying comment is this: https://github.com/huggingface/diffusers/blob/e49dd03d2de347caf5f18d04657500412f57103b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L126-L130
If “not all schedulers share the same signature,” it’s much harder to make and use Schedulers.
Suggestions
LMSDiscreteScheduler’s step function seems to want the step number instead of the time. Options:
- Change the signature of Scheduler.step to always include the index as well as the time. (It’s readily available and cheap to have on the stack.)
- Since LMS is linear (right there in the name!), could it take the usual timestep value and easily convert it back to its index value?
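The second option could look roughly like the sketch below. It assumes the inference timesteps are evenly spaced integers in descending order, which is an assumption about the schedule layout rather than something confirmed here; make_timesteps and index_of_timestep are hypothetical helpers, not diffusers functions.

```python
def make_timesteps(num_train_timesteps, num_inference_steps):
    """Evenly spaced integer timesteps, descending (assumed layout)."""
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


def index_of_timestep(timesteps, t):
    """Recover the step index from a timestep value on an evenly spaced schedule.

    Needs at least two timesteps so the spacing can be measured.
    """
    stride = timesteps[0] - timesteps[1]
    index, remainder = divmod(timesteps[0] - t, stride)
    if remainder or not 0 <= index < len(timesteps):
        raise ValueError(f"timestep {t} is not on this schedule")
    return index


timesteps = make_timesteps(1000, 50)
first_index = index_of_timestep(timesteps, timesteps[0])
last_index = index_of_timestep(timesteps, timesteps[-1])
```

If the spacing is not uniform, a dictionary from timestep to index built once in set_timesteps would do the same job.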
For DDIMScheduler’s extra eta parameter:
The **kwargs on StableDiffusionPipeline.__call__ seems to be unused, and eta seems to be passed straight through to scheduler.step with no other interaction. We could take eta out of the formal parameter list and rename kwargs to extra_step_kwargs.
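A rough sketch of that pass-through, with toy classes standing in for the real ones. ToyDDIMScheduler, ToyPNDMScheduler, and run_pipeline are hypothetical, and the arithmetic inside step is a placeholder, not either scheduler's actual update rule.

```python
class ToyDDIMScheduler:
    """Hypothetical scheduler whose step accepts an extra eta argument."""

    def step(self, model_output, timestep, sample, eta=0.0):
        return sample - model_output * (1.0 + eta)  # placeholder math


class ToyPNDMScheduler:
    """Hypothetical scheduler whose step takes no extra arguments."""

    def step(self, model_output, timestep, sample):
        return sample - model_output  # placeholder math


def run_pipeline(scheduler, timesteps, sample, predict, **extra_step_kwargs):
    # The loop never names eta; whatever the caller supplies is
    # forwarded to scheduler.step unchanged.
    for t in timesteps:
        model_output = predict(sample, t)
        sample = scheduler.step(model_output, t, sample, **extra_step_kwargs)
    return sample


def constant_prediction(sample, t):
    return 1.0


ddim_result = run_pipeline(ToyDDIMScheduler(), [2, 1, 0], 10.0, constant_prediction, eta=1.0)
pndm_result = run_pipeline(ToyPNDMScheduler(), [2, 1, 0], 10.0, constant_prediction)
```

With this shape the loop has no scheduler-specific branches. The trade-off is that passing eta to a scheduler that does not accept it raises a TypeError at the first step instead of being silently ignored.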
I haven’t yet looked at other uses of Scheduler outside of this one pipeline, but those are my initial thoughts.
Top GitHub Comments
When I originally looked at this, I was under the impression that there were some differences between whether sigmas or timesteps were computed up front and where some multiplication happened, but I figured that was solvable with a little algebra and a little refactoring without changing the overall design too much.
But when hlky got me looking at why some samplers weren’t behaving as well as others, I realized there’s a bigger difference between the design of Katherine’s sampler functions and diffusers’ Scheduler.step method: Samplers may use the model more than once per timestep.
The operation is very similar to that of guidance functions: Much like the current StableDiffusionPipeline offers an option to run two sets of data through the UNet to perform classifier-free guidance, the higher order functions featured in Elucidating and DPM-Solver (sample_heun and sample_dpm in k_diffusers) use the model to make multiple predictions and combine them for a better result.
I expect we’ll see a plethora of other guidance functions that rely on combining the results of multiple predictions, such as Composable Diffusion.
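To make the "more than once per timestep" point concrete, here is a simplified second-order (Heun) step on scalar values. It is a sketch in the spirit of sample_heun, not the library's code: heun_step and the denoise callback are hypothetical names, and noise injection and batching are left out.

```python
def heun_step(denoise, x, sigma, sigma_next):
    """One Heun (second-order) step from noise level sigma to sigma_next.

    denoise(x, sigma) is assumed to return the model's estimate of the
    clean sample. It is called twice unless sigma_next is zero.
    """
    d = (x - denoise(x, sigma)) / sigma            # slope at the current point
    dt = sigma_next - sigma
    x_euler = x + d * dt                           # first-order proposal
    if sigma_next == 0:
        return x_euler                             # last step: plain Euler
    d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
    return x + 0.5 * (d + d_next) * dt             # average the two slopes


calls = []

def toy_denoise(x, sigma):
    calls.append(sigma)   # record every model evaluation
    return 0.0            # toy model: the clean sample is always 0

x = heun_step(toy_denoise, 10.0, 10.0, 5.0)   # two model calls
x = heun_step(toy_denoise, x, 5.0, 0.0)       # one model call
```

A step method that is handed a single precomputed model_output cannot express the second evaluation on its own; it would need either access to the model or a way to ask the pipeline for another prediction.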
…
Them’s the findings. My thoughts about what to do about it are all tangled up with this project’s philosophy of Pipelines being single purpose and having some sort of pristine original implementation that shall be preserved.
Reading that again, I see that schedulers (if not pipelines) are intended to be building blocks, and that reassures me that we can come up with something we can build applications on. 😊
Edit: Nevermind, solved the below. First, join the HF Discord channel, then go to the #role-assigment channel, then click on 🎨 for access to diffusion-related channels, then follow the link.
@keturn I’m curious about the link you shared: hlky got me looking at why some samplers weren’t behaving
However, when I try to follow it, I’m met with an error message from Discord: You find yourself in a strange place. You don't have access to any text channels, or there are none in this server. Could you give more detailed instructions for how to see that conversation?