[Community Pipeline] Modified Cross-Attention - Structured Diffusion Guidance for Compositional T2I synthesis

Intro

Community Pipelines were introduced in diffusers==0.4.0 to let the community quickly add, integrate, and share custom pipelines built on top of diffusers.

You can find a guide about Community Pipelines here. You can also find all the community examples under examples/community/. If you have questions about the Community Pipelines feature, please head to the parent issue.

Idea: Modified cross-attention mechanism

This pipeline aims to apply the paper Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis to Stable Diffusion, improving how prompts are interpreted. [Figure: some results from the paper; image not included.]
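
As a rough illustration of the modified cross-attention idea, here is a minimal NumPy sketch. It assumes the reading of the paper in which the attention map is computed once from the full-prompt keys, applied to several value sequences (the full prompt plus variants where noun-phrase tokens are re-encoded separately), and the results averaged. The function name `structured_cross_attention`, the toy shapes, and the replaced token positions are all hypothetical, and the learned key/value projections and the prompt parser are left out. This is not the authors' reference implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def structured_cross_attention(query, key, values):
    """Average cross-attention outputs over several value sequences.

    query:  (n_pixels, d) image features
    key:    (n_tokens, d) keys from the full prompt
    values: list of (n_tokens, d_v) arrays, one per structured encoding
    """
    scale = query.shape[-1] ** -0.5
    # One attention map, shared by every value sequence.
    attn = softmax(query @ key.T * scale, axis=-1)  # (n_pixels, n_tokens)
    outputs = [attn @ v for v in values]
    return np.mean(outputs, axis=0)


rng = np.random.default_rng(0)
n_pixels, n_tokens, d = 4, 6, 8
query = rng.normal(size=(n_pixels, d))
key = rng.normal(size=(n_tokens, d))
value_full = rng.normal(size=(n_tokens, d))

# Hypothetical variant: tokens 1..2 stand in for a noun phrase that was
# encoded on its own and pasted back into the full-prompt sequence.
value_np = value_full.copy()
value_np[1:3] = rng.normal(size=(2, d))

out = structured_cross_attention(query, key, [value_full, value_np])
print(out.shape)  # (4, 8)
```

With a single value sequence the function reduces to ordinary scaled dot-product cross-attention, which is one way to sanity-check a port into diffusers.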

Issue Analytics

Created a year ago
Comments: 7 (5 by maintainers)

Top GitHub Comments

20RitikSingh commented, Oct 17, 2022

I want to work on this.

keturn commented, Oct 18, 2022

This is a super interesting paper! Note that there is a reference implementation included in the .zip of “supplementary material” on the paper submission. Also note that this implementation is not released under the Apache License, so I don’t know if 🤗 can accept a PR that includes it. It might be safer to do a clean-room implementation from only the description in the text of the paper.

[I Am Not A Lawyer and I Am Not Your Lawyer, but the paper’s authors are obligated to remain anonymous until the end of the ICLR 2023 review period, so they might have a hard time speaking up for themselves right now.]

If you do choose to use the code from the supplementary material, you may need to swap some things around to make it better fit diffusers instead of ldm.
