Speech to image pipeline, Unexpected output, green image
Describe the bug
The resulting image is greenish.
Reproduction
import torch
import matplotlib.pyplot as plt
from datasets import load_dataset
from diffusers import DiffusionPipeline
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio_sample = ds[3]
text = audio_sample["text"].lower()
speech_data = audio_sample["audio"]["array"]
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to(device)
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
diffuser_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="speech_to_image_diffusion",
    speech_model=model,
    speech_processor=processor,
    revision="fp16",
    torch_dtype=torch.float16,
)
diffuser_pipeline.enable_attention_slicing()
diffuser_pipeline = diffuser_pipeline.to(device)
output = diffuser_pipeline(speech_data)
plt.imsave('aa.png',output.images[0])
The result appears to be a misaligned image.
Logs
No response
System Info
- diffusers version: 0.6.0
- Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.23
- Python version: 3.9.13
- PyTorch version (GPU?): 1.8.1+cu101 (True)
- Huggingface_hub version: 0.10.1
- Transformers version: 4.23.1
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: No
Issue Analytics
- State:
- Created a year ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
I’ll have a look at it, thanks for tagging me
Sorry, I found it. I wasn’t supposed to use
plt.imsave('aa.png', output.images[0])
but
output.images[0].save('bb.png')
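The fix above can be sketched without downloading any model. This is a minimal sketch, assuming `output.images[0]` is a PIL image (which the working `.save()` call suggests). The solid red `image` below is a hypothetical stand-in for the pipeline output, and the temporary directory is only there to keep the example self-contained. It saves through PIL directly, then shows the explicit conversion to a NumPy array that an array-based API would need.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Hypothetical stand-in for `output.images[0]`: a solid red 64x64 RGB image.
image = Image.new("RGB", (64, 64), color=(255, 0, 0))

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "bb.png")

# Save through PIL directly, as in the fix reported in the thread.
image.save(png_path)

# Reload the file and convert it to an array to check the stored colors.
with Image.open(png_path) as reloaded:
    reloaded_pixels = np.asarray(reloaded.convert("RGB"))

# If an array-based API is preferred, convert explicitly first so the
# consumer receives a plain (height, width, 3) uint8 array.
pixels = np.asarray(image)
```

Comparing `reloaded_pixels` with `pixels` is a quick way to confirm that the saved file holds the same colors as the in-memory image.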