Performance issues when viewing images in the Images tab of a run in the Aim UI
🐛 Bug
After tracking images over several steps, the Aim UI stalls or becomes very slow at some point while loading them.
While it is in that state (slow or frozen), the server accepts no other requests, regardless of the number of workers.
To reproduce
Save the images to a remote server with the code below; the issue occurs when the images are read on the run page.
Replace [aim ip] and [model-path] in the code below. See the repo https://github.com/huggingface/diffusers; I tested with the Stable Diffusion 2.0 model.
import argparse
import hashlib
import itertools
import math
import os
from pathlib import Path
from typing import Optional
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
from torch.utils.data import Dataset
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel, StableDiffusionImg2ImgPipeline
from diffusers.optimization import get_scheduler
from huggingface_hub import HfFolder, Repository, whoami
from PIL import Image
from torchvision import transforms
from tqdm.auto import tqdm
from transformers import AutoTokenizer, PretrainedConfig
from aim import Run, Image
import aim
logger = get_logger(__name__)
aim_run = Run(repo='[aim ip]', experiment='test', log_system_params=True)
aim_run.name = 'test'
save_path = '[model-path]'
p = StableDiffusionPipeline.from_pretrained(save_path, torch_dtype=torch.float16, revision='fp16').to('cuda')
prompt = [
    "test",
    "test",
    "test",
    "test",
    "test",
    "test",
    "test",
    "test",
    "test",
    "test",
    "test",
]
num_inference_steps = 25
guidance_scale = 9.0
num_samples = 5
for i in range(len(prompt)):
    images = p(
        prompt[i],
        num_images_per_prompt=num_samples,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    ).images
    aim_images = []
    for j in range(len(images)):
        aim_images.append(aim.Image(images[j], caption=f'{prompt[i]} examples'))
    aim_run.track(value=aim_images, name='generated images', step=i)
Expected behavior
Images should load and display without slowing down or stalling.
Even if image loading does stall, the server should keep running so that other requests can still be served.
Environment
- Aim Version 3.15.0
- Python version 3.10
- pip version 22.3.1
- OS (e.g., Linux) amazonlinux2 arm
- arm cpu (graviton3, c7g)
- remote server setting
- Tested in Chrome, Safari… Probably not a browser issue.
Additional context
This happens when many images are stored (for example, training for 30000 steps and saving 10-20 result images every 1000 steps). When I open the Images page of the run, the images at the top load and display fine: the images for steps 30000, 29000, and 28000 are visible on the page at once, and those for steps 27000 and 26000 are not yet visible but are loaded in the same batch, so they appear right away when I scroll down. If I scroll further, however, the images from steps 25000 and 24000 onward take a very long time to load.
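For a rough sense of scale, the figures quoted above can be worked out as follows. This is only arithmetic on the numbers in the report (30000 steps, images saved every 1000 steps, 10-20 images each time); the function name is made up for illustration.

```python
def images_in_run(total_steps: int, save_every: int, images_per_save: int) -> int:
    """Number of images a run holds if images are saved every `save_every` steps."""
    return (total_steps // save_every) * images_per_save


# Lower and upper bounds from the report: 10 to 20 images per save.
low = images_in_run(30_000, 1_000, 10)
high = images_in_run(30_000, 1_000, 20)
print(f"{low}-{high} images across {30_000 // 1_000} tracked steps")
```

So the Images tab of such a run has to page through a few hundred images in total.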
In this state the server does not respond to any further requests, and I cannot connect even from another browser. The requests appear to pile up in a queue: reading the images is very slow, but once it finishes, the requests that were delayed in the meantime are all processed at once.
The following log appears in the situation described above when the server is started with debug logging enabled.
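The symptom described here (one slow request holding up every other request, which then all complete at once) is what one would see if long-running synchronous work ran directly on a single-threaded event loop. Whether that is what actually happens inside the Aim server is not confirmed anywhere in this report. The sketch below is a generic asyncio illustration with made-up handler names, not Aim code.

```python
import asyncio
import time


def read_images_blocking(seconds: float) -> str:
    """Hypothetical stand-in for slow, synchronous image reading."""
    time.sleep(seconds)
    return "images"


async def slow_handler_inline(seconds: float) -> str:
    # Runs the blocking call on the event loop: nothing else can run meanwhile.
    return read_images_blocking(seconds)


async def slow_handler_offloaded(seconds: float) -> str:
    # Runs the blocking call in a worker thread: the loop stays responsive.
    return await asyncio.to_thread(read_images_blocking, seconds)


async def quick_handler(started: float) -> float:
    # A cheap request; returns how long after `started` it was served.
    return time.perf_counter() - started


async def measure(slow_handler) -> float:
    """Start a slow request, then a quick one; return the quick one's latency."""
    started = time.perf_counter()
    slow = asyncio.create_task(slow_handler(0.3))
    await asyncio.sleep(0)  # yield once so the slow request starts first
    latency = await quick_handler(started)
    await slow
    return latency


inline_latency = asyncio.run(measure(slow_handler_inline))
offloaded_latency = asyncio.run(measure(slow_handler_offloaded))
print(f"quick request latency, blocking inline: {inline_latency:.2f}s")
print(f"quick request latency, offloaded:       {offloaded_latency:.2f}s")
```

In the inline case the quick request is only served after the slow one finishes, which matches the "queued, then processed at once" behaviour reported above. `asyncio.to_thread` requires Python 3.9 or later.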
Cannot index Run 8c924d585f164800b74169e7. Index is locked.
Cannot index Run 8c924d585f164800b74169e7. Index is locked.
Cannot index Run 32dff8c6567b4eb782501460. Index is locked.
Cannot index Run 32dff8c6567b4eb782501460. Index is locked.
Cannot index Run 32dff8c6567b4eb782501460. Index is locked.
Cannot index Run 2a38ac7512a54484b028fab0. Index is locked.
Cannot index Run 32dff8c6567b4eb782501460. Index is locked.
INFO: [IP]:58789 - "POST /runs/images/get-batch HTTP/1.1" 200 OK
Cannot index Run 32dff8c6567b4eb782501460. Index is locked.
Cannot index Run 32dff8c6567b4eb782501460. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
Cannot index Run efe94f87aa774888b4854a33. Index is locked.
Cannot index Run efe94f87aa774888b4854a33. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
INFO: [IP]:58567 - "POST /runs/images/get-batch HTTP/1.1" 200 OK
Cannot index Run 5150187ef9534463aeb0b17b. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
Cannot index Run 8c1e7409ee414dda8f195e1e. Index is locked.
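To see which runs account for most of the "Index is locked" messages, the log can be tallied per run hash. This is a minimal sketch using only the standard library; the regular expression is based on the message format shown in the log above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches messages of the form "Cannot index Run <hash>. Index is locked."
LOCKED_RE = re.compile(r"Cannot index Run ([0-9a-f]+)\. Index is locked\.")


def count_locked_runs(log_text: str) -> Counter:
    """Count 'Index is locked' messages per run hash."""
    return Counter(LOCKED_RE.findall(log_text))


# A short excerpt of the log above; a real log file could be read instead.
sample = (
    "Cannot index Run 8c924d585f164800b74169e7. Index is locked. "
    "Cannot index Run 8c924d585f164800b74169e7. Index is locked. "
    "Cannot index Run 32dff8c6567b4eb782501460. Index is locked. "
)
counts = count_locked_runs(sample)
for run_hash, n in counts.most_common():
    print(run_hash, n)
```

The pattern does not depend on line breaks, so it works the same whether the messages arrive one per line or run together.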
Issue Analytics
- Created 9 months ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
@Dong-Ki-Lee regarding the warning messages: may I ask you to upgrade to the latest 3.15.1 version? There was a patch released that fixes these.
Hey @Dong-Ki-Lee, thanks a lot for sharing the details. We have received reports like this earlier and are currently exploring possible solutions. We’ll keep you updated on the progress.
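To check whether an environment already has the 3.15.1 patch mentioned in the comments above, the installed version can be compared against it. This is a minimal sketch: the helper names are made up for illustration, and the parser assumes a plain dotted numeric version string such as "3.15.0".

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '3.15.1' into (3, 15, 1); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in text.split("."))


def has_patch(installed: str, required: str = "3.15.1") -> bool:
    """True if `installed` is at least `required`."""
    return parse_version(installed) >= parse_version(required)


try:
    installed = metadata.version("aim")  # version of the installed aim package
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("aim is not installed in this environment")
else:
    try:
        status = "has" if has_patch(installed) else "does not have"
        print(f"aim {installed} {status} the 3.15.1 patch")
    except ValueError:
        print(f"aim {installed}: not a plain numeric version, compare manually")
```

With the 3.15.0 version from the report, `has_patch("3.15.0")` returns `False`.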