refiners/tests/e2e/test_diffusion_ref
Cédric Deltheil f4e9707297 sdxl test: refreshed reference image
The former one was generated with SDXL 0.9 instead of 1.0. The new one has been
generated with diffusers:

    import torch
    from diffusers import StableDiffusionXLPipeline, DDIMScheduler

    noise_scheduler = DDIMScheduler(
        num_train_timesteps=1000,
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        clip_sample=False,
        set_alpha_to_one=False,
        steps_offset=1,
    )

    base_model_path = "/path/to/stabilityai/stable-diffusion-xl-base-1.0"

    device = "cuda"
    prompt = "a cute cat, detailed high-quality professional image"
    negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"
    seed = 2

    pipe = StableDiffusionXLPipeline.from_pretrained(base_model_path, scheduler=noise_scheduler, torch_dtype=torch.float16, add_watermarker=False)
    pipe = pipe.to(device)
    generator = torch.Generator(device).manual_seed(seed)
    images = pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=30, generator=generator).images
2023-09-12 10:59:26 +02:00
cutecat_guide_canny.png initial commit 2023-08-04 15:28:41 +02:00
cutecat_guide_depth.png initial commit 2023-08-04 15:28:41 +02:00
cutecat_guide_lineart.png initial commit 2023-08-04 15:28:41 +02:00
cutecat_guide_normals.png initial commit 2023-08-04 15:28:41 +02:00
cutecat_guide_sam.png initial commit 2023-08-04 15:28:41 +02:00
cutecat_init.png initial commit 2023-08-04 15:28:41 +02:00
cyberpunk_guide.png initial commit 2023-08-04 15:28:41 +02:00
expected_controlnet_canny.png initial commit 2023-08-04 15:28:41 +02:00
expected_controlnet_depth.png initial commit 2023-08-04 15:28:41 +02:00
expected_controlnet_lineart.png initial commit 2023-08-04 15:28:41 +02:00
expected_controlnet_normals.png initial commit 2023-08-04 15:28:41 +02:00
expected_controlnet_sam.png initial commit 2023-08-04 15:28:41 +02:00
expected_controlnet_stack.png make high-level adapters Adapters 2023-08-31 10:57:18 +02:00
expected_cutecat_sdxl_ddim_random_init.png sdxl test: refreshed reference image 2023-09-12 10:59:26 +02:00
expected_image_ip_adapter_woman.png add IP-Adapter support for SD 1.5 2023-09-06 15:12:48 +02:00
expected_inpainting_refonly.png initial commit 2023-08-04 15:28:41 +02:00
expected_lora_pokemon.png initial commit 2023-08-04 15:28:41 +02:00
expected_refonly.png initial commit 2023-08-04 15:28:41 +02:00
expected_std_init_image.png initial commit 2023-08-04 15:28:41 +02:00
expected_std_inpainting.png initial commit 2023-08-04 15:28:41 +02:00
expected_std_random_init.png initial commit 2023-08-04 15:28:41 +02:00
expected_textual_inversion_random_init.png Add support for learned concepts e.g. via textual inversion 2023-08-28 10:37:39 +02:00
inpainting-mask.png initial commit 2023-08-04 15:28:41 +02:00
inpainting-scene.png initial commit 2023-08-04 15:28:41 +02:00
inpainting-target.png initial commit 2023-08-04 15:28:41 +02:00
kitchen_dog.png initial commit 2023-08-04 15:28:41 +02:00
kitchen_dog_mask.png initial commit 2023-08-04 15:28:41 +02:00
README.md add IP-Adapter support for SD 1.5 2023-09-06 15:12:48 +02:00
woman.png add IP-Adapter support for SD 1.5 2023-09-06 15:12:48 +02:00

Note about this data

Expected outputs

expected_*.png files are the output of the same diffusion run performed with a different codebase, usually diffusers configured with the same settings as ours (DPMSolverMultistepScheduler, VAE patched to remove randomness, same seed, etc.).

For instance, here is how we generate expected_std_random_init.png:

import torch

from diffusers import DPMSolverMultistepScheduler
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "a cute cat, detailed high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"

torch.manual_seed(2)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
)

output.images[0].save("expected_std_random_init.png")

Special cases:

  • expected_refonly.png has been generated with Stable Diffusion web UI.
  • expected_inpainting_refonly.png and expected_image_ip_adapter_woman.png have been generated with refiners itself (and inspected to check that they look reasonable).
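
This README does not say how a freshly generated image is compared against its expected_*.png reference. As a minimal sketch, assuming a peak signal-to-noise ratio (PSNR) check on raw 8-bit pixel buffers, a comparison could look like the following. The function names and the 35 dB threshold are hypothetical and are not taken from the test suite.

```python
import math


def psnr(actual: bytes, expected: bytes, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio (in dB) between two equally sized 8-bit buffers."""
    if len(actual) != len(expected):
        raise ValueError("buffers must have the same length")
    if not actual:
        raise ValueError("buffers must not be empty")
    # Mean squared error over all channel values.
    mse = sum((a - b) ** 2 for a, b in zip(actual, expected)) / len(actual)
    if mse == 0:
        return math.inf  # identical buffers
    return 10 * math.log10(max_value**2 / mse)


def is_similar(actual: bytes, expected: bytes, min_psnr: float = 35.0) -> bool:
    # min_psnr is a hypothetical threshold chosen for illustration only.
    return psnr(actual, expected) >= min_psnr
```

With Pillow, the two buffers could come from Image.open(path).convert("RGB").tobytes(), provided both images have the same size. A tolerance-based metric of this kind allows for small numerical differences between codebases, which an exact byte comparison would not.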

Other images

VAE without randomness

--- a/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
+++ b/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
@@ -524,13 +524,8 @@ class StableDiffusionImg2ImgPipeline(DiffusionPipeline):
                 f" size of {batch_size}. Make sure the batch size matches the length of the generators."
             )

-        if isinstance(generator, list):
-            init_latents = [
-                self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
-            ]
-            init_latents = torch.cat(init_latents, dim=0)
-        else:
-            init_latents = self.vae.encode(image).latent_dist.sample(generator)
+        init_latents = [self.vae.encode(image[i : i + 1]).latent_dist.mean for i in range(batch_size)]
+        init_latents = torch.cat(init_latents, dim=0)

         init_latents = self.vae.config.scaling_factor * init_latents
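
The patch above replaces a random draw from the VAE posterior with its mean. To illustrate why this removes randomness, here is a toy sketch in plain Python. ToyDiagonalGaussian is a hypothetical stand-in for a diagonal Gaussian posterior, not the diffusers class, and it uses the standard library random module rather than torch.

```python
import math
import random


class ToyDiagonalGaussian:
    """Toy stand-in for a VAE posterior: one mean and log-variance per latent value."""

    def __init__(self, mean: list[float], logvar: list[float]) -> None:
        self.mean = mean
        self.std = [math.exp(0.5 * lv) for lv in logvar]

    def sample(self, rng: random.Random) -> list[float]:
        # Reparameterised draw: mean + std * eps, with eps drawn from N(0, 1).
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]


posterior = ToyDiagonalGaussian(mean=[0.5, -1.0], logvar=[-2.0, -2.0])

# Sampling depends on the generator state, so different seeds give different latents.
sample_a = posterior.sample(random.Random(1))
sample_b = posterior.sample(random.Random(2))

# The mean ignores the generator entirely, so it is identical on every run.
latents = posterior.mean
```

Using the mean makes the encoded latents independent of how each codebase consumes its random generator, which is presumably why the reference images are produced with this patch applied.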

Textual Inversion

  • expected_textual_inversion_random_init.png has been generated with StableDiffusionPipeline, e.g.:
import torch

from diffusers import DPMSolverMultistepScheduler
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.load_textual_inversion("sd-concepts-library/gta5-artwork")

prompt = "a cute cat on a <gta5-artwork>"
negative_prompt = ""

torch.manual_seed(2)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
)

output.images[0].save("expected_textual_inversion_random_init.png")