refiners/tests/e2e/test_diffusion_ref

Note about this data

Expected outputs

expected_*.png files are the output of the same diffusion run performed with a different codebase, usually diffusers with the same settings as ours (DPMSolverMultistepScheduler, VAE patched to remove randomness, same seed...).

For instance, here is how we generate expected_std_random_init.png:

import torch

from diffusers import DPMSolverMultistepScheduler
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "a cute cat, detailed high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"

torch.manual_seed(2)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
)

output.images[0].save("expected_std_random_init.png")
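These references are then consumed by the e2e tests, which compare freshly generated images against them with a perceptual threshold rather than exact pixel equality. A minimal sketch of such a check using PSNR (the helper names here are illustrative, not refiners' actual test utilities):

```python
import numpy as np
from PIL import Image

def psnr(a: Image.Image, b: Image.Image) -> float:
    """Peak signal-to-noise ratio between two same-size RGB images, in dB."""
    x = np.asarray(a.convert("RGB"), dtype=np.float64)
    y = np.asarray(b.convert("RGB"), dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 20 * np.log10(255.0 / np.sqrt(mse))

def ensure_similar_images(candidate: Image.Image, reference: Image.Image, min_psnr: float = 35.0) -> None:
    """Fail the test if the candidate strays too far from the reference."""
    assert psnr(candidate, reference) >= min_psnr, "image differs too much from reference"
```

A threshold-based check tolerates the small numerical drift expected across hardware and library versions, while still catching real regressions.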

Special cases:

  • For self-attention guidance, StableDiffusionSAGPipeline has been used instead of the default pipeline.
  • expected_refonly.png has been generated with Stable Diffusion web UI.
  • The following references have been generated with refiners itself (and inspected so that they look reasonable):
    • expected_karras_random_init.png
    • expected_inpainting_refonly.png
    • expected_image_ip_adapter_woman.png
    • expected_image_sdxl_ip_adapter_woman.png
    • expected_ip_adapter_controlnet.png
    • expected_t2i_adapter_xl_canny.png
    • expected_image_sdxl_ip_adapter_plus_woman.png
    • expected_cutecat_sdxl_ddim_random_init_sag.png
    • expected_cutecat_sdxl_euler_random_init.png
    • expected_restart.png
    • expected_freeu.png
    • expected_dropy_slime_9752.png
    • expected_sdxl_dpo_lora.png
    • expected_sdxl_multi_loras.png
    • expected_image_ip_adapter_multi.png
    • expected_controllora_CPDS.png
    • expected_controllora_PyraCanny.png
    • expected_controllora_PyraCanny+CPDS.png
    • expected_controllora_disabled.png
    • expected_style_aligned.png
    • expected_controlnet_canny_scale_decay.png
    • expected_multi_diffusion_dpm.png
    • expected_multi_upscaler.png
    • expected_ic_light.png

Other images

  • cutecat_init.png is generated with the same Diffusers script and prompt but with seed 1234.

  • kitchen_dog.png is generated with the same diffusers script and negative prompt, but with seed 12 and the positive prompt "a small brown dog, detailed high-quality professional image, sitting on a chair, in a kitchen".

  • expected_std_sde_random_init.png is generated with the following code:

import torch
from diffusers import StableDiffusionPipeline
from diffusers.schedulers.scheduling_dpmsolver_multistep import DPMSolverMultistepScheduler

from refiners.fluxion.utils import manual_seed

diffusers_solver = DPMSolverMultistepScheduler.from_config(  # type: ignore
    {
        "beta_end": 0.012,
        "beta_schedule": "scaled_linear",
        "beta_start": 0.00085,
        "algorithm_type": "sde-dpmsolver++",
        "use_karras_sigmas": False,
        "final_sigmas_type": "sigma_min",
        "euler_at_final": True,
    }
)
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32, scheduler=diffusers_solver)
pipe = pipe.to("cuda")
prompt = "a cute cat, detailed high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"
manual_seed(2)
image = pipe(prompt, negative_prompt=negative_prompt, guidance_scale=7.5).images[0]
image.save("expected_std_sde_random_init.png")
  • expected_std_sde_karras_random_init.png is generated with the following code (diffusers 0.30.2):

import torch
from diffusers import StableDiffusionPipeline
from diffusers.schedulers.scheduling_dpmsolver_multistep import DPMSolverMultistepScheduler
from refiners.fluxion.utils import manual_seed

model_id = "botp/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
pipe = pipe.to("cuda:1")

config = {**pipe.scheduler.config}
config["use_karras_sigmas"] = True
config["algorithm_type"] = "sde-dpmsolver++"
pipe.scheduler = DPMSolverMultistepScheduler.from_config(config)

prompt = "a cute cat, detailed high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"
manual_seed(2)
image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=18, guidance_scale=7.5).images[0]
image.save("expected_std_sde_karras_random_init.png")
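For reference, use_karras_sigmas replaces the solver's default sigma spacing with the schedule from Karras et al. (2022), which interpolates in sigma^(1/rho) space with rho = 7. A minimal sketch of that schedule (the sigma_min/sigma_max defaults below are the typical SD 1.5 range, stated here as an assumption):

```python
import numpy as np

def karras_sigmas(n: int, sigma_min: float = 0.0292, sigma_max: float = 14.6146, rho: float = 7.0) -> np.ndarray:
    """Karras et al. sigma schedule: linear ramp in sigma^(1/rho) space."""
    ramp = np.linspace(0, 1, n)
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    # Decreasing schedule from sigma_max down to sigma_min.
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

sigmas = karras_sigmas(18)  # 18 steps, matching num_inference_steps above
```

Compared to the default spacing, this concentrates more steps at low noise levels, which is why the karras and non-karras references differ.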

  • fairy_guide_canny.png comes from TencentARC/t2i-adapter-canny-sdxl-1.0.

  • cyberpunk_guide.png comes from Lexica.

  • inpainting-mask.png, inpainting-scene.png and inpainting-target.png have been generated as follows:

    • inpainting-mask.png: negated version of a mask computed with SAM automatic mask generation using the vit_h checkpoint
    • inpainting-scene.png: cropped-to-square-and-resized version of https://unsplash.com/photos/RCz6eSVPGYU by @jannerboy62
    • inpainting-target.png: computed with convert <(convert -size 512x512 xc:white png:-) kitchen_dog.png <(convert inpainting-mask.png -negate png:-) -compose Over -composite inpainting-target.png
  • woman.png comes from tencent-ailab/IP-Adapter.

  • statue.png comes from tencent-ailab/IP-Adapter.

  • cutecat_guide_PyraCanny.png and cutecat_guide_CPDS.png were generated inside Fooocus.

  • low_res_dog.png and expected_controlnet_tile.png are taken from Diffusers documentation, respectively named original.png and output.png.

  • clarity_input_example.png is taken from the Replicate demo of the Clarity upscaler.
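The ImageMagick recipe for inpainting-target.png above can be reproduced in PIL. A rough equivalent (the function name is hypothetical, and the file paths assume this directory):

```python
from PIL import Image, ImageOps

def make_inpainting_target(dog: Image.Image, mask: Image.Image, size: tuple[int, int] = (512, 512)) -> Image.Image:
    """Composite `dog` over a white canvas through the negated mask,
    mirroring `convert ... -compose Over -composite`."""
    white = Image.new("RGB", size, "white")       # `-size 512x512 xc:white`
    negated = ImageOps.invert(mask.convert("L"))  # `inpainting-mask.png -negate`
    # Take the dog wherever the negated mask is white, white elsewhere.
    return Image.composite(dog.convert("RGB"), white, negated)

# Hypothetical usage with the files from this directory:
# target = make_inpainting_target(Image.open("kitchen_dog.png"), Image.open("inpainting-mask.png"))
# target.save("inpainting-target.png")
```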

VAE without randomness

To make image-to-image runs deterministic, diffusers is patched so the VAE encoder uses the mean of the latent distribution instead of sampling from it:

--- a/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
+++ b/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
@@ -524,13 +524,8 @@ class StableDiffusionImg2ImgPipeline(DiffusionPipeline):
                 f" size of {batch_size}. Make sure the batch size matches the length of the generators."
             )

-        if isinstance(generator, list):
-            init_latents = [
-                self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
-            ]
-            init_latents = torch.cat(init_latents, dim=0)
-        else:
-            init_latents = self.vae.encode(image).latent_dist.sample(generator)
+        init_latents = [self.vae.encode(image[i : i + 1]).latent_dist.mean for i in range(batch_size)]
+        init_latents = torch.cat(init_latents, dim=0)

         init_latents = self.vae.config.scaling_factor * init_latents

Textual Inversion

  • expected_textual_inversion_random_init.png has been generated with StableDiffusionPipeline, e.g.:
import torch

from diffusers import DPMSolverMultistepScheduler
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.load_textual_inversion("sd-concepts-library/gta5-artwork")

prompt = "a cute cat on a <gta5-artwork>"
negative_prompt = ""

torch.manual_seed(2)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
)

output.images[0].save("expected_textual_inversion_random_init.png")