mirror of https://github.com/finegrain-ai/refiners.git synced 2024-11-24 15:18:46 +00:00

limiteinductive c9e973ba41 refactor CrossAttentionAdapter to work with context.		2024-01-08 15:20:23 +01:00
cutecat_guide_canny.png	initial commit	2023-08-04 15:28:41 +02:00
cutecat_guide_depth.png	initial commit	2023-08-04 15:28:41 +02:00
cutecat_guide_lineart.png	initial commit	2023-08-04 15:28:41 +02:00
cutecat_guide_normals.png	initial commit	2023-08-04 15:28:41 +02:00
cutecat_guide_sam.png	initial commit	2023-08-04 15:28:41 +02:00
cutecat_init.png	initial commit	2023-08-04 15:28:41 +02:00
cyberpunk_guide.png	initial commit	2023-08-04 15:28:41 +02:00
expected_controlnet_canny.png	initial commit	2023-08-04 15:28:41 +02:00
expected_controlnet_depth.png	initial commit	2023-08-04 15:28:41 +02:00
expected_controlnet_lineart.png	initial commit	2023-08-04 15:28:41 +02:00
expected_controlnet_normals.png	initial commit	2023-08-04 15:28:41 +02:00
expected_controlnet_sam.png	initial commit	2023-08-04 15:28:41 +02:00
expected_controlnet_stack.png	make high-level adapters Adapters	2023-08-31 10:57:18 +02:00
expected_cutecat_sdxl_ddim_random_init.png	sdxl test: refreshed reference image	2023-09-12 10:59:26 +02:00
expected_cutecat_sdxl_ddim_random_init_sag.png	add support for self-attention guidance	2023-10-09 17:33:15 +02:00
expected_freeu.png	add tests for FreeU	2023-11-18 16:15:44 +01:00
expected_image_ip_adapter_plus_statue.png	add IP-Adapter plus (aka fine-grained features)	2023-09-29 15:23:43 +02:00
expected_image_ip_adapter_woman.png	add IP-Adapter support for SD 1.5	2023-09-06 15:12:48 +02:00
expected_image_sdxl_ip_adapter_plus_woman.png	tests: update ref image for SDXL IP-Adapter plus	2023-10-10 14:19:47 +02:00
expected_image_sdxl_ip_adapter_woman.png	add support for SDXL IP-Adapter	2023-09-12 18:00:39 +02:00
expected_inpainting_refonly.png	initial commit	2023-08-04 15:28:41 +02:00
expected_ip_adapter_controlnet.png	refactor CrossAttentionAdapter to work with context.	2024-01-08 15:20:23 +01:00
expected_karras_random_init.png	add e2e test for sd15 with karras noise schedule	2023-12-04 15:27:06 +01:00
expected_lora_pokemon.png	initial commit	2023-08-04 15:28:41 +02:00
expected_multi_diffusion.png	add unit test for multi_diffusion	2023-09-19 15:30:50 +02:00
expected_refonly.png	initial commit	2023-08-04 15:28:41 +02:00
expected_restart.png	implement Restart method for latent diffusion	2023-10-12 15:48:43 +02:00
expected_std_init_image.png	initial commit	2023-08-04 15:28:41 +02:00
expected_std_inpainting.png	initial commit	2023-08-04 15:28:41 +02:00
expected_std_random_init.png	initial commit	2023-08-04 15:28:41 +02:00
expected_std_random_init_sag.png	add support for self-attention guidance	2023-10-09 17:33:15 +02:00
expected_t2i_adapter_depth.png	add e2e test for T2I-Adapter depth	2023-09-25 13:54:26 +02:00
expected_t2i_adapter_xl_canny.png	add e2e test for T2I-Adapter XL canny	2023-09-25 13:54:26 +02:00
expected_textual_inversion_random_init.png	Add support for learned concepts e.g. via textual inversion	2023-08-28 10:37:39 +02:00
fairy_guide_canny.png	add e2e test for T2I-Adapter XL canny	2023-09-25 13:54:26 +02:00
inpainting-mask.png	initial commit	2023-08-04 15:28:41 +02:00
inpainting-scene.png	initial commit	2023-08-04 15:28:41 +02:00
inpainting-target.png	initial commit	2023-08-04 15:28:41 +02:00
kitchen_dog.png	initial commit	2023-08-04 15:28:41 +02:00
kitchen_dog_mask.png	initial commit	2023-08-04 15:28:41 +02:00
README.md	add README bullet point	2023-12-04 15:27:06 +01:00
statue.png	add IP-Adapter plus (aka fine-grained features)	2023-09-29 15:23:43 +02:00
woman.png	add IP-Adapter support for SD 1.5	2023-09-06 15:12:48 +02:00

README.md

Note about this data

Expected outputs

expected_*.png files are the output of the same diffusion run performed with a different codebase, usually diffusers, using the same settings as refiners (DPMSolverMultistepScheduler, VAE patched to remove randomness, same seed, etc.).

For instance, here is how we generate expected_std_random_init.png:

import torch

from diffusers import DPMSolverMultistepScheduler
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "a cute cat, detailed high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"

torch.manual_seed(2)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
)

output.images[0].save("expected_std_random_init.png")

Special cases:

For self-attention guidance, StableDiffusionSAGPipeline has been used instead of the default pipeline.
expected_refonly.png has been generated with Stable Diffusion web UI.
The following references have been generated with refiners itself (and inspected so that they look reasonable):
- expected_karras_random_init.png
- expected_inpainting_refonly.png
- expected_image_ip_adapter_woman.png
- expected_image_sdxl_ip_adapter_woman.png
- expected_ip_adapter_controlnet.png
- expected_t2i_adapter_xl_canny.png
- expected_image_sdxl_ip_adapter_plus_woman.png
- expected_cutecat_sdxl_ddim_random_init_sag.png
- expected_restart.png
- expected_freeu.png
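
Reference images like these are only useful if a new output can be checked against them. The README does not say how that comparison is done, so the following is a dependency-free sketch under an assumption: it uses PSNR on flattened 8-bit pixel values, and the `is_similar` helper and its `min_psnr` threshold are hypothetical, not taken from refiners.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], candidate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (in dB) between two equally sized pixel sequences."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixel values")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value**2 / mse)


def is_similar(reference: Sequence[float], candidate: Sequence[float], min_psnr: float = 35.0) -> bool:
    """Hypothetical acceptance check: min_psnr is an illustrative threshold."""
    return psnr(reference, candidate) >= min_psnr
```

In practice the pixel values would come from the decoded PNG files (for example via Pillow or NumPy), and the threshold would be tuned per test.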

Other images

cutecat_init.png is generated with the same Diffusers script and prompt but with seed 1234.
kitchen_dog.png is generated with the same Diffusers script and negative prompt, seed 12, positive prompt "a small brown dog, detailed high-quality professional image, sitting on a chair, in a kitchen".
kitchen_dog_mask.png was made manually.
ControlNet guides have been manually generated (x) using open source software and models, namely:
- Canny: opencv-python
- Depth: https://github.com/isl-org/ZoeDepth
- Lineart: https://github.com/lllyasviel/ControlNet-v1-1-nightly/tree/main/annotator/lineart
- Normals: https://github.com/baegwangbin/surface_normal_uncertainty/tree/fe2b9f1
- SAM: https://huggingface.co/spaces/mfidabel/controlnet-segment-anything

(x): except fairy_guide_canny.png, which comes from TencentARC/t2i-adapter-canny-sdxl-1.0

cyberpunk_guide.png comes from Lexica.
inpainting-mask.png, inpainting-scene.png and inpainting-target.png have been generated as follows:
- inpainting-mask.png: negated version of a mask computed with SAM automatic mask generation using the vit_h checkpoint
- inpainting-scene.png: cropped-to-square-and-resized version of https://unsplash.com/photos/RCz6eSVPGYU by @jannerboy62
- inpainting-target.png: computed with convert <(convert -size 512x512 xc:white png:-) kitchen_dog.png <(convert inpainting-mask.png -negate png:-) -compose Over -composite inpainting-target.png
woman.png comes from tencent-ailab/IP-Adapter.
statue.png comes from tencent-ailab/IP-Adapter.
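
The ImageMagick command given for inpainting-target.png can be read as "paste the source image onto a white canvas wherever the negated mask is white". A minimal pure-Python sketch of that reading follows. It assumes the third image in the `convert` call acts as a blend mask in which white selects the overlay, and it works on toy grayscale lists rather than real PNG files. The names `negate` and `composite_over_white` are illustrative, not part of any library.

```python
from typing import List

GrayImage = List[List[int]]  # rows of 8-bit values (0 = black, 255 = white)


def negate(image: GrayImage) -> GrayImage:
    """Invert an 8-bit image, similar in spirit to ImageMagick's -negate."""
    return [[255 - value for value in row] for row in image]


def composite_over_white(source: GrayImage, mask: GrayImage) -> GrayImage:
    """Blend `source` onto a white canvas, weighted by `mask`.

    Assumption: white mask pixels keep the source, black ones keep the canvas.
    """
    if len(source) != len(mask) or any(len(s) != len(m) for s, m in zip(source, mask)):
        raise ValueError("source and mask must have the same dimensions")
    return [
        [round((s * m + 255 * (255 - m)) / 255) for s, m in zip(src_row, mask_row)]
        for src_row, mask_row in zip(source, mask)
    ]


# Toy 1x2 example: the left pixel of the inpainting mask is black, the right one white.
inpainting_mask = [[0, 255]]
source_image = [[10, 20]]
target = composite_over_white(source_image, negate(inpainting_mask))
```

Here the left pixel keeps its source value and the right pixel becomes white, because the negated mask is white on the left and black on the right.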

VAE without randomness

--- a/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
+++ b/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
@@ -524,13 +524,8 @@ class StableDiffusionImg2ImgPipeline(DiffusionPipeline):
                 f" size of {batch_size}. Make sure the batch size matches the length of the generators."
             )

-        if isinstance(generator, list):
-            init_latents = [
-                self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
-            ]
-            init_latents = torch.cat(init_latents, dim=0)
-        else:
-            init_latents = self.vae.encode(image).latent_dist.sample(generator)
+        init_latents = [self.vae.encode(image[i : i + 1]).latent_dist.mean for i in range(batch_size)]
+        init_latents = torch.cat(init_latents, dim=0)

         init_latents = self.vae.config.scaling_factor * init_latents
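
The patch above replaces `latent_dist.sample(generator)` with `latent_dist.mean`, so the encoded init latents no longer depend on a random draw. The scalar sketch below illustrates the difference, assuming the latent distribution is a diagonal Gaussian parameterized by a mean and a log-variance. The helpers `sample_latent` and `mean_latent` are hypothetical stand-ins and do not call diffusers.

```python
import math
import random


def sample_latent(mean: float, logvar: float, rng: random.Random) -> float:
    """Draw one value from N(mean, exp(logvar)); the result depends on the RNG state."""
    std = math.exp(0.5 * logvar)
    return mean + std * rng.gauss(0.0, 1.0)


def mean_latent(mean: float, logvar: float) -> float:
    """Deterministic alternative: ignore the variance and return the mean."""
    return mean


# Two different seeds give two different sampled latents...
sampled_a = sample_latent(0.5, -2.0, random.Random(1))
sampled_b = sample_latent(0.5, -2.0, random.Random(2))
# ...while the mean is the same on every call.
deterministic = mean_latent(0.5, -2.0)
```

Using the mean removes one source of divergence between two codebases that would otherwise need to consume random numbers in exactly the same order.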

Textual Inversion

expected_textual_inversion_random_init.png has been generated with StableDiffusionPipeline, e.g.:

import torch

from diffusers import DPMSolverMultistepScheduler
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.load_textual_inversion("sd-concepts-library/gta5-artwork")

prompt = "a cute cat on a <gta5-artwork>"
negative_prompt = ""

torch.manual_seed(2)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
)

output.images[0].save("expected_textual_inversion_random_init.png")