Latent Diffusion
FixedGroupNorm
FixedGroupNorm(target: GroupNorm)
Bases: Chain, Adapter[GroupNorm]
Adapter for GroupNorm layers that fixes the mean and variance they normalize with.
This is useful when running tiled inference with an autoencoder, to ensure that the statistics of the GroupNorm layers are consistent across tiles.
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
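To see why fixed statistics matter, here is a pure-Python sketch (no torch, and not the refiners implementation) that normalizes one GroupNorm group split into two tiles. Normalizing each tile with its own mean and variance shifts the tiles differently, while reusing statistics captured once on the whole image reproduces the untiled result. The helper names `group_stats` and `normalize` are hypothetical.

```python
import math


def group_stats(values: list[float]) -> tuple[float, float]:
    """Mean and biased variance of one normalization group."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def normalize(values: list[float], mean: float, var: float, eps: float = 1e-5) -> list[float]:
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# One GroupNorm group of an "image", split into two tiles.
image = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
left, right = image[:4], image[4:]

# Per-tile statistics: each tile is normalized on its own, so the tiles disagree.
per_tile = normalize(left, *group_stats(left)) + normalize(right, *group_stats(right))

# Fixed statistics: captured once on the whole image, then reused for every tile.
fixed_mean, fixed_var = group_stats(image)
fixed = normalize(left, fixed_mean, fixed_var) + normalize(right, fixed_mean, fixed_var)

# Reference: normalizing the full image in one pass.
full = normalize(image, fixed_mean, fixed_var)
```

Here both tiles end up with identical normalized values under per-tile statistics, even though the right tile is much brighter, which is exactly the kind of seam the adapter avoids.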
LatentDiffusionAutoencoder
Bases: Chain
Latent diffusion autoencoder model.
Attributes:
Name | Type | Description |
---|---|---|
encoder_scale | float | The encoder scale to use. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to use. | None |
dtype | dtype \| None | The PyTorch data type to use. | None |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
decode
encode
images_to_latents
Convert a list of images to latents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[Image] | The list of images to convert. | required |
Returns:
Type | Description |
---|---|
Tensor | A tensor containing the latents associated with the images. |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
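This page does not spell out the pixel preprocessing, so the following is a sketch under common Stable Diffusion conventions (assumptions, not read from the refiners source): 8-bit pixels are mapped to [-1, 1] before encoding, and the encoder output is multiplied by `encoder_scale` (0.18215 is the factor usually quoted for SD 1.x autoencoders). Decoding undoes both steps and clamps back to [0, 255]. All helper names are hypothetical.

```python
SD1_ENCODER_SCALE = 0.18215  # assumption: the factor usually quoted for SD 1.x autoencoders


def pixel_to_unit_range(pixel: int) -> float:
    """Map an 8-bit value in [0, 255] to [-1, 1], the range the encoder is assumed to expect."""
    return pixel / 255 * 2 - 1


def unit_range_to_pixel(value: float) -> int:
    """Inverse mapping, clamping decoder outputs that overshoot [-1, 1]."""
    value = min(max(value, -1.0), 1.0)
    return round((value + 1) / 2 * 255)


def scale_latent(z: float, encoder_scale: float = SD1_ENCODER_SCALE) -> float:
    """Applied to the encoder output."""
    return z * encoder_scale


def unscale_latent(z: float, encoder_scale: float = SD1_ENCODER_SCALE) -> float:
    """Applied before feeding latents to the decoder."""
    return z / encoder_scale
```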
latents_to_image
latents_to_image(x: Tensor) -> Image
Decode latents to an image.
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
latents_to_images
tiled_image_to_latents
tiled_image_to_latents(image: Image) -> Tensor
Convert an image to latents with gradient blending to smooth tile edges.
This method can only be used while the tiled_inference context manager is active.
```python
with lda.tiled_inference(sample_image, tile_size=(768, 1024)):
    latents = lda.tiled_image_to_latents(sample_image)
```
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
tiled_inference
tiled_inference(
image: Image,
tile_size: tuple[int, int] = (512, 512),
blending: int = 64,
) -> Generator[None, None, None]
Context manager for tiled inference operations to save VRAM for large images.
This context manager sets up consistent GroupNorm statistics for performing tiled operations on the autoencoder, setting the statistics on entry and resetting them on exit. It makes the result consistent across tiles by capturing the statistics of the GroupNorm layers on a downsampled version of the image.
Be careful not to use the regular image_to_latents and latents_to_image methods while this context manager is active: they will fail silently and run the operation without tiling.
```python
with lda.tiled_inference(sample_image, tile_size=(768, 1024), blending=32):
    latents = lda.tiled_image_to_latents(sample_image)
    decoded_image = lda.tiled_latents_to_image(latents)
```
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
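The `blending` parameter controls how overlapping tiles are merged. As an illustration only, here is a 1D sketch of gradient blending: each tile's weight ramps linearly over `blending` positions on the sides that overlap a neighbour, and overlapping values are averaged with those weights. The linear ramp and the helper names are assumptions, not the refiners weighting.

```python
def blend_weights(length: int, blending: int, ramp_start: bool, ramp_end: bool) -> list[float]:
    """Weights for one tile: linear ramps over `blending` positions on the sides
    that overlap a neighbouring tile, 1.0 everywhere else."""
    weights = [1.0] * length
    for i in range(min(blending, length)):
        ramp = (i + 1) / (blending + 1)  # strictly between 0 and 1
        if ramp_start:
            weights[i] = min(weights[i], ramp)
        if ramp_end:
            weights[length - 1 - i] = min(weights[length - 1 - i], ramp)
    return weights


def blend_tiles_1d(tiles: list[list[float]], starts: list[int], total: int, blending: int) -> list[float]:
    """Weighted average of overlapping 1D tiles placed at `starts` in a row of `total` values.
    Assumes the tiles cover every position of the row."""
    acc = [0.0] * total
    norm = [0.0] * total
    last = len(tiles) - 1
    for k, (tile, start) in enumerate(zip(tiles, starts)):
        weights = blend_weights(len(tile), blending, ramp_start=k > 0, ramp_end=k < last)
        for i, (value, weight) in enumerate(zip(tile, weights)):
            acc[start + i] += value * weight
            norm[start + i] += weight
    return [a / n for a, n in zip(acc, norm)]


# Two 6-wide tiles overlapping by 2 positions in a 10-wide row.
row = blend_tiles_1d([[0.0] * 6, [1.0] * 6], starts=[0, 4], total=10, blending=2)
```

Instead of a hard jump from 0.0 to 1.0, the overlap yields intermediate values (1/3 and 2/3). A 2D version would typically multiply a horizontal and a vertical ramp.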
tiled_latents_to_image
tiled_latents_to_image(x: Tensor) -> Image
Convert latents to an image with gradient blending to smooth tile edges.
This method can only be used while the tiled_inference context manager is active.
```python
with lda.tiled_inference(sample_image, tile_size=(768, 1024)):
    image = lda.tiled_latents_to_image(latents)
```
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
LatentDiffusionModel
LatentDiffusionModel(
unet: Chain,
lda: LatentDiffusionAutoencoder,
clip_text_encoder: Chain,
solver: Solver,
classifier_free_guidance: bool = True,
device: device | str = "cpu",
dtype: dtype = float32,
)
Source code in src/refiners/foundationals/latent_diffusion/model.py
init_latents
init_latents(
size: tuple[int, int],
init_image: Image | None = None,
noise: Tensor | None = None,
) -> Tensor
Initialize the latents for the diffusion process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | tuple[int, int] | The size of the latent (in pixel space). | required |
init_image | Image \| None | The image to use as initialization for the latents. | None |
noise | Tensor \| None | The noise to add to the latents. | None |
Source code in src/refiners/foundationals/latent_diffusion/model.py
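`size` is given in pixel space, while the latents live in the autoencoder's downsampled space. Assuming the usual SD 1.5 / SDXL autoencoder layout (4 latent channels, 8x spatial downscaling, `size` ordered as (height, width)), which matches the (1, 4, 64, 64) tensor used for a 512x512 image in the IC-Light example on this page, the latent shape can be computed as below. `latent_shape` is a hypothetical helper, not part of the refiners API.

```python
def latent_shape(
    size: tuple[int, int], batch: int = 1, channels: int = 4, downscale: int = 8
) -> tuple[int, int, int, int]:
    """Latent tensor shape for an image of `size` = (height, width) pixels.

    Assumes a 4-channel autoencoder with an 8x spatial downscaling factor.
    """
    height, width = size
    if height % downscale or width % downscale:
        raise ValueError(f"size {size} must be a multiple of {downscale}")
    return (batch, channels, height // downscale, width // downscale)
```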
sample_noise (staticmethod)
sample_noise(
size: tuple[int, ...],
device: device | None = None,
dtype: dtype | None = None,
offset_noise: float | None = None,
) -> Tensor
Sample noise from a normal distribution with an optional offset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | tuple[int, ...] | The size of the noise tensor. | required |
device | device \| None | The device to put the noise tensor on. | None |
dtype | dtype \| None | The data type of the noise tensor. | None |
offset_noise | float \| None | The offset of the noise tensor. Useful at training time, see https://www.crosslabs.org/blog/diffusion-with-offset-noise. | None |
Source code in src/refiners/foundationals/latent_diffusion/model.py
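The linked offset-noise post adds, on top of standard normal noise, one extra normal sample per channel scaled by a small factor, so that a whole channel is shifted at once and the model can learn to change overall brightness. The pure-Python sketch below mirrors that idea for a single sample of shape (channels, height, width); whether refiners draws the offset exactly this way is an assumption, and this `sample_noise` is a stand-in, not the static method documented above.

```python
from __future__ import annotations

import random


def sample_noise(
    channels: int,
    height: int,
    width: int,
    offset_noise: float | None = None,
    rng: random.Random | None = None,
) -> list[list[list[float]]]:
    """Standard normal noise, optionally shifted by one normal sample per channel
    scaled by `offset_noise`."""
    rng = rng or random.Random()
    noise = [[[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)] for _ in range(channels)]
    if offset_noise is not None:
        for c in range(channels):
            shift = offset_noise * rng.gauss(0.0, 1.0)  # same shift for the whole channel
            noise[c] = [[value + shift for value in row] for row in noise[c]]
    return noise
```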
set_inference_steps
Set the steps of the diffusion process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_steps | int | The number of inference steps. | required |
first_step | int | The first inference step, used for image-to-image diffusion. You may be used to setting a float in … | 0 |
Source code in src/refiners/foundationals/latent_diffusion/model.py
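For image-to-image, starting at a later step means the initial image is only partially noised and fewer steps run. The sketch below shows which step indices remain for a given `first_step`, plus a hypothetical conversion from the `strength` float that other libraries expose. The exact conversion refiners documents is cut off on this page, so the rounding in `first_step_from_strength` is an assumption.

```python
def steps_to_run(num_steps: int, first_step: int = 0) -> list[int]:
    """Indices of the solver steps actually executed when starting at `first_step`."""
    if not 0 <= first_step < num_steps:
        raise ValueError("first_step must be in [0, num_steps)")
    return list(range(first_step, num_steps))


def first_step_from_strength(strength: float, num_steps: int) -> int:
    """Hypothetical conversion from an img2img `strength` in [0, 1]:
    1.0 runs every step, smaller values skip the early (noisiest) steps."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return round((1.0 - strength) * (num_steps - 1))
```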
ControlLora
Bases: Passthrough
ControlLora is a Half-UNet clone of the target UNet, patched with various LoRA layers, ZeroConvolution layers, and a ConditionEncoder.
Like ControlNet, it injects residual tensors into the target UNet. See https://github.com/HighCWu/control-lora-v2 for more details.
Gets context:
Type | Description |
---|---|
Float[Tensor, 'batch condition_channels width height'] | The input image. |
Sets context:
Type | Description |
---|---|
list[Tensor] | The residuals to be added to the target UNet's residuals (context="unet", key="residuals"). |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the ControlLora. | required |
unet | SDXLUNet | The target UNet. | required |
scale | float | The scale to multiply the residuals by. | 1.0 |
condition_channels | int | The number of channels of the input condition tensor. | 3 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
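ControlLora writes a list of residuals into the `unet` context, and `scale` multiplies them. The sketch below illustrates that injection with flattened lists standing in for tensors. It assumes the residuals are simply scaled and added element-wise to the UNet's own residuals, as in ControlNet; `add_control_residuals` is a hypothetical name.

```python
def add_control_residuals(
    unet_residuals: list[list[float]],
    control_residuals: list[list[float]],
    scale: float = 1.0,
) -> list[list[float]]:
    """Element-wise `unet + scale * control` for each pair of residuals
    (each residual is flattened to a list of floats for the sketch)."""
    if len(unet_residuals) != len(control_residuals):
        raise ValueError("expected one control residual per UNet residual")
    return [
        [u + scale * c for u, c in zip(u_res, c_res)]
        for u_res, c_res in zip(unet_residuals, control_residuals)
    ]
```

With `scale=0.0` the UNet residuals pass through unchanged, which is a convenient way to switch the control off without detaching the adapter.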
ControlLoraAdapter
ControlLoraAdapter(
name: str,
target: SDXLUNet,
scale: float = 1.0,
condition_channels: int = 3,
weights: dict[str, Tensor] | None = None,
)
Bases: Chain, Adapter[SDXLUNet]
Adapter for ControlLora.
This adapter simply prepends a ControlLora model inside the target SDXLUNet.
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_condition_encoder (staticmethod)
load_condition_encoder(
state_dict: dict[str, Tensor], control_lora: ControlLora
)
Load the ConditionEncoder's layers from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | dict[str, Tensor] | The state_dict containing the ConditionEncoder layers to load. | required |
control_lora | ControlLora | The ControlLora to load the ConditionEncoder layers into. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_lora_layers (staticmethod)
load_lora_layers(
name: str,
state_dict: dict[str, Tensor],
control_lora: ControlLora,
) -> None
Load the LoRA layers from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the ControlLora. | required |
state_dict | dict[str, Tensor] | The state_dict containing the LoRA layers to load. | required |
control_lora | ControlLora | The ControlLora to load the LoRA layers into. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
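For readers new to LoRA, the layers loaded here follow the standard low-rank formulation: a frozen weight W plus a scaled product of a small "down" and "up" matrix, y = W x + scale * up(down(x)). The scale is often alpha / rank, as in the 8.0 / 64.0 default of add_lcm_lora on this page. The pure-Python sketch below shows only that arithmetic; it is not the refiners LoRA class and the names are hypothetical.

```python
def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_linear(
    x: list[float],
    weight: list[list[float]],  # frozen base weight, out_features x in_features
    down: list[list[float]],    # LoRA "down" matrix, rank x in_features
    up: list[list[float]],      # LoRA "up" matrix, out_features x rank
    scale: float,
) -> list[float]:
    """y = W x + scale * up(down(x)): the base layer plus a low-rank update."""
    base = matvec(weight, x)
    low_rank = matvec(up, matvec(down, x))
    return [b + scale * d for b, d in zip(base, low_rank)]
```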
load_weights
Load the weights from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | dict[str, Tensor] | The state_dict containing the weights to load. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
load_zero_convolution_layers (staticmethod)
load_zero_convolution_layers(
state_dict: dict[str, Tensor], control_lora: ControlLora
)
Load the ZeroConvolution layers from the state_dict into the ControlLora.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | dict[str, Tensor] | The state_dict containing the ZeroConvolution layers to load. | required |
control_lora | ControlLora | The ControlLora to load the ZeroConvolution layers into. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/control_lora.py
SDXLAutoencoder
Bases: LatentDiffusionAutoencoder
Stable Diffusion XL autoencoder model.
Attributes:
Name | Type | Description |
---|---|---|
encoder_scale | float | The encoder scale to use. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to use. | None |
dtype | dtype \| None | The PyTorch data type to use. | None |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
SDXLIPAdapter
SDXLIPAdapter(
target: SDXLUNet,
clip_image_encoder: CLIPImageEncoderH | None = None,
image_proj: (
ImageProjection | PerceiverResampler | None
) = None,
scale: float = 1.0,
fine_grained: bool = False,
weights: dict[str, Tensor] | None = None,
)
Image Prompt adapter for the Stable Diffusion XL U-Net model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | SDXLUNet | The SDXLUNet model to adapt. | required |
clip_image_encoder | CLIPImageEncoderH \| None | The CLIP image encoder to use. | None |
image_proj | ImageProjection \| PerceiverResampler \| None | The image projection to use. | None |
scale | float | The scale to use for the image prompt. | 1.0 |
fine_grained | bool | Whether to use the fine-grained image prompt. | False |
weights | dict[str, Tensor] \| None | The weights of the IPAdapter. | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/image_prompt.py
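The IP-Adapter paper describes a decoupled cross-attention: the same query attends separately to the text keys/values and to the image keys/values, and the image branch is weighted by a scale before the two outputs are summed. The pure-Python sketch below shows that combination with toy vectors. Treating this `scale` as the adapter's `scale` parameter is an assumption based on its description ("the scale to use for the image prompt"), and all function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention for a single query vector."""
    dim = len(query)
    weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys])
    return [sum(w * value[i] for w, value in zip(weights, values)) for i in range(len(values[0]))]


def decoupled_cross_attention(query, text_kv, image_kv, scale=1.0):
    """Text branch plus `scale` times the image branch, both attended by the same query."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + scale * i for t, i in zip(text_out, image_out)]
```

Setting the scale to 0.0 recovers plain text conditioning, which is why lowering it weakens the influence of the image prompt.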
SDXLLcmAdapter
SDXLLcmAdapter(
target: SDXLUNet,
condition_scale_embedding_dim: int = 256,
condition_scale: float = 7.5,
)
Bases: Chain, Adapter[SDXLUNet]
Note that LCM must be used without CFG. You can disable CFG on SD by setting the classifier_free_guidance attribute to False.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | SDXLUNet | An SDXL UNet. | required |
condition_scale_embedding_dim | int | LCM uses a condition scale embedding; this is its dimension. | 256 |
condition_scale | float | Because of the embedding, the condition scale must be passed to this adapter instead of SD. The condition scale passed to SD will be ignored. | 7.5 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/lcm.py
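LCM conditions the UNet on the guidance scale through an embedding, which is why `condition_scale` is given to the adapter rather than to SD. A common way to embed a scalar is the sinusoidal scheme used for diffusion timesteps; the sketch below assumes that scheme with `condition_scale_embedding_dim` = 256 outputs. The frequency layout and any rescaling applied to the scale before embedding are assumptions, not taken from this page, and `scalar_embedding` is a hypothetical helper.

```python
import math


def scalar_embedding(value: float, dim: int = 256, max_period: float = 10000.0) -> list[float]:
    """Sinusoidal embedding of a scalar: sin/cos at geometrically spaced frequencies."""
    if dim % 2:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    angles = [value * f for f in freqs]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
```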
SDXLUNet
Bases: Chain
Stable Diffusion XL U-Net.
See [arXiv:2307.01952] SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | Number of input channels. | required |
device | device \| str \| None | Device to use for computation. | None |
dtype | dtype \| None | Data type to use for computation. | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
set_clip_text_embedding
set_clip_text_embedding(
clip_text_embedding: Tensor,
) -> None
Set the CLIP text embedding context.
Note: This context is required by the SDXLCrossAttention blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip_text_embedding | Tensor | The CLIP text embedding tensor. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
set_pooled_text_embedding
set_pooled_text_embedding(
pooled_text_embedding: Tensor,
) -> None
Set the pooled text embedding context.
Note: This is required by TextTimeEmbedding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pooled_text_embedding | Tensor | The pooled text embedding tensor. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
set_time_ids
set_time_ids(time_ids: Tensor) -> None
Set the time IDs context.
Note: This is required by TextTimeEmbedding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
time_ids | Tensor | The time IDs tensor. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/unet.py
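In the SDXL paper, the UNet is micro-conditioned on the original image size, the top-left crop coordinates, and the target size, which together give six numbers. The sketch below assembles them in that order, assuming (height, width) and (top, left) ordering and values for an uncropped 1024x1024 generation; wrapping the list in a tensor with a batch dimension (and duplicating it for classifier-free guidance) is left out. `sdxl_time_ids` is a hypothetical helper.

```python
def sdxl_time_ids(
    original_size: tuple[int, int],
    crop_top_left: tuple[int, int],
    target_size: tuple[int, int],
) -> list[int]:
    """Flatten SDXL's micro-conditioning values into one list of 6 integers."""
    time_ids = [*original_size, *crop_top_left, *target_size]
    if len(time_ids) != 6:
        raise ValueError("expected three (int, int) pairs")
    return time_ids
```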
StableDiffusion_XL
StableDiffusion_XL(
unet: SDXLUNet | None = None,
lda: SDXLAutoencoder | None = None,
clip_text_encoder: DoubleTextEncoder | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: LatentDiffusionModel
Stable Diffusion XL model.
Attributes:
Name | Type | Description |
---|---|---|
unet | SDXLUNet | The U-Net model. |
clip_text_encoder | DoubleTextEncoder | The text encoder. |
lda | SDXLAutoencoder | The image autoencoder. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | SDXLUNet \| None | The SDXLUNet U-Net model to use. | None |
lda | SDXLAutoencoder \| None | The SDXLAutoencoder image autoencoder to use. | None |
clip_text_encoder | DoubleTextEncoder \| None | The DoubleTextEncoder text encoder to use. | None |
solver | Solver \| None | The solver to use. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
compute_clip_text_embedding
compute_clip_text_embedding(
text: str | list[str],
negative_text: str | list[str] = "",
) -> tuple[Tensor, Tensor]
Compute the CLIP text embedding associated with the given prompt and negative prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str \| list[str] | The prompt to compute the CLIP text embedding of. | required |
negative_text | str \| list[str] | The negative prompt to compute the CLIP text embedding of. If not provided, the negative prompt is assumed to be empty (i.e., …) | '' |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
compute_self_attention_guidance
compute_self_attention_guidance(
x: Tensor,
noise: Tensor,
step: int,
*,
clip_text_embedding: Tensor,
pooled_text_embedding: Tensor,
time_ids: Tensor,
**kwargs: Tensor
) -> Tensor
Compute the self-attention guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
noise | Tensor | The noise tensor. | required |
step | int | The step to compute the self-attention guidance at. | required |
clip_text_embedding | Tensor | The CLIP text embedding to compute the self-attention guidance with. | required |
pooled_text_embedding | Tensor | The pooled CLIP text embedding to compute the self-attention guidance with. | required |
time_ids | Tensor | The time IDs to compute the self-attention guidance with. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed self-attention guidance. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
has_self_attention_guidance
has_self_attention_guidance() -> bool
Whether the model has self-attention guidance or not.
set_self_attention_guidance
Set the self-attention guidance.
See [arXiv:2210.00939] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
enable | bool | Whether to enable self-attention guidance. | required |
scale | float | The scale to use. | 1.0 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
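Self-attention guidance compares the model's noise prediction on the current latents with its prediction on a degraded version (in the paper, one blurred in the regions highlighted by the self-attention maps) and extrapolates away from the latter: noise + scale * (noise - degraded_noise), which is algebraically the paper's form degraded + (1 + scale) * (noise - degraded). The sketch below only shows that final combination on flattened values; the degradation step is out of scope, and `self_attention_guidance` is a hypothetical helper, not this class's method.

```python
def self_attention_guidance(
    noise: list[float], degraded_noise: list[float], scale: float = 1.0
) -> list[float]:
    """Push the prediction away from the one made on the degraded input."""
    return [n + scale * (n - d) for n, d in zip(noise, degraded_noise)]
```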
set_unet_context
set_unet_context(
*,
timestep: Tensor,
clip_text_embedding: Tensor,
pooled_text_embedding: Tensor,
time_ids: Tensor,
**_: Tensor
) -> None
Set the various context parameters required by the U-Net model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timestep | Tensor | The timestep to set. | required |
clip_text_embedding | Tensor | The CLIP text embedding to set. | required |
pooled_text_embedding | Tensor | The pooled CLIP text embedding to set. | required |
time_ids | Tensor | The time IDs to set. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/model.py
add_lcm_lora
add_lcm_lora(
manager: SDLoraManager,
tensors: dict[str, Tensor],
name: str = "lcm",
scale: float = 8.0 / 64.0,
check_validity: bool = True,
) -> None
Add an LCM-LoRA, or a LoRA with a similar structure such as SDXL-Lightning, to SDXLUNet.
This is a complex LoRA, so SDLoraManager.add_loras() is not enough. Instead, the LoRAs are added to the UNet in several iterations, using the filtering mechanism of auto_attach_loras.
LCM-LoRA can be used with or without CFG in SD. If you use CFG, typical values range from 1.0 (same as no CFG) to 2.0.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
manager | SDLoraManager | An SDLoraManager for SDXL. | required |
tensors | dict[str, Tensor] | The … | required |
name | str | The name of the LoRA. | 'lcm' |
scale | float | The scale to use for the LoRA (should generally not be changed: these LoRAs must use alpha / rank). | 8.0 / 64.0 |
check_validity | bool | Whether to perform additional checks and raise an exception if they fail. | True |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_xl/lcm_lora.py
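The statement that a CFG scale of 1.0 is the same as no CFG follows from the usual classifier-free guidance combination of the unconditional (negative-prompt) and conditional noise predictions: uncond + scale * (cond - uncond). With scale 1.0 the result is the conditional prediction. A minimal sketch on flattened values (the helper name is hypothetical):

```python
def classifier_free_guidance(
    unconditional: list[float], conditional: list[float], scale: float
) -> list[float]:
    """uncond + scale * (cond - uncond), applied element-wise."""
    return [u + scale * (c - u) for u, c in zip(unconditional, conditional)]
```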
ICLight
ICLight(
patch_weights: dict[str, Tensor],
unet: SD1UNet,
lda: SD1Autoencoder | None = None,
clip_text_encoder: CLIPTextEncoderL | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: StableDiffusion_1
IC-Light is a Stable Diffusion model that can be used to relight a reference image.
At initialization, the UNet will be patched to accept four additional input channels. Only the text-conditioned relighting model is supported for now.
Example
```python
import torch
from huggingface_hub import hf_hub_download
from PIL import Image

from refiners.fluxion.utils import load_from_safetensors, manual_seed, no_grad
from refiners.foundationals.clip import CLIPTextEncoderL
from refiners.foundationals.latent_diffusion.stable_diffusion_1 import SD1Autoencoder, SD1UNet
from refiners.foundationals.latent_diffusion.stable_diffusion_1.ic_light import ICLight

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float32
no_grad().__enter__()  # disable gradient tracking for the rest of the script
manual_seed(42)

sd = ICLight(
    patch_weights=load_from_safetensors(
        path=hf_hub_download(
            repo_id="refiners/ic_light.sd1_5.fc",
            filename="model.safetensors",
        ),
        device=device,
    ),
    unet=SD1UNet(in_channels=4, device=device, dtype=dtype).load_from_safetensors(
        tensors_path=hf_hub_download(
            repo_id="refiners/realistic_vision.v5_1.sd1_5.unet",
            filename="model.safetensors",
        )
    ),
    clip_text_encoder=CLIPTextEncoderL(device=device, dtype=dtype).load_from_safetensors(
        tensors_path=hf_hub_download(
            repo_id="refiners/realistic_vision.v5_1.sd1_5.text_encoder",
            filename="model.safetensors",
        )
    ),
    lda=SD1Autoencoder(device=device, dtype=dtype).load_from_safetensors(
        tensors_path=hf_hub_download(
            repo_id="refiners/realistic_vision.v5_1.sd1_5.autoencoder",
            filename="model.safetensors",
        )
    ),
    device=device,
    dtype=dtype,
)

prompt = "soft lighting, high-quality professional image"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"
clip_text_embedding = sd.compute_clip_text_embedding(text=prompt, negative_text=negative_prompt)

image = Image.open("reference-image.png").resize((512, 512))
sd.set_ic_light_condition(image)

x = torch.randn(
    size=(1, 4, 64, 64),
    device=device,
    dtype=dtype,
)

for step in sd.steps:
    x = sd(
        x=x,
        step=step,
        clip_text_embedding=clip_text_embedding,
        condition_scale=1.5,
    )
predicted_image = sd.lda.latents_to_image(x)
predicted_image.save("ic-light-output.png")
```
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ic_light.py
compute_gray_composite (staticmethod)
Compute a grayscale composite of an image and a mask.
IC-Light will recreate the image
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image | Image | The image to composite. | required |
mask | Image | The mask to use for the composite. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ic_light.py
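According to the set_ic_light_condition description on this page, IC-Light expects the subject over a 127-valued gray background. Below is a pure-Python sketch of such a composite on a flat list of RGB pixels, assuming a white-means-keep mask and linear alpha blending. Both conventions are assumptions, the actual method may treat the mask differently, and `gray_composite` is a hypothetical helper.

```python
def gray_composite(
    pixels: list[tuple[int, int, int]], mask: list[int], gray: int = 127
) -> list[tuple[int, int, int]]:
    """Keep pixels where the mask is white (255), replace them with flat gray where it
    is black (0), and blend linearly in between."""
    out = []
    for (r, g, b), m in zip(pixels, mask):
        alpha = m / 255
        out.append(tuple(round(c * alpha + gray * (1 - alpha)) for c in (r, g, b)))
    return out
```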
set_ic_light_condition
Set the IC-Light condition.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image | Image | The reference image. | required |
mask | Image \| None | The mask to use for the reference image. | None |
If a mask is provided, it is used to compute a grayscale composite of the image and the mask; otherwise, the image is used as is. Note that IC-Light requires a 127-valued gray background to work.
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ic_light.py
SD1Autoencoder
Bases: LatentDiffusionAutoencoder
Stable Diffusion 1.5 autoencoder model.
Attributes:
Name | Type | Description |
---|---|---|
encoder_scale | float | The encoder scale to use. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | device \| str \| None | The PyTorch device to use. | None |
dtype | dtype \| None | The PyTorch data type to use. | None |
Source code in src/refiners/foundationals/latent_diffusion/auto_encoder.py
SD1ELLAAdapter
Bases: ELLAAdapter[SD1UNet]
ELLA adapter for Stable Diffusion 1.5.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target | SD1UNet | The target model to adapt. | required |
weights | dict[str, Tensor] \| None | The weights of the ELLA adapter (see …) | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/ella_adapter.py
SD1UNet
Bases: Chain
Stable Diffusion 1.5 U-Net.
See [arXiv:2112.10752] High-Resolution Image Synthesis with Latent Diffusion Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_channels | int | The number of input channels. | required |
device | device \| str \| None | The PyTorch device to use for computation. | None |
dtype | dtype \| None | The PyTorch dtype to use for computation. | None |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/unet.py
set_clip_text_embedding
set_clip_text_embedding(
clip_text_embedding: Tensor,
) -> None
Set the CLIP text embedding.
Note: This context is required by the CLIPLCrossAttention blocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clip_text_embedding | Tensor | The CLIP text embedding. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/unet.py
StableDiffusion_1
StableDiffusion_1(
unet: SD1UNet | None = None,
lda: SD1Autoencoder | None = None,
clip_text_encoder: CLIPTextEncoderL | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: LatentDiffusionModel
Stable Diffusion 1.5 model.
Attributes:
Name | Type | Description |
---|---|---|
unet | SD1UNet | The U-Net model. |
clip_text_encoder | CLIPTextEncoderL | The text encoder. |
lda | SD1Autoencoder | The image autoencoder. |
Example:
```python
import torch

from refiners.fluxion.utils import manual_seed, no_grad
from refiners.foundationals.latent_diffusion.stable_diffusion_1 import StableDiffusion_1

# Load SD
sd15 = StableDiffusion_1(device="cuda", dtype=torch.float16)
sd15.clip_text_encoder.load_from_safetensors("sd1_5.text_encoder.safetensors")
sd15.unet.load_from_safetensors("sd1_5.unet.safetensors")
sd15.lda.load_from_safetensors("sd1_5.autoencoder.safetensors")

# Hyperparameters
prompt = "a cute cat, best quality, high quality"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
seed = 42

sd15.set_inference_steps(50)

with no_grad():  # Disable gradient calculation for memory-efficient inference
    clip_text_embedding = sd15.compute_clip_text_embedding(text=prompt, negative_text=negative_prompt)
    manual_seed(seed)

    x = sd15.init_latents((512, 512)).to(sd15.device, sd15.dtype)

    # Diffusion process
    for step in sd15.steps:
        x = sd15(x, step=step, clip_text_embedding=clip_text_embedding)

    predicted_image = sd15.lda.latents_to_image(x)
    predicted_image.save("output.png")
```
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unet | SD1UNet \| None | The SD1UNet U-Net model to use. | None |
lda | SD1Autoencoder \| None | The SD1Autoencoder image autoencoder to use. | None |
clip_text_encoder | CLIPTextEncoderL \| None | The CLIPTextEncoderL text encoder to use. | None |
solver | Solver \| None | The solver to use. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
compute_clip_text_embedding
compute_clip_text_embedding(
text: str | list[str],
negative_text: str | list[str] = "",
) -> Tensor
Compute the CLIP text embedding associated with the given prompt and negative prompt.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str \| list[str] | The prompt to compute the CLIP text embedding of. | required |
negative_text | str \| list[str] | The negative prompt to compute the CLIP text embedding of. If not provided, the negative prompt is assumed to be empty (i.e., …) | '' |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
compute_self_attention_guidance
compute_self_attention_guidance(
x: Tensor,
noise: Tensor,
step: int,
*,
clip_text_embedding: Tensor,
**kwargs: Tensor
) -> Tensor
Compute the self-attention guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
noise | Tensor | The noise tensor. | required |
step | int | The step to compute the self-attention guidance at. | required |
clip_text_embedding | Tensor | The CLIP text embedding to compute the self-attention guidance with. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed self-attention guidance. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
has_self_attention_guidance
has_self_attention_guidance() -> bool
Whether the model has self-attention guidance or not.
set_self_attention_guidance
Set whether to enable self-attention guidance.
See [arXiv:2210.00939] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
enable | bool | Whether to enable self-attention guidance. | required |
scale | float | The scale to use. | 1.0 |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
set_unet_context
Set the various context parameters required by the U-Net model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timestep | Tensor | The timestep tensor to use. | required |
clip_text_embedding | Tensor | The CLIP text embedding tensor to use. | required |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
StableDiffusion_1_Inpainting
StableDiffusion_1_Inpainting(
unet: SD1UNet | None = None,
lda: SD1Autoencoder | None = None,
clip_text_encoder: CLIPTextEncoderL | None = None,
solver: Solver | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: StableDiffusion_1
Stable Diffusion 1.5 inpainting model.
Attributes:
Name | Type | Description |
---|---|---|
unet | SD1UNet | The U-Net model. |
clip_text_encoder | CLIPTextEncoderL | The text encoder. |
lda | SD1Autoencoder | The image autoencoder. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
compute_self_attention_guidance
compute_self_attention_guidance(
x: Tensor,
noise: Tensor,
step: int,
*,
clip_text_embedding: Tensor,
**kwargs: Tensor
) -> Tensor
Compute the self-attention guidance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | The input tensor. | required |
noise | Tensor | The noise tensor. | required |
step | int | The step to compute the self-attention guidance at. | required |
clip_text_embedding | Tensor | The CLIP text embedding to compute the self-attention guidance with. | required |
Returns:
Type | Description |
---|---|
Tensor | The computed self-attention guidance. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
set_inpainting_conditions
set_inpainting_conditions(
target_image: Image,
mask: Image,
latents_size: tuple[int, int] = (64, 64),
) -> tuple[Tensor, Tensor]
Set the inpainting conditions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_image | Image | The target image to inpaint. | required |
mask | Image | The mask to use for inpainting. | required |
latents_size | tuple[int, int] | The size of the latents to use. | (64, 64) |
Returns:
Type | Description |
---|---|
tuple[Tensor, Tensor] | The mask latents and the target image latents. |
Source code in src/refiners/foundationals/latent_diffusion/stable_diffusion_1/model.py
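With the default `latents_size` of (64, 64), the conditions line up with a 512x512 image under an 8x autoencoder downscaling (an assumption). SD 1.5 inpainting checkpoints are commonly described as taking 9 UNet input channels: the 4 noisy latent channels, a 1-channel mask at latent resolution, and the 4 latent channels of the masked target image. The sketch shows the mask downsampling and that channel count; nearest-neighbour sampling is an assumption (real pipelines may interpolate), and both helpers are hypothetical.

```python
def downsample_mask(mask: list[list[int]], factor: int = 8) -> list[list[int]]:
    """Nearest-neighbour downsampling of a binary mask (1 = region to inpaint)."""
    return [row[::factor] for row in mask[::factor]]


def inpainting_unet_in_channels(latent_channels: int = 4) -> int:
    """Noisy latents + 1 mask channel + latents of the masked target image."""
    return latent_channels + 1 + latent_channels
```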
DDIM
DDIM(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Denoising Diffusion Implicit Model (DDIM) solver.
See [arXiv:2010.02502] Denoising Diffusion Implicit Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/ddim.py
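A scalar sketch of the deterministic (eta = 0) DDIM update from the paper, where `alpha_bar` is the cumulative product of (1 - beta) at a timestep: the predicted noise gives an estimate of the clean sample, which is then re-noised to the previous, less noisy timestep. Whether the refiners solver uses exactly this parametrization is not stated on this page, and `ddim_step` is a hypothetical helper.

```python
import math


def ddim_step(x_t: float, eps: float, alpha_bar_t: float, alpha_bar_prev: float) -> float:
    """Deterministic DDIM update (eta = 0) for one scalar value."""
    # Estimate the clean sample from the noise prediction.
    x0_pred = (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise it to the previous timestep, reusing the same noise direction.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1 - alpha_bar_prev) * eps
```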
DDPM
DDPM(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
)
Bases: Solver
Denoising Diffusion Probabilistic Model (DDPM) solver.
Warning: Only used for training Latent Diffusion models. It cannot be called.
See [arXiv:2006.11239] Denoising Diffusion Probabilistic Models for more details.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
device | device \| str | The PyTorch device to use. | 'cpu' |
Source code in src/refiners/foundationals/latent_diffusion/solvers/ddpm.py
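At training time, DDPM noises a clean sample in closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below builds alpha_bar from a "scaled linear" beta schedule (betas linear in square-root space, 0.00085 to 0.012 over 1000 steps, the values widely used for Stable Diffusion) and applies that formula. That these match the defaults of BaseSolverParams is an assumption, and both helpers are hypothetical.

```python
import math


def scaled_linear_alpha_bars(
    num_train_timesteps: int = 1000, beta_start: float = 0.00085, beta_end: float = 0.012
) -> list[float]:
    """Cumulative products of (1 - beta) for betas linear in sqrt space."""
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    alpha_bars, prod = [], 1.0
    for i in range(num_train_timesteps):
        beta = (s0 + (s1 - s0) * i / (num_train_timesteps - 1)) ** 2
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(x0: float, eps: float, alpha_bar: float) -> float:
    """Forward process q(x_t | x_0) for one scalar value."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * eps
```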
DPMSolver
DPMSolver(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
last_step_first_order: bool = False,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Diffusion probabilistic models (DPMs) solver.
See [arXiv:2211.01095] DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models for more details.
Note: Regarding last_step_first_order: DPM-Solver++ is known to introduce artifacts when used with SDXL and few steps. This parameter is a way to mitigate that effect by using a first-order (Euler) update instead of a second-order update for the last step of the diffusion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_inference_steps | int | The number of inference steps to perform. | required |
first_inference_step | int | The first inference step to perform. | 0 |
params | BaseSolverParams \| None | The common parameters for solvers. | None |
last_step_first_order | bool | Use a first-order update for the last step. | False |
device | device \| str | The PyTorch device to use. | 'cpu' |
dtype | dtype | The PyTorch data type to use. | float32 |
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
dpm_solver_first_order_update
dpm_solver_first_order_update(
x: Tensor,
noise: Tensor,
step: int,
sde_noise: Tensor | None = None,
) -> Tensor
Applies a first-order backward Euler update to the input data x.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The input data. | required |
| noise | Tensor | The predicted noise. | required |
| step | int | The current step. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The denoised version of the input data. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
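The first-order update has a closed form. The sketch below transcribes the DPM-Solver++(1) formula from [arXiv:2211.01095] in plain Python on scalars: it converts the predicted noise into a data prediction, then applies the exponential-integrator step. The helper names and the explicit (alpha, sigma) arguments are assumptions for illustration, not the library's internals, and the sde_noise variant is not covered.

```python
import math


def data_prediction(x: float, noise: float, alpha: float, sigma: float) -> float:
    """Recover the predicted clean sample x0 from x = alpha * x0 + sigma * noise."""
    return (x - sigma * noise) / alpha


def first_order_update(
    x: float, noise: float, alpha_s: float, sigma_s: float, alpha_t: float, sigma_t: float
) -> float:
    """DPM-Solver++(1) step from the current time s to the next (less noisy) time t."""
    lambda_s = math.log(alpha_s / sigma_s)  # lambda = log(alpha / sigma) at s
    lambda_t = math.log(alpha_t / sigma_t)
    h = lambda_t - lambda_s
    x0 = data_prediction(x, noise, alpha_s, sigma_s)
    # math.expm1(-h) computes exp(-h) - 1 accurately
    return (sigma_t / sigma_s) * x - alpha_t * math.expm1(-h) * x0
```

Algebraically this step coincides with a deterministic DDIM step, which makes a convenient sanity check.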
multistep_dpm_solver_second_order_update
multistep_dpm_solver_second_order_update(
x: Tensor, step: int, sde_noise: Tensor | None = None
) -> Tensor
Applies a second-order backward Euler update to the input data x.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The input data. | required |
| step | int | The current step. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The denoised version of the input data. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
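The multistep second-order variant (DPM-Solver++(2M) in [arXiv:2211.01095]) reuses the data prediction from the previous step to extrapolate linearly before applying the same exponential-integrator update. A scalar sketch with hypothetical argument names; when both data predictions agree it reduces to the first-order update, which is also what the last_step_first_order option falls back to on the final step.

```python
import math


def second_order_update(
    x: float,
    x0_curr: float,
    x0_prev: float,
    lambdas: tuple[float, float, float],
    alpha_t: float,
    sigma_s: float,
    sigma_t: float,
) -> float:
    """DPM-Solver++(2M) step.

    lambdas = (lambda_prev, lambda_s, lambda_t), i.e. log(alpha / sigma) at the
    previous, current and next steps. x0_curr and x0_prev are the data
    predictions made at the current and previous steps.
    """
    lambda_prev, lambda_s, lambda_t = lambdas
    h = lambda_t - lambda_s
    r = (lambda_s - lambda_prev) / h  # previous step size relative to the current one
    # Linear extrapolation of the data prediction from the last two steps
    d = (1 + 1 / (2 * r)) * x0_curr - (1 / (2 * r)) * x0_prev
    return (sigma_t / sigma_s) * x - alpha_t * math.expm1(-h) * d
```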
rebuild
Rebuilds the solver with new parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_inference_steps | int \| None | The number of inference steps. | required |
| first_inference_step | int \| None | The first inference step. | None |

Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
remove_noise
Remove noise from the input tensor using the current step of the diffusion process.
See Solver.remove_noise for more details.
Source code in src/refiners/foundationals/latent_diffusion/solvers/dpm.py
Euler
Euler(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Euler solver.
See [arXiv:2206.00364] Elucidating the Design Space of Diffusion-Based Generative Models for more details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_inference_steps | int | The number of inference steps to perform. | required |
| first_inference_step | int | The first inference step to perform. | 0 |
| params | BaseSolverParams \| None | The common parameters for solvers. | None |
| device | device \| str | The PyTorch device to use. | 'cpu' |
| dtype | dtype | The PyTorch data type to use. | float32 |

Source code in src/refiners/foundationals/latent_diffusion/solvers/euler.py
scale_model_input
Scales the model input according to the current step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The model input. | required |
| step | int | The current step. This method is called with | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The scaled model input. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/euler.py
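Sigma-parametrized Euler samplers commonly divide the model input by sqrt(sigma^2 + 1) so that its variance stays near 1, then take an explicit Euler step along the predicted noise (when x = x0 + sigma * noise, dx/dsigma equals the noise). The sketch below shows that common convention, as found in k-diffusion style samplers; that this Euler solver uses exactly these formulas is an assumption.

```python
import math


def scale_model_input(x: float, sigma: float) -> float:
    """Normalize a sample whose added noise has standard deviation `sigma`."""
    return x / math.sqrt(sigma**2 + 1)


def euler_step(x: float, predicted_noise: float, sigma: float, sigma_next: float) -> float:
    """Explicit Euler step of dx/dsigma = predicted_noise, from sigma to sigma_next."""
    return x + (sigma_next - sigma) * predicted_noise
```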
FrankenSolver
FrankenSolver(
get_diffusers_scheduler: Callable[[], SchedulerLike],
num_inference_steps: int,
first_inference_step: int = 0,
device: device | str = "cpu",
dtype: dtype = float32,
**kwargs: Any
)
Bases: Solver
Lets you use Diffusers Schedulers as Refiners Solvers.
For instance
Source code in src/refiners/foundationals/latent_diffusion/solvers/franken.py
LCMSolver
LCMSolver(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
num_orig_steps: int = 50,
device: device | str = "cpu",
dtype: dtype = float32,
)
Bases: Solver
Latent Consistency Model solver.
This solver is designed for use either with a specific base model or a specific LoRA.
See [arXiv:2310.04378] Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference for details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_inference_steps | int | The number of inference steps to perform. | required |
| first_inference_step | int | The first inference step to perform. | 0 |
| params | BaseSolverParams \| None | The common parameters for solvers. | None |
| num_orig_steps | int | The number of inference steps of the emulated DPM solver. | 50 |
| device | device \| str | The PyTorch device to use. | 'cpu' |
| dtype | dtype | The PyTorch data type to use. | float32 |

Source code in src/refiners/foundationals/latent_diffusion/solvers/lcm.py
ModelPredictionType
NoiseSchedule
An enumeration of schedules used to sample the noise.
Attributes:

| Name | Type | Description |
|---|---|---|
| UNIFORM | | A uniform noise schedule. |
| QUADRATIC | | A quadratic noise schedule. Corresponds to "Stable Diffusion" in [arXiv:2305.08891] Common Diffusion Noise Schedules and Sample Steps are Flawed, table 1. |
| KARRAS | | |

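KARRAS is listed without a description. The schedule usually known by that name comes from [arXiv:2206.00364]: noise levels are interpolated linearly in sigma^(1/rho) space with rho = 7, which concentrates steps at low noise. A sketch of that formula; the sigma_min and sigma_max defaults are assumed typical Stable Diffusion values, and how this enum member maps onto the formula is an assumption.

```python
def karras_sigmas(
    n: int, sigma_min: float = 0.0292, sigma_max: float = 14.6146, rho: float = 7.0
) -> list[float]:
    """Noise levels from sigma_max down to sigma_min, following Karras et al."""
    inv_min = sigma_min ** (1 / rho)
    inv_max = sigma_max ** (1 / rho)
    # Linear interpolation in sigma^(1/rho) space, mapped back with ** rho
    return [(inv_max + i / (n - 1) * (inv_min - inv_max)) ** rho for i in range(n)]
```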
Solver
Solver(
num_inference_steps: int,
first_inference_step: int = 0,
params: BaseSolverParams | None = None,
device: device | str = "cpu",
dtype: dtype = float32,
)
The base class for creating a diffusion model solver.
Solvers create a sequence of noise and scaling factors used in the diffusion process, which gradually transforms the original data distribution into a Gaussian one.
This process is described by several parameters, such as the initial and final diffusion rates, and is encapsulated in a __call__ method that applies one step of the diffusion process.
Attributes:

| Name | Type | Description |
|---|---|---|
| params | ResolvedSolverParams | The common parameters for solvers. See |
| num_inference_steps | | The number of inference steps to perform. |
| first_inference_step | | The step to start the inference process from. |
| scale_factors | | The scale factors used to denoise the input. These are called "betas" in other implementations, and |
| cumulative_scale_factors | | The cumulative scale factors used to denoise the input. These are called "alpha_t" in other implementations. |
| noise_std | | The standard deviation of the noise used to denoise the input. This is called "sigma_t" in other implementations. |
| signal_to_noise_ratios | | The signal-to-noise ratios used to denoise the input. This is called "lambda_t" in other implementations. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_inference_steps | int | The number of inference steps to perform. | required |
| first_inference_step | int | The first inference step to perform. | 0 |
| params | BaseSolverParams \| None | The common parameters for solvers. | None |
| device | device \| str | The PyTorch device to use for the solver's tensors. | 'cpu' |
| dtype | dtype | The PyTorch data type to use for the solver's tensors. | float32 |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
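In the standard DDPM notation of [arXiv:2006.11239], the per-step betas determine every other quantity: alpha_bar is the running product of (1 - beta), the signal coefficient is sqrt(alpha_bar), the noise standard deviation is sqrt(1 - alpha_bar), and lambda = log(alpha / sigma) is the log-ratio used by DPM-Solver. The sketch below derives these from a list of betas; how each list maps onto the attributes above (for example, whether a square root is stored) is an assumption.

```python
import math


def derived_quantities(betas: list[float]) -> tuple[list[float], list[float], list[float]]:
    """Return (alpha_t, sigma_t, lambda_t) for every timestep."""
    alphas, sigmas, lambdas = [], [], []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # cumulative product of (1 - beta)
        alpha_t = math.sqrt(alpha_bar)  # signal coefficient
        sigma_t = math.sqrt(1.0 - alpha_bar)  # noise standard deviation
        alphas.append(alpha_t)
        sigmas.append(sigma_t)
        lambdas.append(math.log(alpha_t / sigma_t))  # decreases as noise grows
    return alphas, sigmas, lambdas
```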
add_noise
Add noise to the input tensor using the solver's parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The input tensor to add noise to. | required |
| noise | Tensor | The noise tensor to add to the input tensor. | required |
| step | int \| list[int] | The current step(s) of the diffusion process. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The input tensor with added noise. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
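Adding noise follows the closed-form forward process of DDPM, x_t = sqrt(alpha_bar_t) * x + sqrt(1 - alpha_bar_t) * noise. A minimal element-wise sketch on plain lists; passing alpha_bar_t directly instead of a step index is a simplification.

```python
import math


def add_noise(x: list[float], noise: list[float], alpha_bar_t: float) -> list[float]:
    """Sample x_t from q(x_t | x_0) for a fixed noise draw."""
    if len(x) != len(noise):
        raise ValueError("x and noise must have the same length")
    signal, noise_std = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [signal * xi + noise_std * ni for xi, ni in zip(x, noise)]
```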
generate_timesteps
staticmethod
generate_timesteps(
spacing: TimestepSpacing,
num_inference_steps: int,
num_train_timesteps: int = 1000,
offset: int = 0,
) -> Tensor
Generate a tensor of timesteps according to a given spacing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| spacing | TimestepSpacing | The spacing to use for the timesteps. | required |
| num_inference_steps | int | The number of inference steps to perform. | required |
| num_train_timesteps | int | The number of timesteps used to train the diffusion process. | 1000 |
| offset | int | The offset to use for the timesteps. | 0 |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
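The spacings follow table 2 of [arXiv:2305.08891]. A plain-Python sketch of the integer variants; the lowercase strings stand in for TimestepSpacing members, and adding offset to every timestep is an assumption.

```python
def generate_timesteps(
    spacing: str,
    num_inference_steps: int,
    num_train_timesteps: int = 1000,
    offset: int = 0,
) -> list[int]:
    """Return timesteps in descending order, i.e. the order they are denoised in."""
    n, t = num_inference_steps, num_train_timesteps
    if n < 2:
        raise ValueError("this sketch assumes at least 2 inference steps")
    if spacing == "linspace_rounded":
        # n points evenly spread over [0, t - 1], rounded to integers
        steps = [round(i * (t - 1) / (n - 1)) for i in range(n)]
    elif spacing == "leading":
        # integer stride starting at 0: the last training timestep is never used
        ratio = t // n
        steps = [i * ratio for i in range(n)]
    elif spacing == "trailing":
        # fractional stride counted back from t: always includes t - 1
        ratio = t / n
        steps = sorted(round(t - i * ratio) - 1 for i in range(n))
    else:
        raise ValueError(f"unsupported spacing: {spacing}")
    return [s + offset for s in reversed(steps)]
```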
rebuild
Rebuild the solver with new parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_inference_steps | int \| None | The number of inference steps to perform. | required |
| first_inference_step | int \| None | The first inference step to perform. | None |

Returns:

| Type | Description |
|---|---|
| T | A new solver instance with the specified parameters. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
remove_noise
Remove noise from the input tensor using the current step of the diffusion process.
Note
See [arXiv:2006.11239] Denoising Diffusion Probabilistic Models, Equation 15 and [arXiv:2210.00939] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The input tensor to remove noise from. | required |
| noise | Tensor | The noise tensor to remove from the input tensor. | required |
| step | int | The current step of the diffusion process. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The denoised input tensor. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
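Equation 15 of [arXiv:2006.11239] inverts the forward process to estimate the clean sample from the predicted noise, x_0 = (x_t - sqrt(1 - alpha_bar_t) * noise) / sqrt(alpha_bar_t). An element-wise sketch, again taking alpha_bar_t directly rather than a step index:

```python
import math


def remove_noise(x_t: list[float], noise: list[float], alpha_bar_t: float) -> list[float]:
    """Estimate x_0 from x_t and the predicted noise."""
    signal, noise_std = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(xi - noise_std * ni) / signal for xi, ni in zip(x_t, noise)]
```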
sample_noise_schedule
sample_noise_schedule() -> Tensor
Sample the noise schedule.
Returns:

| Type | Description |
|---|---|
| Tensor | A tensor representing the noise schedule. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
sample_power_distribution
Sample a power distribution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| power | float | The power to use for the distribution. | 2 |

Returns:

| Type | Description |
|---|---|
| Tensor | A tensor representing the power distribution between the initial and final diffusion rates of the solver. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
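Presumably (this is an assumption) the rates are interpolated linearly between initial_rate^(1/power) and final_rate^(1/power) and then raised to power, so power=1 yields a uniform schedule and power=2 the quadratic one listed under NoiseSchedule. A sketch with hypothetical argument names; the Stable Diffusion-like rates in the test are illustrative.

```python
def sample_power_distribution(
    initial_rate: float, final_rate: float, num_train_timesteps: int, power: float = 2
) -> list[float]:
    """Rates spaced linearly in rate ** (1 / power), then mapped back with ** power."""
    start, end = initial_rate ** (1 / power), final_rate ** (1 / power)
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** power for i in range(num_train_timesteps)]
```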
scale_model_input
Scale the model's input according to the current timestep.
Note
This method should only be overridden by solvers that need to scale the input according to the current timestep.
By default, this method does not scale the input (scale=1).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The input tensor to scale. | required |
| step | int | The current step of the diffusion process. | required |

Returns:

| Type | Description |
|---|---|
| Tensor | The scaled input tensor. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
to
Move the solver to the specified device and data type.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| device | device \| str \| None | The PyTorch device to move the solver to. | None |
| dtype | dtype \| None | The PyTorch data type to move the solver to. | None |

Returns:

| Type | Description |
|---|---|
| Solver | The solver instance, moved to the specified device and data type. |

Source code in src/refiners/foundationals/latent_diffusion/solvers/solver.py
SolverParams
dataclass
SolverParams(
*,
num_train_timesteps: int | None = None,
timesteps_spacing: TimestepSpacing | None = None,
timesteps_offset: int | None = None,
initial_diffusion_rate: float | None = None,
final_diffusion_rate: float | None = None,
noise_schedule: NoiseSchedule | None = None,
sigma_schedule: NoiseSchedule | None = None,
model_prediction_type: (
ModelPredictionType | None
) = None,
sde_variance: float = 0.0
)
Bases: BaseSolverParams
Common parameters for solvers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_train_timesteps | int \| None | The number of timesteps used to train the diffusion process. | None |
| timesteps_spacing | TimestepSpacing \| None | The spacing to use for the timesteps. | None |
| timesteps_offset | int \| None | The offset to use for the timesteps. | None |
| initial_diffusion_rate | float \| None | The initial diffusion rate used to sample the noise schedule. | None |
| final_diffusion_rate | float \| None | The final diffusion rate used to sample the noise schedule. | None |
| noise_schedule | NoiseSchedule \| None | The type of noise schedule to sample (see NoiseSchedule). | None |
| model_prediction_type | ModelPredictionType \| None | Defines what the model predicts. | None |

TimestepSpacing
An enumeration of methods to space the timesteps.
See [arXiv:2305.08891] Common Diffusion Noise Schedules and Sample Steps are Flawed table 2.
Attributes:

| Name | Type | Description |
|---|---|---|
| LINSPACE | | Sample N steps with linear interpolation and return a floating-point tensor. |
| LINSPACE_ROUNDED | | Same as LINSPACE, but return an integer tensor with rounded timesteps. |
| LEADING | | Sample N+1 steps and do not include the last timestep (i.e. bad: sampling starts from a non-zero SNR). Used in DDIM, with a mitigation for that issue. |
| TRAILING | | Sample N+1 steps and do not include the first timestep. |
| CUSTOM | | Use custom timestep spacing in solver (override |

SDLoraManager
SDLoraManager(target: LatentDiffusionModel)
Manage LoRAs for a Stable Diffusion model.
Note
In the context of SDLoraManager, a "LoRA" is a set of "LoRA layers" that can be attached to a target model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target | LatentDiffusionModel | The target model to manage the LoRAs for. | required |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
lora_adapters
property
lora_adapters: list[LoraAdapter]
List of all the LoraAdapters managed by the SDLoraManager.
scales
property
The scales of all the LoRAs managed by the SDLoraManager.
add_loras
add_loras(
name: str,
/,
tensors: dict[str, Tensor],
scale: float = 1.0,
unet_inclusions: list[str] | None = None,
unet_exclusions: list[str] | None = None,
unet_preprocess: dict[str, str] | None = None,
text_encoder_inclusions: list[str] | None = None,
text_encoder_exclusions: list[str] | None = None,
) -> None
Load a single LoRA from a state_dict.
Warning
This method expects the keys of the state_dict to be in the formats commonly found on CivitAI's hub.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the LoRA. | required |
| tensors | dict[str, Tensor] | The | required |
| scale | float | The scale to use for the LoRA. | 1.0 |
| unet_inclusions | list[str] \| None | A list of layer names; only layers with such a layer among their ancestors are considered when patching the UNet. | None |
| unet_exclusions | list[str] \| None | A list of layer names; layers with such a layer among their ancestors are not considered when patching the UNet. If this is | None |
| unet_preprocess | dict[str, str] \| None | A map between parts of state dict keys and layer names. This is used to attach some keys to specific parts of the UNet. You should leave it set to | None |
| text_encoder_inclusions | list[str] \| None | A list of layer names; only layers with such a layer among their ancestors are considered when patching the text encoder. | None |
| text_encoder_exclusions | list[str] \| None | A list of layer names; layers with such a layer among their ancestors are not considered when patching the text encoder. | None |

Raises:

| Type | Description |
|---|---|
| AssertionError | If the Manager already has a LoRA with the same name. |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
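The inclusion and exclusion lists act on a layer's ancestors. A hypothetical helper showing one reading of that rule; the precedence when both lists are given is an assumption.

```python
from __future__ import annotations


def should_patch(
    ancestors: list[str],
    inclusions: list[str] | None = None,
    exclusions: list[str] | None = None,
) -> bool:
    """Decide whether a layer whose ancestor names are `ancestors` receives a LoRA."""
    if inclusions is not None and not any(name in inclusions for name in ancestors):
        return False  # an inclusion list is set and no ancestor matches it
    if exclusions is not None and any(name in exclusions for name in ancestors):
        return False  # some ancestor is explicitly excluded
    return True
```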
add_loras_to_text_encoder
add_loras_to_text_encoder(
loras: dict[str, Lora[Any]],
/,
include: list[str] | None = None,
exclude: list[str] | None = None,
debug_map: list[tuple[str, str]] | None = None,
) -> None
Add multiple LoRAs to the text encoder. See add_loras for details about arguments.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| loras | dict[str, Lora[Any]] | The dictionary of LoRAs to add to the text encoder (keys are the names of the LoRAs, values are the LoRAs to add to the text encoder). | required |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
add_loras_to_unet
add_loras_to_unet(
loras: dict[str, Lora[Any]],
/,
include: list[str] | None = None,
exclude: list[str] | None = None,
preprocess: dict[str, str] | None = None,
debug_map: list[tuple[str, str]] | None = None,
) -> None
Add multiple LoRAs to the U-Net. See add_loras for details about arguments.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| loras | dict[str, Lora[Any]] | The dictionary of LoRAs to add to the U-Net (keys are the names of the LoRAs, values are the LoRAs to add to the U-Net). | required |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
get_loras_by_name
Get the LoRA layers with the given name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the LoRA. | required |

get_scale
Get the scale of the LoRA with the given name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the LoRA. | required |

Returns:

| Type | Description |
|---|---|
| float | The scale of the LoRA layers with the given name. |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
remove_all
remove_loras
remove_loras(*names: str) -> None
Remove multiple LoRAs from the target.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| names | str | The names of the LoRAs to remove. | () |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
set_scale
sort_keys
staticmethod
Compute the score of a key relative to its suffix.
When used as the key function of sorted, the keys will only be sorted "at the suffix level".
The idea is that closely related keys in the state dict are sometimes not in the order we expect, for instance q -> k -> v or in -> out. This attempts to fix that issue, not cases where distant layers are called in a different order.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| key | str | The key to sort. | required |

Returns:

| Type | Description |
|---|---|
| str | The padded prefix of the key. |
| int | A score depending on the key's suffix. |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
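One way to build such a key, sketched below: split off a known suffix, zero-pad numbers in the prefix so block indices compare numerically, and return (padded_prefix, score). The suffix table and padding width are illustrative assumptions, not the library's values.

```python
import re

# Assumed ordering of sibling layers: input projection, then q -> k -> v, then output.
SUFFIX_ORDER = {"in": 0, "q": 1, "k": 2, "v": 3, "out": 4}


def sort_key(key: str) -> tuple[str, int]:
    parts = key.split(".")
    for i, part in enumerate(parts):
        if part in SUFFIX_ORDER:
            prefix, score = ".".join(parts[:i]), SUFFIX_ORDER[part]
            break
    else:
        prefix, score = key, len(SUFFIX_ORDER)  # unknown suffix: after known siblings
    # Zero-pad numbers so that "block.10" sorts after "block.2"
    padded = re.sub(r"\d+", lambda m: m.group().zfill(5), prefix)
    return padded, score
```

Usage: sorted(state_dict_keys, key=sort_key) keeps distant layers in prefix order and only reorders siblings.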
update_scales
Update the scales of multiple LoRAs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| scales | dict[str, float] | The scales to update (keys are the names of the LoRAs, values are the new scales to set). | required |

Source code in src/refiners/foundationals/latent_diffusion/lora.py
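For intuition about what a scale does: in standard LoRA ([arXiv:2106.09685]) a layer's output gains scale * up(down(x)), which is equivalent to using the merged weight W + scale * (up @ down). A plain-Python sketch of the merged form; the helper names are hypothetical.

```python
def lora_delta(up: list[list[float]], down: list[list[float]], scale: float) -> list[list[float]]:
    """scale * (up @ down), with up of shape (out, rank) and down of shape (rank, in)."""
    rank, n_in = len(down), len(down[0])
    return [
        [scale * sum(up[o][r] * down[r][i] for r in range(rank)) for i in range(n_in)]
        for o in range(len(up))
    ]


def apply_lora(
    weight: list[list[float]], up: list[list[float]], down: list[list[float]], scale: float
) -> list[list[float]]:
    """Return the base weight plus the scaled low-rank update."""
    delta = lora_delta(up, down, scale)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]
```

Setting a scale to 0 therefore disables a LoRA without removing it.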
IPAdapter
IPAdapter(
target: T,
clip_image_encoder: CLIPImageEncoderH,
image_proj: Module,
scale: float = 1.0,
fine_grained: bool = False,
weights: dict[str, Tensor] | None = None,
)
Bases: Generic[T], Chain, Adapter[T]
Image Prompt adapter for a Stable Diffusion U-Net model.
See [arXiv:2308.06721] IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models for more details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target | T | The target model to adapt. | required |
| clip_image_encoder | CLIPImageEncoderH | The CLIP image encoder to use. | required |
| image_proj | Module | The image projection to use. | required |
| scale | float | The scale to use for the image prompt. | 1.0 |
| fine_grained | bool | Whether to use a fine-grained image prompt. | False |
| weights | dict[str, Tensor] \| None | The weights of the IPAdapter. | None |

Source code in src/refiners/foundationals/latent_diffusion/image_prompt.py
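[arXiv:2308.06721] describes decoupled cross-attention: image tokens get their own key/value projections, and the image attention output is added to the text attention output with a weight. A single-query plain-Python sketch; treating scale as that weight is an assumption, and the function names are hypothetical.

```python
import math


def attention(query: list[float], keys: list[list[float]], values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(len(values[0]))]


def decoupled_cross_attention(query, text_kv, image_kv, scale: float = 1.0) -> list[float]:
    """Text attention plus `scale` times a separate attention over image tokens."""
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return [t + scale * i for t, i in zip(text_out, image_out)]
```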
clip_image_encoder
property
clip_image_encoder: CLIPImageEncoderH
The CLIP image encoder of the adapter.
compute_clip_image_embedding
compute_clip_image_embedding(
image_prompt: Image | list[Image] | Tensor,
weights: list[float] | None = None,
concat_batches: bool = True,
) -> Tensor
Compute CLIP image embeddings from the provided image prompts.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_prompt | Image \| list[Image] \| Tensor | A single image or a list of images to compute embeddings for. This can be a PIL Image, a list of PIL Images, or a Tensor. | required |
| weights | list[float] \| None | An optional list of scaling factors for the conditional embeddings. If provided, it must have the same length as the number of images in | None |
| concat_batches | bool | Determines how embeddings are concatenated when multiple images are provided: - If | True |

Returns:

| Type | Description |
|---|---|
| Tensor | A Tensor containing the CLIP image embeddings. |
| Tensor | The structure of the returned Tensor depends on the |

Source code in src/refiners/foundationals/latent_diffusion/image_prompt.py
preprocess_image
preprocess_image(
image: Image,
size: tuple[int, int] = (224, 224),
mean: list[float] | None = None,
std: list[float] | None = None,
) -> Tensor
Preprocess the image.
Note
The default mean and std are the parameters from https://github.com/openai/CLIP.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image | Image | The image to preprocess. | required |
| size | tuple[int, int] | The size to resize the image to. | (224, 224) |
| mean | list[float] \| None | The mean to use for normalization. | None |
| std | list[float] \| None | The standard deviation to use for normalization. | None |

Source code in src/refiners/foundationals/latent_diffusion/image_prompt.py
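After resizing to size, normalization is applied per channel. The sketch below applies the CLIP statistics published in the openai/CLIP repository to one 8-bit RGB pixel; the function name is hypothetical and resizing is not shown.

```python
# Per-channel statistics from the openai/CLIP repository (RGB order).
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel(
    rgb: tuple[int, int, int],
    mean: tuple[float, ...] = CLIP_MEAN,
    std: tuple[float, ...] = CLIP_STD,
) -> tuple[float, ...]:
    """Scale to [0, 1], then standardize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))
```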
set_clip_image_embedding
set_clip_image_embedding(image_embedding: Tensor) -> None
Set the CLIP image embedding context.
Note
This is required by ImageCrossAttention.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| image_embedding | Tensor | The CLIP image embedding to set. | required |

Source code in src/refiners/foundationals/latent_diffusion/image_prompt.py
AdaIN
AdaIN(epsilon: float = 1e-08)
Bases: Module
Apply Adaptive Instance Normalization (AdaIN) to the target features.
See [arXiv:1703.06868] Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization for more details.
Receives:

| Name | Type | Description |
|---|---|---|
| reference | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The reference features. |
| targets | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The target features. |

Returns:

| Name | Type | Description |
|---|---|---|
| reference | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The reference features (unchanged). |
| targets | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The target features, renormalized. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| epsilon | float | A small value to avoid division by zero. | 1e-08 |

Source code in src/refiners/foundationals/latent_diffusion/style_aligned.py
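AdaIN re-centres and re-scales the target features so that their per-channel mean and standard deviation match the reference's: (x - mean_x) / std_x * std_ref + mean_ref. A plain-Python sketch over (sequence_length, embedding_dim) nested lists, computing statistics along the sequence axis; exactly where epsilon enters is an assumption.

```python
import math


def _stats(seq: list[list[float]], eps: float) -> tuple[list[float], list[float]]:
    """Per-channel mean and standard deviation over the sequence dimension."""
    n, dim = len(seq), len(seq[0])
    mean = [sum(tok[c] for tok in seq) / n for c in range(dim)]
    std = [math.sqrt(sum((tok[c] - mean[c]) ** 2 for tok in seq) / n + eps) for c in range(dim)]
    return mean, std


def adain(target: list[list[float]], reference: list[list[float]], eps: float = 1e-8) -> list[list[float]]:
    t_mean, t_std = _stats(target, eps)
    r_mean, r_std = _stats(reference, eps)
    return [
        [(tok[c] - t_mean[c]) / t_std[c] * r_std[c] + r_mean[c] for c in range(len(tok))]
        for tok in target
    ]
```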
ExtractReferenceFeatures
Bases: Module
Extract the reference features from the input features.
Note
This layer expects the input features to be a concatenation of conditional and unconditional features, as done when using Classifier-free guidance (CFG).
The reference features are the first item of the conditional features and the first item of the unconditional features. They are extracted and repeated to match the batch size of the input features.
Receives:

| Name | Type | Description |
|---|---|---|
| features | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The input features. |

Returns:

| Name | Type | Description |
|---|---|---|
| reference | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The reference features. |

Source code in src/refiners/fluxion/layers/module.py
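Under the layout described in the note (conditional half first, then unconditional), extraction picks the first item of each half and repeats it over that half. A sketch with a hypothetical function name; the items can be any per-image feature objects.

```python
def extract_reference(features: list) -> list:
    """Repeat the first item of each classifier-free guidance half over that half."""
    if len(features) % 2 != 0:
        raise ValueError("expects [conditional..., unconditional...] with equal halves")
    half = len(features) // 2
    cond_ref, uncond_ref = features[0], features[half]
    return [cond_ref] * half + [uncond_ref] * half
```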
ScaleReferenceFeatures
ScaleReferenceFeatures(scale: float = 1.0)
Bases: Module
Scale the reference features.
Note
This layer expects the input features to be a concatenation of conditional and unconditional features, as done when using Classifier-free guidance (CFG).
This layer scales the reference features, which are later used together with the target features in the attention dot product.
Receives:

| Name | Type | Description |
|---|---|---|
| features | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The input reference features. |

Returns:

| Name | Type | Description |
|---|---|---|
| features | Float[Tensor, 'cfg_batch_size sequence_length embedding_dim'] | The rescaled reference features. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| scale | float | The scaling factor. | 1.0 |

Source code in src/refiners/foundationals/latent_diffusion/style_aligned.py
SharedSelfAttentionAdapter
SharedSelfAttentionAdapter(
target: SelfAttention, scale: float = 1.0
)
Bases: Chain, Adapter[SelfAttention]
Upgrades a SelfAttention layer into a SharedSelfAttention layer.
This adapter inserts 3 StyleAligned modules right after the original Q, K, V Linear layers (wrapped inside a fl.Distribute).
Source code in src/refiners/foundationals/latent_diffusion/style_aligned.py
StyleAligned
Bases: Chain
StyleAligned module.
This layer encapsulates the logic of the StyleAligned method, as described in [arXiv:2312.02133] Style Aligned Image Generation via Shared Attention.
See also https://blog.finegrain.ai/posts/implementing-style-aligned/.
Receives:

| Name | Type | Description |
|---|---|---|
| features | Float[Tensor, 'cfg_batch_size sequence_length_in embedding_dim'] | The input features. |

Returns:

| Name | Type | Description |
|---|---|---|
| shared_features | Float[Tensor, 'cfg_batch_size sequence_length_out embedding_dim'] | The transformed features. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| adain | bool | Whether to apply Adaptive Instance Normalization to the target features. | required |
| scale | float | The scaling factor for the reference features. | 1.0 |
| concatenate | bool | Whether to concatenate the reference and target features. | required |

Source code in src/refiners/foundationals/latent_diffusion/style_aligned.py
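With concatenate enabled, each image's attention sees its own tokens plus the (scaled) tokens of its reference image, so sequence_length_out is twice sequence_length_in. A sketch of that concatenation on nested lists with a hypothetical function name; placing the reference tokens first and scaling every reference value are assumptions.

```python
def share_with_reference(batch: list[list[list[float]]], scale: float = 1.0) -> list[list[list[float]]]:
    """batch: (cfg_batch_size, seq_len, dim) nested lists, conditional half first."""
    half = len(batch) // 2
    out = []
    for i, tokens in enumerate(batch):
        reference = batch[0] if i < half else batch[half]  # reference of this CFG half
        scaled_ref = [[scale * x for x in tok] for tok in reference]
        out.append(scaled_ref + tokens)  # reference tokens first, then this image's tokens
    return out
```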
StyleAlignedAdapter
StyleAlignedAdapter(target: T, scale: float = 1.0)
Bases: Generic[T], Chain, Adapter[T]
Upgrade each SelfAttention layer of a UNet into a SharedSelfAttention layer.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| target | T | The target module. | required |
| scale | float | The scaling factor for the reference features. | 1.0 |

Source code in src/refiners/foundationals/latent_diffusion/style_aligned.py
DiffusionTarget
dataclass
DiffusionTarget(
*,
tile: Tile,
solver: Solver,
init_latents: Tensor | None = None,
opacity_mask: Tensor | None = None,
weight: int = 1,
start_step: int = 0,
end_step: int = MAX_STEPS
)
Represents a target for the tiled diffusion process.
This class encapsulates the parameters and properties needed to define a specific area (target) within a larger diffusion process, allowing for fine-grained control over different regions of the generated image.
Attributes:

| Name | Type | Description |
|---|---|---|
| tile | Tile | The tile defining the area of the target within the latent image. |
| solver | Solver | The solver to use for this target's diffusion process. This is useful because some solvers have an internal state that needs to be updated during the diffusion process. Using the same solver instance for multiple targets would interfere with this internal state. |
| init_latents | Tensor \| None | The initial latents for this target. If None, the target will be initialized with noise. |
| opacity_mask | Tensor \| None | Mask controlling the target's visibility in the final image. If None, the target will be fully visible. Otherwise, 1 means fully opaque and 0 means fully transparent, meaning the target has no influence. |
| weight | int | The importance of this target in the final image. Higher values increase the target's influence. |
| start_step | int | The diffusion step at which this target begins to influence the process. |
| end_step | int | The diffusion step at which this target stops influencing the process. |
| size | Size | The size of the target area. |
| offset | tuple[int, int] | The top-left offset of the target area within the latent image. |

The combination of opacity_mask and weight determines the target's overall contribution to the final generated image. The solver is responsible for the actual diffusion calculations for this target.
MultiDiffusion
MultiDiffusion class for performing multi-target diffusion using tiled diffusion.
For more details, refer to the paper: MultiDiffusion
generate_latent_tiles
staticmethod
Generate tiles for a latent image with the given size and tile size.
If one dimension of the tile_size is larger than the corresponding dimension of the image size, a single tile is used to cover the entire image, and tile_size is therefore ignored. This algorithm ensures that the tile size is respected as much as possible, while still covering the entire image and respecting the minimum overlap.
Source code in src/refiners/foundationals/latent_diffusion/multi_diffusion.py
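The docstring states the constraints but not the exact algorithm. One way to satisfy them, sketched below with hypothetical names and a hypothetical min_overlap default: pick the smallest number of tiles per dimension whose stride keeps at least min_overlap, then spread them evenly so that the last tile ends on the border.

```python
import math


def _starts(size: int, tile: int, min_overlap: int) -> list[int]:
    """Start offsets of tiles covering [0, size) with at least `min_overlap` overlap."""
    if min_overlap >= tile:
        raise ValueError("min_overlap must be smaller than the tile size")
    if tile == size:
        return [0]
    max_stride = tile - min_overlap
    count = math.ceil((size - tile) / max_stride) + 1
    # Integer division keeps every stride <= max_stride; the last tile ends at `size`
    return [i * (size - tile) // (count - 1) for i in range(count)]


def generate_latent_tiles(
    size: tuple[int, int], tile_size: tuple[int, int], min_overlap: int = 8
) -> list[tuple[int, int, int, int]]:
    """Return (top, left, height, width) tiles for a (height, width) latent."""
    (h, w), (th, tw) = size, tile_size
    if th > h or tw > w:
        return [(0, 0, h, w)]  # tile larger than the image: one tile covers everything
    rows, cols = _starts(h, th, min_overlap), _starts(w, tw, min_overlap)
    return [(top, left, th, tw) for top in rows for left in cols]
```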
ELLA
ELLA(
time_channel: int,
timestep_embedding_dim: int,
width: int,
num_layers: int,
num_heads: int,
num_latents: int,
input_dim: int | None = None,
out_dim: int | None = None,
device: device | str | None = None,
dtype: dtype | None = None,
)
Bases: Passthrough
ELLA latents encoder.
See [arXiv:2403.05135] ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment for more details.
Source code in src/refiners/foundationals/latent_diffusion/ella_adapter.py
ELLAAdapter
Bases: Generic[T], Chain, Adapter[T]
Adapter for ELLA.