Refresh docs, remove references to old conversion scripts

Laurent 2024-10-15 12:46:25 +00:00 committed by Laureηt
parent c5d3d1c657
commit 0620473109
8 changed files with 518 additions and 280 deletions


@ -10,4 +10,6 @@ We use Rye to maintain and release Refiners but it conforms to the standard Pyth
## Using stable releases from PyPI
Although we recommend using our development branch, we do [publish more stable releases to PyPI](https://pypi.org/project/refiners/) and you are welcome to use them in your project. However, note that the format of weights can be different from the current state of the development branch, so you will need the conversion scripts from the corresponding tag in GitHub, for instance [here for v0.2.0](https://github.com/finegrain-ai/refiners/tree/v0.2.0).
Although we recommend using our development branch, we do publish more stable releases to [PyPI](https://pypi.org/project/refiners/) and you are welcome to use them in your project.
They are also available directly on the [GitHub releases page](https://github.com/finegrain-ai/refiners/releases).
However, beware that the format of weights can be different from the current state of the development branch.
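If you pin a stable release, it can also help to pin the revision of any pre-converted weights you download from our Hub, so they keep matching the code you installed. A minimal sketch (the repo and revision below are illustrative, taken from the examples elsewhere in these docs; pick the revision matching your release):
```py
from huggingface_hub import hf_hub_download

# pin the converted weights to a fixed revision so they stay compatible with your pinned release
safetensors_path = hf_hub_download(
    repo_id="refiners/sd15.autoencoder",
    filename="model.safetensors",
    revision="9ce6af42e21fce64d74b1cab57a65aea82fd40ea",  # illustrative revision hash
)
```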


@ -6,55 +6,100 @@ icon: material/star-outline
Refiners is still a young project and development is active, so to use the latest and greatest version of the framework we recommend you use the `main` branch from our development repository.
Moreover, we recommend using [Rye](https://rye-up.com) which simplifies several things related to Python package management, so start by following the instructions to install it on your system.
Moreover, we recommend using [Rye](https://rye.astral.sh/) which simplifies several things related to Python package management, so start by following the instructions to install it on your system.
## Installing
To try Refiners, clone the GitHub repository and install it with all optional features:
```bash
git clone "git@github.com:finegrain-ai/refiners.git"
git clone git@github.com:finegrain-ai/refiners.git
cd refiners
rye sync --all-features
```
## Converting weights
The format of state dicts used by Refiners is custom and we do not redistribute model weights, but we provide conversion tools and working scripts for popular models. For instance, let us convert the autoencoder from Stable Diffusion 1.5:
The format of state dicts used by Refiners is custom, so to use pretrained models you will need to convert weights.
We provide conversion tools and pre-converted weights on our [HuggingFace organization](https://huggingface.co/refiners) for popular models.
```bash
python "scripts/conversion/convert_diffusers_autoencoder_kl.py" --to "lda.safetensors"
```
For instance, to use the autoencoder from Stable Diffusion 1.5:
### Use pre-converted weights
```py
from huggingface_hub import hf_hub_download
from refiners.foundationals.latent_diffusion.stable_diffusion_1.model import SD1Autoencoder
# download the pre-converted weights from the hub
safetensors_path = hf_hub_download(
repo_id="refiners/sd15.autoencoder",
filename="model.safetensors",
revision="9ce6af42e21fce64d74b1cab57a65aea82fd40ea", # optional
)
# initialize the model
model = SD1Autoencoder()
# load the pre-converted weights
model.load_from_safetensors(safetensors_path)
```
If you need to convert weights for all models, check out `script/prepare_test_weights.py`.
### Convert the weights yourself
If you want to convert the weights yourself, you can use the conversion tools we provide.
```py
from refiners.conversion import autoencoder_sd15
# This function will:
# - download the original weights from the internet, and save them to disk at a known location
# (e.g. tests/weights/stable-diffusion-v1-5/stable-diffusion-v1-5/vae/diffusion_pytorch_model.safetensors)
# - convert them to the refiners format, and save them to disk at a known location
# (e.g. tests/weights/refiners/sd15.autoencoder/model.safetensors)
autoencoder_sd15.runwayml.convert()
# get the path to the converted weights
safetensors_path = autoencoder_sd15.runwayml.converted.local_path
# initialize the model
model = SD1Autoencoder()
# load the converted weights
model.load_from_safetensors(safetensors_path)
```
!!! note
If you need to convert more model weights or all of them, check out the `refiners.conversion` module.
!!! warning
Using `script/prepare_test_weights.py` requires a GPU with significant VRAM and a lot of disk space.
Converting all the weights requires a lot of disk space and CPU time, so be prepared.
Currently, downloading all the original weights takes around 100GB of disk space,
and converting them all takes around another 70GB of disk space.
Now to check that it works copy your favorite 512x512 picture in the current directory as `input.png` and create `ldatest.py` with this content:
!!! warning
Some conversion scripts may also require quite a bit of RAM, since they load the entire set of weights in memory.
Around 16GB of RAM should be enough for most models, but some models may require more.
### Testing the conversion
To quickly check that the weights you got from the hub or converted yourself are correct, place a picture (e.g. 512x512) named `input.png` in the current directory, then run the following snippet:
```py
from PIL import Image
from refiners.fluxion.utils import no_grad
from refiners.foundationals.latent_diffusion.stable_diffusion_1.model import SD1Autoencoder
with no_grad():
    # instantiate the model and load the weights (`safetensors_path` comes from the snippets above)
    lda = SD1Autoencoder()
    lda.load_from_safetensors(safetensors_path)

    # encode the image into latents, then decode them back to an image
    image = Image.open("input.png")
    latents = lda.image_to_latents(image)
    decoded = lda.latents_to_image(latents)

decoded.save("output.png")
```
Run it:
```bash
python ldatest.py
```
Inspect `output.png`: it should be similar to `input.png` but have a few differences. Latent Autoencoders are good compressors!
Inspect `output.png`: if the weights are correct, it should be similar to `input.png`, with only minor differences.
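If you prefer a quick quantitative check over eyeballing the two images, here is a small sketch (assuming NumPy is available in your environment):
```py
import numpy as np
from PIL import Image

# compare the original image with its autoencoder round-trip
original = np.asarray(Image.open("input.png").convert("RGB"), dtype=np.float32)
roundtrip = np.asarray(Image.open("output.png").convert("RGB"), dtype=np.float32)

# mean absolute error per channel value, in [0, 255]; a correct round-trip should stay low
mae = np.abs(original - roundtrip).mean()
print(f"mean absolute error: {mae:.2f}")
```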
## Using Refiners in your own project
@ -63,20 +108,28 @@ So far you used Refiners as a standalone package, but if you want to create your
```bash
rye init --py "3.11" myproject
cd myproject
rye add --git "git@github.com:finegrain-ai/refiners.git" --features training refiners
rye add refiners@git+https://github.com/finegrain-ai/refiners
rye sync
```
If you only intend to do inference and no training, you can drop `--features training`.
To convert weights, you can either use a copy of the `refiners` repository as described above or add the `conversion` feature as a development dependency:
If you intend to use Refiners for training, you can install the `training` feature:
```bash
rye add --dev --git "git@github.com:finegrain-ai/refiners.git" --features conversion refiners
rye add refiners[training]@git+https://github.com/finegrain-ai/refiners
```
Similarly, if you need to use the conversion tools we provide, you can install the `conversion` feature:
```bash
rye add refiners[conversion]@git+https://github.com/finegrain-ai/refiners
```
!!! note
You will still need to download the conversion scripts independently if you go that route.
You can install multiple features at once by separating them with a comma:
```bash
rye add refiners[training,conversion]@git+https://github.com/finegrain-ai/refiners
```
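Once the dependency is added, a quick way to check that everything resolves in your project's environment is to import and instantiate a model. A minimal sketch (any model class works; this one is used throughout these docs):
```py
# sanity check that Refiners and its dependencies are installed correctly
from refiners.foundationals.latent_diffusion.stable_diffusion_1.model import SD1Autoencoder

model = SD1Autoencoder()  # builds a randomly initialized model on CPU
print(type(model).__name__)
```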
## What's next?


@ -4,87 +4,96 @@ icon: material/castle
# Adapting Stable Diffusion XL
Stable Diffusion XL (SDXL) is a very popular text-to-image open source foundation model. This guide will show you how to boost its capabilities with Refiners, using iconic adapters the framework supports out-of-the-box, i.e. without the need for tedious prompt engineering. We'll follow a step by step approach, progressively increasing the number of adapters involved to showcase how simple adapter composition is using Refiners. Our use case will be the generation of an image with "a futuristic castle surrounded by a forest, mountains in the background".
Stable Diffusion XL (SDXL) is a very popular text-to-image open source foundation model.
This guide will show you how to boost its capabilities with Refiners, using iconic adapters the framework supports out-of-the-box (i.e. without the need for tedious prompt engineering).
We'll follow a step by step approach, progressively increasing the number of adapters involved to showcase how simple adapter composition is using Refiners.
Our use case will be the generation of an image with "a futuristic castle surrounded by a forest, mountains in the background".
## Prerequisites
## Baseline
Make sure Refiners is installed in your local environment - see [Getting started](/getting-started/recommended/) - and you have access to a decent GPU.
!!! warning
As the examples in this guide's code snippets use CUDA, a minimum of 24GB VRAM is needed.
Make sure that Refiners is installed in your local environment (see [Getting started](/getting-started/recommended/)),
and that you have access to a decent GPU (~24 GB VRAM should be enough).
Before diving into the adapters themselves, let's establish a baseline by simply prompting SDXL with Refiners.
!!! note "Reminder"
A StableDiffusion model is composed of three modules:
- An Autoencoder, responsible for embedding images into a latent space;
- A UNet, responsible for the diffusion process;
- A prompt encoder, such as CLIP, responsible for encoding the user prompt which will guide the diffusion process.
A StableDiffusion model is composed of three modules:
As Refiners comes with a new model representation - see [Chain](/concepts/chain/) - , you need to download and convert the weights of each module by calling our conversion scripts directly from your terminal (make sure you're in your local `refiners` directory, with your local environment active):
- An Autoencoder, responsible for embedding images into a latent space
- A UNet, responsible for the diffusion process
- A Text Encoder, responsible for encoding the user prompt which will guide the diffusion process.
```bash
python scripts/conversion/convert_transformers_clip_text_model.py --from "stabilityai/stable-diffusion-xl-base-1.0" --subfolder2 text_encoder_2 --to DoubleCLIPTextEncoder.safetensors --half
python scripts/conversion/convert_diffusers_unet.py --from "stabilityai/stable-diffusion-xl-base-1.0" --to sdxl-unet.safetensors --half
python scripts/conversion/convert_diffusers_autoencoder_kl.py --from "madebyollin/sdxl-vae-fp16-fix" --subfolder "" --to sdxl-lda.safetensors --half
```
!!! note
This will download the original weights from https://huggingface.co/ which takes some time. If you already have this repo cloned locally, use the `--from /path/to/stabilityai/stable-diffusion-xl-base-1.0` option instead.
Now, we can write the Python script responsible for inference. Just create a simple `inference.py` file, and open it in your favorite editor.
Start by instantiating a [`StableDiffusion_XL`][refiners.foundationals.latent_diffusion.stable_diffusion_xl.StableDiffusion_XL] model and load it with the converted weights:
Start by instantiating a [`StableDiffusion_XL`][refiners.foundationals.latent_diffusion.stable_diffusion_xl.StableDiffusion_XL] model and loading the weights:
```py
import torch
from huggingface_hub import hf_hub_download
from refiners.fluxion.utils import manual_seed, no_grad
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL
# Load SDXL
sdxl = StableDiffusion_XL(device="cuda", dtype=torch.float16) # Using half-precision for memory efficiency
sdxl.clip_text_encoder.load_from_safetensors("DoubleCLIPTextEncoder.safetensors")
sdxl.unet.load_from_safetensors("sdxl-unet.safetensors")
sdxl.lda.load_from_safetensors("sdxl-lda.safetensors")
# instantiate SDXL model
sdxl = StableDiffusion_XL(
device="cuda", # use GPU
dtype=torch.float16 # use half-precision for memory efficiency
)
# Load the weights
sdxl.clip_text_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.text_encoder",
filename="model.safetensors",
)
)
sdxl.unet.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.unet",
filename="model.safetensors",
)
)
sdxl.lda.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.autoencoder_fp16fix",
filename="model.safetensors",
)
)
```
Then, define the inference parameters by setting the appropriate prompt / seed / inference steps:
Then, define the inference parameters by setting the appropriate prompt, seed and number of inference steps:
```py
# Hyperparameters
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
# hyperparameters
seed = 42
sdxl.set_inference_steps(50, first_step=0)
# Enable self-attention guidance to enhance the quality of the generated images
sdxl.set_self_attention_guidance(enable=True, scale=0.75)
# ... Inference process
num_inference_steps = 50
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
sdxl.set_inference_steps(num_inference_steps, first_step=0)
# enable self-attention guidance to enhance the quality of the generated images
sag_scale = 0.75
sdxl.set_self_attention_guidance(enable=True, scale=sag_scale)
```
You can now define and run the proper inference process:
Finally, define and run the inference process:
```py
with no_grad(): # Disable gradient calculation for memory-efficient inference
from refiners.fluxion.utils import manual_seed, no_grad
from tqdm import tqdm
with no_grad(): # disable gradient calculation for memory-efficient inference
# encode the text prompts to embeddings, and get the time_ids
clip_text_embedding, pooled_text_embedding = sdxl.compute_clip_text_embedding(
text=prompt + ", best quality, high quality",
negative_text="monochrome, lowres, bad anatomy, worst quality, low quality",
)
time_ids = sdxl.default_time_ids
# seed the random number generator, for reproducibility
manual_seed(seed)
# SDXL typically generates 1024x1024, here we use a higher resolution.
x = sdxl.init_latents((2048, 2048)).to(sdxl.device, sdxl.dtype)
# SDXL typically generates 1024x1024, here we use a higher resolution
x = sdxl.init_latents((2048, 2048))
# Diffusion process
for step in sdxl.steps:
if step % 10 == 0:
print(f"Step {step}")
# diffusion denoising process
for step in tqdm(sdxl.steps):
x = sdxl(
x,
step=step,
@ -95,49 +104,70 @@ with no_grad(): # Disable gradient calculation for memory-efficient inference
predicted_image = sdxl.lda.latents_to_image(x)
predicted_image.save("vanilla_sdxl.png")
```
??? example "Expand to see the entire end-to-end code"
```py
import torch
from huggingface_hub import hf_hub_download
from tqdm import tqdm
from refiners.fluxion.utils import manual_seed, no_grad
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL
# Load SDXL
sdxl = StableDiffusion_XL(device="cuda", dtype=torch.float16)
sdxl.clip_text_encoder.load_from_safetensors("DoubleCLIPTextEncoder.safetensors")
sdxl.unet.load_from_safetensors("sdxl-unet.safetensors")
sdxl.lda.load_from_safetensors("sdxl-lda.safetensors")
# instantiate SDXL model
sdxl = StableDiffusion_XL(
device="cuda", # use GPU
dtype=torch.float16 # use half-precision for memory efficiency
)
# Hyperparameters
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
# Load the weights
sdxl.clip_text_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.text_encoder",
filename="model.safetensors",
)
)
sdxl.unet.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.unet",
filename="model.safetensors",
)
)
sdxl.lda.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.autoencoder_fp16fix",
filename="model.safetensors",
)
)
# hyperparameters
seed = 42
sdxl.set_inference_steps(50, first_step=0)
sdxl.set_self_attention_guidance(
enable=True, scale=0.75
) # Enable self-attention guidance to enhance the quality of the generated images
num_inference_steps = 50
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
sdxl.set_inference_steps(num_inference_steps, first_step=0)
# enable self-attention guidance to enhance the quality of the generated images
sag_scale = 0.75
sdxl.set_self_attention_guidance(enable=True, scale=sag_scale)
with no_grad(): # Disable gradient calculation for memory-efficient inference
with no_grad(): # disable gradient calculation for memory-efficient inference
# encode the text prompts to embeddings, and get the time_ids
clip_text_embedding, pooled_text_embedding = sdxl.compute_clip_text_embedding(
text=prompt + ", best quality, high quality",
negative_text="monochrome, lowres, bad anatomy, worst quality, low quality",
)
time_ids = sdxl.default_time_ids
manual_seed(seed=seed)
# seed the random number generator, for reproducibility
manual_seed(seed)
# SDXL typically generates 1024x1024, here we use a higher resolution.
x = sdxl.init_latents((2048, 2048)).to(sdxl.device, sdxl.dtype)
# SDXL typically generates 1024x1024, here we use a higher resolution
x = sdxl.init_latents((2048, 2048))
# Diffusion process
for step in sdxl.steps:
if step % 10 == 0:
print(f"Step {step}")
# diffusion denoising process
for step in tqdm(sdxl.steps):
x = sdxl(
x,
step=step,
@ -148,22 +178,32 @@ predicted_image.save("vanilla_sdxl.png")
predicted_image = sdxl.lda.latents_to_image(x)
predicted_image.save("vanilla_sdxl.png")
```
It's time to execute your code. The resulting image should look like this:
The resulting image should look like this:
<figure markdown>
<img src="vanilla_sdxl.webp" alt="Generated image of a castle using default SDXL weights" width="400">
<figcaption>Generated image of a castle using default SDXL weights.</figcaption>
</figure>
It is not really what we prompted the model for, unfortunately. To get a more futuristic-looking castle, you can either go for tedious prompt engineering, or use a pretrained LoRA tailored to our use case, like the [Sci-fi Environments](https://civitai.com/models/105945?modelVersionId=140624) LoRA available on Civitai. We'll now show you how the LoRA option works with Refiners.
It is not really what we prompted the model for, unfortunately.
To get a more futuristic-looking castle, you can either go for tedious prompt engineering, or use a pretrained LoRA tailored to our use case,
like the [Sci-fi Environments](https://civitai.com/models/105945?modelVersionId=140624) LoRA available on Civitai.
We'll now show you how the LoRA option works with Refiners.
## Single LoRA
To use the [Sci-fi Environments](https://civitai.com/models/105945?modelVersionId=140624) LoRA, all you have to do is download its weights to disk as a `.safetensors`, and inject them into SDXL using [`SDLoraManager`][refiners.foundationals.latent_diffusion.lora.SDLoraManager] right after instantiating `StableDiffusion_XL`:
Let's use the [Sci-fi Environments](https://civitai.com/models/105945?modelVersionId=140624) LoRA.
LoRAs don't need to be converted; all you have to do is download the safetensors file from the internet.
You can easily download the LoRA by doing:
```bash
curl -L -o scifi.safetensors 'https://civitai.com/api/download/models/140624?type=Model&format=SafeTensor'
```
Inject the LoRA into SDXL using [`SDLoraManager`][refiners.foundationals.latent_diffusion.lora.SDLoraManager] right after instantiating `StableDiffusion_XL`:
```py
from refiners.fluxion.utils import load_from_safetensors
@ -171,55 +211,78 @@ from refiners.foundationals.latent_diffusion.lora import SDLoraManager
# Load LoRA weights from disk and inject them into target
manager = SDLoraManager(sdxl)
scifi_lora_weights = load_from_safetensors("Sci-fi_Environments_sdxl.safetensors")
manager.add_loras("scifi-lora", tensors=scifi_lora_weights)
scifi_lora_weights = load_from_safetensors("scifi.safetensors")
manager.add_loras("scifi", tensors=scifi_lora_weights)
```
??? example "Expand to see the entire end-to-end code"
```py
import torch
from huggingface_hub import hf_hub_download
from tqdm import tqdm
from refiners.fluxion.utils import load_from_safetensors, manual_seed, no_grad
from refiners.foundationals.latent_diffusion.lora import SDLoraManager
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL
# Load SDXL
sdxl = StableDiffusion_XL(device="cuda", dtype=torch.float16)
sdxl.clip_text_encoder.load_from_safetensors("DoubleCLIPTextEncoder.safetensors")
sdxl.unet.load_from_safetensors("sdxl-unet.safetensors")
sdxl.lda.load_from_safetensors("sdxl-lda.safetensors")
# instantiate SDXL model
sdxl = StableDiffusion_XL(
device="cuda", # use GPU
dtype=torch.float16 # use half-precision for memory efficiency
)
# Load LoRA weights from disk and inject them into target
# Load the weights
sdxl.clip_text_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.text_encoder",
filename="model.safetensors",
)
)
sdxl.unet.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.unet",
filename="model.safetensors",
)
)
sdxl.lda.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.autoencoder_fp16fix",
filename="model.safetensors",
)
)
# add Sci-Fi LoRA
manager = SDLoraManager(sdxl)
scifi_lora_weights = load_from_safetensors("Sci-fi_Environments_sdxl.safetensors")
manager.add_loras("scifi-lora", tensors=scifi_lora_weights)
scifi_lora_weights = load_from_safetensors("scifi.safetensors")
manager.add_loras("scifi", tensors=scifi_lora_weights)
# Hyperparameters
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
# hyperparameters
seed = 42
sdxl.set_inference_steps(50, first_step=0)
sdxl.set_self_attention_guidance(
enable=True, scale=0.75
) # Enable self-attention guidance to enhance the quality of the generated images
num_inference_steps = 50
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
sdxl.set_inference_steps(num_inference_steps, first_step=0)
with no_grad():
# enable self-attention guidance to enhance the quality of the generated images
sag_scale = 0.75
sdxl.set_self_attention_guidance(enable=True, scale=sag_scale)
with no_grad(): # disable gradient calculation for memory-efficient inference
# encode the text prompts to embeddings, and get the time_ids
clip_text_embedding, pooled_text_embedding = sdxl.compute_clip_text_embedding(
text=prompt + ", best quality, high quality",
negative_text="monochrome, lowres, bad anatomy, worst quality, low quality",
)
time_ids = sdxl.default_time_ids
manual_seed(seed=seed)
# seed the random number generator, for reproducibility
manual_seed(seed)
# SDXL typically generates 1024x1024, here we use a higher resolution.
x = sdxl.init_latents((2048, 2048)).to(sdxl.device, sdxl.dtype)
# SDXL typically generates 1024x1024, here we use a higher resolution
x = sdxl.init_latents((2048, 2048))
# Diffusion process
for step in sdxl.steps:
if step % 10 == 0:
print(f"Step {step}")
# diffusion denoising process
for step in tqdm(sdxl.steps):
x = sdxl(
x,
step=step,
@ -227,13 +290,14 @@ manager.add_loras("scifi-lora", tensors=scifi_lora_weights)
pooled_text_embedding=pooled_text_embedding,
time_ids=time_ids,
)
# decode the latents to an image
predicted_image = sdxl.lda.latents_to_image(x)
predicted_image.save("scifi_sdxl.png")
```
You should get something like this - pretty neat, isn't it?
<figure markdown>
<img src="scifi_sdxl.webp" alt="Sci-fi castle" width="400">
@ -242,75 +306,103 @@ You should get something like this - pretty neat, isn't it?
## Multiple LoRAs
Continuing with our futuristic castle example, we might want to turn it, for instance, into a pixel art.
Again, we could either try some tedious prompt engineering,
or instead use another LoRA found on the web, such as [Pixel Art LoRA](https://civitai.com/models/120096/pixel-art-xl?modelVersionId=135931), found on Civitai.
You can easily download the LoRA by doing:
```bash
curl -L -o pixelart.safetensors 'https://civitai.com/api/download/models/135931?type=Model&format=SafeTensor'
```
Injecting a second LoRA into the current SDXL model is dead simple, as [`SDLoraManager`][refiners.foundationals.latent_diffusion.lora.SDLoraManager] allows loading multiple LoRAs:
```py
# Load LoRAs weights from disk and inject them into target
# load LoRAs weights from disk and inject them into target
manager = SDLoraManager(sdxl)
manager.add_loras("scifi-lora", load_from_safetensors("Sci-fi_Environments_sdxl.safetensors"))
manager.add_loras("pixel-art-lora", load_from_safetensors("pixel-art-xl-v1.1.safetensors"))
manager.add_loras("scifi-lora", load_from_safetensors("scifi.safetensors"))
manager.add_loras("pixel-art-lora", load_from_safetensors("pixelart.safetensors"))
```
Adapters such as LoRAs also have a [scale][refiners.fluxion.adapters.Lora.scale] (roughly) quantifying the effect of this Adapter.
Refiners allows setting a different scale for each Adapter, letting the user balance their effects:
```py
# Load LoRAs weights from disk and inject them into target
# load LoRAs weights from disk and inject them into target
manager = SDLoraManager(sdxl)
manager.add_loras("scifi-lora", load_from_safetensors("Sci-fi_Environments_sdxl.safetensors"), scale=1.0)
manager.add_loras("pixel-art-lora", load_from_safetensors("pixel-art-xl-v1.1.safetensors"), scale=1.4)
manager.add_loras("scifi-lora", load_from_safetensors("scifi.safetensors"), scale=1.0)
manager.add_loras("pixel-art-lora", load_from_safetensors("pixelart.safetensors"), scale=1.4)
```
??? example "Expand to see the entire end-to-end code"
```py
import torch
from huggingface_hub import hf_hub_download
from tqdm import tqdm
from refiners.fluxion.utils import load_from_safetensors, manual_seed, no_grad
from refiners.foundationals.latent_diffusion.lora import SDLoraManager
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL
# Load SDXL
sdxl = StableDiffusion_XL(device="cuda", dtype=torch.float16)
sdxl.clip_text_encoder.load_from_safetensors("DoubleCLIPTextEncoder.safetensors")
sdxl.unet.load_from_safetensors("sdxl-unet.safetensors")
sdxl.lda.load_from_safetensors("sdxl-lda.safetensors")
# instantiate SDXL model
sdxl = StableDiffusion_XL(
device="cuda", # use GPU
dtype=torch.float16 # use half-precision for memory efficiency
)
# Load LoRAs weights from disk and inject them into target
# Load the weights
sdxl.clip_text_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.text_encoder",
filename="model.safetensors",
)
)
sdxl.unet.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.unet",
filename="model.safetensors",
)
)
sdxl.lda.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.autoencoder_fp16fix",
filename="model.safetensors",
)
)
# add Sci-Fi and Pixel-Art LoRAs
manager = SDLoraManager(sdxl)
scifi_lora_weights = load_from_safetensors("Sci-fi_Environments_sdxl.safetensors")
pixel_art_lora_weights = load_from_safetensors("pixel-art-xl-v1.1.safetensors")
manager.add_loras("scifi-lora", scifi_lora_weights, scale=1.0)
manager.add_loras("pixel-art-lora", pixel_art_lora_weights, scale=1.4)
manager.add_loras("scifi-lora", load_from_safetensors("scifi.safetensors"), scale=1.0)
manager.add_loras("pixel-art-lora", load_from_safetensors("pixelart.safetensors"), scale=1.4)
# Hyperparameters
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
# hyperparameters
seed = 42
sdxl.set_inference_steps(50, first_step=0)
sdxl.set_self_attention_guidance(
enable=True, scale=0.75
) # Enable self-attention guidance to enhance the quality of the generated images
num_inference_steps = 50
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
sdxl.set_inference_steps(num_inference_steps, first_step=0)
with no_grad():
# enable self-attention guidance to enhance the quality of the generated images
sag_scale = 0.75
sdxl.set_self_attention_guidance(enable=True, scale=sag_scale)
with no_grad(): # disable gradient calculation for memory-efficient inference
# encode the text prompts to embeddings, and get the time_ids
clip_text_embedding, pooled_text_embedding = sdxl.compute_clip_text_embedding(
text=prompt + ", best quality, high quality",
negative_text="monochrome, lowres, bad anatomy, worst quality, low quality",
)
time_ids = sdxl.default_time_ids
manual_seed(seed=seed)
# seed the random number generator, for reproducibility
manual_seed(seed)
# SDXL typically generates 1024x1024, here we use a higher resolution.
x = sdxl.init_latents((2048, 2048)).to(sdxl.device, sdxl.dtype)
# SDXL typically generates 1024x1024, here we use a higher resolution
x = sdxl.init_latents((2048, 2048))
# Diffusion process
for step in sdxl.steps:
if step % 10 == 0:
print(f"Step {step}")
# diffusion denoising process
for step in tqdm(sdxl.steps):
x = sdxl(
x,
step=step,
@ -321,7 +413,6 @@ manager.add_loras("pixel-art-lora", load_from_safetensors("pixel-art-xl-v1.1.saf
predicted_image = sdxl.lda.latents_to_image(x)
predicted_image.save("scifi_pixel_sdxl.png")
```
The results are looking great:
@ -337,37 +428,43 @@ Refiners really shines when it comes to composing different Adapters to fully ex
For instance, IP-Adapter (covered in [a previous blog post](https://blog.finegrain.ai/posts/supercharge-stable-diffusion-ip-adapter/)) is a common choice for practitioners wanting to guide the diffusion process towards a specific prompt image.
In our example, consider this image of the [Neuschwanstein Castle](https://en.wikipedia.org/wiki/Neuschwanstein_Castle):
In our example, we would like to guide the diffusion process to align with this image of the [Neuschwanstein Castle](https://en.wikipedia.org/wiki/Neuschwanstein_Castle):
<figure markdown>
<img src="german-castle.jpg" alt="Castle Image" width="400">
<figcaption>Credits: Bayerische Schlösserverwaltung, Anton Brandl</figcaption>
</figure>
We would like to guide the diffusion process to align with this image, using IP-Adapter. First, download the image as well as the weights of IP-Adapter by calling the following commands from your terminal (again, make sure in you're in your local `refiners` directory):
You can easily download the above image by doing:
```bash
curl -O https://refine.rs/guides/adapting_sdxl/german-castle.jpg
python scripts/conversion/convert_transformers_clip_image_model.py --from "stabilityai/stable-diffusion-2-1-unclip" --to CLIPImageEncoderH.safetensors --half
curl -LO https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/ip-adapter-plus_sdxl_vit-h.bin
python scripts/conversion/convert_diffusers_ip_adapter.py --from ip-adapter-plus_sdxl_vit-h.bin --half
```
This will download and convert both IP-Adapter and CLIP Image Encoder pretrained weights.
Then, in your Python code, simply instantiate a [`SDXLIPAdapter`][refiners.foundationals.latent_diffusion.stable_diffusion_xl.image_prompt.SDXLIPAdapter] targeting our `sdxl.unet`, and inject it using a simple `.inject()` call:
Instantiate a [`SDXLIPAdapter`][refiners.foundationals.latent_diffusion.stable_diffusion_xl.image_prompt.SDXLIPAdapter] targeting our `sdxl.unet`, and inject it using a simple `.inject()` call:
```py
# IP-Adapter
ip_adapter = SDXLIPAdapter(
target=sdxl.unet,
weights=load_from_safetensors("ip-adapter-plus_sdxl_vit-h.safetensors"),
scale=1.0,
fine_grained=True # Use fine-grained IP-Adapter (i.e IP-Adapter Plus)
)
ip_adapter.clip_image_encoder.load_from_safetensors("CLIPImageEncoderH.safetensors")
ip_adapter.inject()
from refiners.foundationals.latent_diffusion.stable_diffusion_xl.image_prompt import SDXLIPAdapter
# load IP-Adapter
ip_adapter = SDXLIPAdapter(
target=sdxl.unet,
weights=load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.ip_adapter.plus",
filename="model.safetensors",
),
),
scale=1.0,
fine_grained=True, # Use fine-grained IP-Adapter (i.e IP-Adapter Plus)
)
ip_adapter.clip_image_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sd21.unclip.image_encoder",
filename="model.safetensors",
)
)
ip_adapter.inject()
```
Then, at runtime, we simply compute the embedding of the image prompt through the `ip_adapter` object, and set its embedding calling `.set_clip_image_embedding()`:
@ -384,68 +481,108 @@ with torch.no_grad():
```
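A minimal sketch of those two calls, mirroring the full example below (the image file is the castle photo downloaded earlier):
```py
from PIL import Image

image_prompt = Image.open("german-castle.jpg")

with no_grad():
    # compute the CLIP image embedding of the image prompt and hand it to the adapter
    clip_image_embedding = ip_adapter.compute_clip_image_embedding(
        ip_adapter.preprocess_image(image_prompt)
    )
    ip_adapter.set_clip_image_embedding(clip_image_embedding)
```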
!!! note
Be wary that composing Adapters (especially ones of different natures, such as LoRAs and IP-Adapter) can be tricky, as their respective effects can be adversarial. This is visible in our example below. In the code below, we tuned the LoRA scales respectively to `1.5` and `1.55`. We invite you to try and test different seeds and scales to find the perfect combination!
Be wary that composing Adapters (especially ones of different natures, such as LoRAs and IP-Adapter) can be tricky, as their respective effects can be adversarial.
This is visible in the example below, where we tuned the LoRA scales to `1.5` and `1.55` respectively.
We invite you to try and test different seeds and scales to find the perfect combination!
Furthermore, the order in which you inject adapters can also have an impact on the final result.
??? example "Expand to see the entire end-to-end code"
```py
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from tqdm import tqdm
from refiners.fluxion.utils import load_from_safetensors, manual_seed, no_grad
from refiners.foundationals.latent_diffusion.lora import SDLoraManager
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL
from refiners.foundationals.latent_diffusion.stable_diffusion_xl.image_prompt import SDXLIPAdapter
# Load SDXL
sdxl = StableDiffusion_XL(device="cuda", dtype=torch.float16)
sdxl.clip_text_encoder.load_from_safetensors("DoubleCLIPTextEncoder.safetensors")
sdxl.unet.load_from_safetensors("sdxl-unet.safetensors")
sdxl.lda.load_from_safetensors("sdxl-lda.safetensors")
# instantiate SDXL model
sdxl = StableDiffusion_XL(
device="cuda", # use GPU
dtype=torch.float16 # use half-precision for memory efficiency
)
# Load LoRAs weights from disk and inject them into target
# Load the weights
sdxl.clip_text_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.text_encoder",
filename="model.safetensors",
)
)
sdxl.unet.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.unet",
filename="model.safetensors",
)
)
sdxl.lda.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.autoencoder_fp16fix",
filename="model.safetensors",
)
)
# hyperparameters
seed = 42
num_inference_steps = 50
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
sdxl.set_inference_steps(num_inference_steps, first_step=0)
# enable self-attention guidance to enhance the quality of the generated images
sag_scale = 0.75
sdxl.set_self_attention_guidance(enable=True, scale=sag_scale)
# add Sci-Fi and Pixel-Art LoRAs
manager = SDLoraManager(sdxl)
scifi_lora_weights = load_from_safetensors("Sci-fi_Environments_sdxl.safetensors")
pixel_art_lora_weights = load_from_safetensors("pixel-art-xl-v1.1.safetensors")
manager.add_loras("scifi-lora", scifi_lora_weights, scale=1.5)
manager.add_loras("pixel-art-lora", pixel_art_lora_weights, scale=1.55)
manager.add_loras("scifi-lora", load_from_safetensors("scifi.safetensors"), scale=1.5)
manager.add_loras("pixel-art-lora", load_from_safetensors("pixelart.safetensors"), scale=1.55)
# Load IP-Adapter
# Instantiate the IP-Adapter
ip_adapter = SDXLIPAdapter(
target=sdxl.unet,
weights=load_from_safetensors("ip-adapter-plus_sdxl_vit-h.safetensors"),
weights=load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.ip_adapter.plus",
filename="model.safetensors",
),
),
scale=1.0,
fine_grained=True, # Use fine-grained IP-Adapter (IP-Adapter Plus)
fine_grained=True, # Use fine-grained IP-Adapter (i.e IP-Adapter Plus)
)
ip_adapter.clip_image_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sd21.unclip.image_encoder",
filename="model.safetensors",
)
)
ip_adapter.clip_image_encoder.load_from_safetensors("CLIPImageEncoderH.safetensors")
ip_adapter.inject()
# Hyperparameters
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
# load image prompt
image_prompt = Image.open("german-castle.jpg")
seed = 42
sdxl.set_inference_steps(50, first_step=0)
sdxl.set_self_attention_guidance(
enable=True, scale=0.75
) # Enable self-attention guidance to enhance the quality of the generated images
with no_grad():
with no_grad(): # disable gradient calculation for memory-efficient inference
# encode the text prompts to embeddings, and get the time_ids
clip_text_embedding, pooled_text_embedding = sdxl.compute_clip_text_embedding(
text=prompt + ", best quality, high quality",
negative_text="monochrome, lowres, bad anatomy, worst quality, low quality",
)
time_ids = sdxl.default_time_ids
# compute image prompt embeddings
clip_image_embedding = ip_adapter.compute_clip_image_embedding(ip_adapter.preprocess_image(image_prompt))
ip_adapter.set_clip_image_embedding(clip_image_embedding)
manual_seed(seed=seed)
x = sdxl.init_latents((1024, 1024)).to(sdxl.device, sdxl.dtype)
# seed the random number generator, for reproducibility
manual_seed(seed)
# Diffusion process
for step in sdxl.steps:
if step % 10 == 0:
print(f"Step {step}")
# SDXL typically generates 1024x1024
x = sdxl.init_latents((1024, 1024))
# diffusion denoising process
for step in tqdm(sdxl.steps):
x = sdxl(
x,
step=step,
@ -456,7 +593,6 @@ with torch.no_grad():
predicted_image = sdxl.lda.latents_to_image(x)
predicted_image.save("scifi_pixel_IP_sdxl.png")
```
The result looks convincing: we do get a *pixel-art, futuristic-looking Neuschwanstein castle*!
@ -467,9 +603,9 @@ The result looks convincing: we do get a *pixel-art, futuristic-looking Neuschwa
</figure>
## Everything else + T2I-Adapter
## Multiple LoRAs + IP-Adapter + T2I-Adapter
T2I-Adapters[^1] are a powerful class of Adapters aiming at controlling the Text-to-Image (T2I) diffusion process with external control signals, such as canny edges or pose estimation inputs.
T2I-Adapters are a powerful class of Adapters aiming at controlling the Text-to-Image (T2I) diffusion process with external control signals, such as canny edges or pose estimation inputs.
In this section, we will compose our previous example with the [Depth-Zoe Adapter](https://huggingface.co/TencentARC/t2i-adapter-depth-zoe-sdxl-1.0), providing a depth condition to the diffusion process using the following depth map as input signal:
<figure markdown>
@ -477,21 +613,27 @@ In this section, we will compose our previous example with the [Depth-Zoe Adapte
<figcaption>Input depth map of the initial castle image.</figcaption>
</figure>
First, download the image as well as the weights of T2I-Depth-Zoe-Adapter by calling the following commands:
You can easily download the above image by doing:
```bash
curl -O https://refine.rs/guides/adapting_sdxl/zoe-depth-map-german-castle.png
python scripts/conversion/convert_diffusers_t2i_adapter.py --from "TencentARC/t2i-adapter-depth-zoe-sdxl-1.0" --to t2i_depth_zoe_xl.safetensors --half
```
Then, just inject it as usual:
```py
from refiners.foundationals.latent_diffusion.stable_diffusion_xl.t2i_adapter import SDXLT2IAdapter
# Load T2I-Adapter
t2i_adapter = SDXLT2IAdapter(
target=sdxl.unet,
name="zoe-depth",
weights=load_from_safetensors("t2i_depth_zoe_xl.safetensors"),
target=sdxl.unet,
name="zoe-depth",
weights=load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.t2i_adapter.depth.zoe",
filename="model.safetensors",
),
),
scale=0.72,
).inject()
```
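At runtime, before the denoising loop, you also need to compute the adapter's condition features from the depth map and hand them to the adapter. A minimal sketch, mirroring the full example below:
```py
import torch
from PIL import Image

from refiners.fluxion.utils import image_to_tensor, interpolate, no_grad

image_depth_condition = Image.open("zoe-depth-map-german-castle.png")

with no_grad():
    # the condition's spatial dimensions must match the generated resolution (1024x1024 here)
    condition = image_to_tensor(image_depth_condition.convert("RGB"), device=sdxl.device, dtype=sdxl.dtype)
    condition = interpolate(condition, torch.Size((1024, 1024)))
    t2i_adapter.set_condition_features(features=t2i_adapter.compute_condition_features(condition))
```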
@ -515,75 +657,120 @@ with torch.no_grad():
```py
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from tqdm import tqdm
from refiners.fluxion.utils import load_from_safetensors, manual_seed, no_grad, image_to_tensor
from refiners.fluxion.utils import image_to_tensor, interpolate, load_from_safetensors, manual_seed, no_grad
from refiners.foundationals.latent_diffusion.lora import SDLoraManager
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL, SDXLT2IAdapter
from refiners.foundationals.latent_diffusion.stable_diffusion_xl import StableDiffusion_XL
from refiners.foundationals.latent_diffusion.stable_diffusion_xl.image_prompt import SDXLIPAdapter
from refiners.foundationals.latent_diffusion.stable_diffusion_xl.t2i_adapter import SDXLT2IAdapter
# Load SDXL
sdxl = StableDiffusion_XL(device="cuda", dtype=torch.float16)
sdxl.clip_text_encoder.load_from_safetensors("DoubleCLIPTextEncoder.safetensors")
sdxl.unet.load_from_safetensors("sdxl-unet.safetensors")
sdxl.lda.load_from_safetensors("sdxl-lda.safetensors")
# instantiate SDXL model
sdxl = StableDiffusion_XL(
device="cuda", # use GPU
dtype=torch.float16 # use half-precision for memory efficiency
)
# Load LoRAs weights from disk and inject them into target
# Load the weights
sdxl.clip_text_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.text_encoder",
filename="model.safetensors",
)
)
sdxl.unet.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.unet",
filename="model.safetensors",
)
)
sdxl.lda.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.autoencoder_fp16fix",
filename="model.safetensors",
)
)
# hyperparameters
seed = 42
num_inference_steps = 50
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
sdxl.set_inference_steps(num_inference_steps, first_step=0)
# enable self-attention guidance to enhance the quality of the generated images
sag_scale = 0.75
sdxl.set_self_attention_guidance(enable=True, scale=sag_scale)
# add Sci-Fi and Pixel-Art LoRAs
manager = SDLoraManager(sdxl)
scifi_lora_weights = load_from_safetensors("Sci-fi_Environments_sdxl.safetensors")
pixel_art_lora_weights = load_from_safetensors("pixel-art-xl-v1.1.safetensors")
manager.add_loras("scifi-lora", scifi_lora_weights, scale=1.5)
manager.add_loras("pixel-art-lora", pixel_art_lora_weights, scale=1.55)
manager.add_loras("scifi-lora", load_from_safetensors("scifi.safetensors"), scale=1.5)
manager.add_loras("pixel-art-lora", load_from_safetensors("pixelart.safetensors"), scale=1.55)
# Load IP-Adapter
# Instantiate the IP-Adapter
ip_adapter = SDXLIPAdapter(
target=sdxl.unet,
weights=load_from_safetensors("ip-adapter-plus_sdxl_vit-h.safetensors"),
weights=load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.ip_adapter.plus",
filename="model.safetensors",
),
),
scale=1.0,
fine_grained=True, # Use fine-grained IP-Adapter (IP-Adapter Plus)
fine_grained=True, # Use fine-grained IP-Adapter (i.e IP-Adapter Plus)
)
ip_adapter.clip_image_encoder.load_from_safetensors(
hf_hub_download(
repo_id="refiners/sd21.unclip.image_encoder",
filename="model.safetensors",
)
)
ip_adapter.clip_image_encoder.load_from_safetensors("CLIPImageEncoderH.safetensors")
ip_adapter.inject()
# Load T2I-Adapter
t2i_adapter = SDXLT2IAdapter(
target=sdxl.unet,
name="zoe-depth",
weights=load_from_safetensors("t2i_depth_zoe_xl.safetensors"),
target=sdxl.unet,
name="zoe-depth",
weights=load_from_safetensors(
hf_hub_download(
repo_id="refiners/sdxl.t2i_adapter.depth.zoe",
filename="model.safetensors",
),
),
scale=0.72,
).inject()
# Hyperparameters
prompt = "a futuristic castle surrounded by a forest, mountains in the background"
# load image prompt and image depth condition
image_prompt = Image.open("german-castle.jpg")
image_depth_condition = Image.open("zoe-depth-map-german-castle.png")
seed = 42
sdxl.set_inference_steps(50, first_step=0)
sdxl.set_self_attention_guidance(
enable=True, scale=0.75
) # Enable self-attention guidance to enhance the quality of the generated images
with no_grad():
with no_grad(): # disable gradient calculation for memory-efficient inference
# encode the text prompts to embeddings, and get the time_ids
clip_text_embedding, pooled_text_embedding = sdxl.compute_clip_text_embedding(
text=prompt + ", best quality, high quality",
negative_text="monochrome, lowres, bad anatomy, worst quality, low quality",
)
time_ids = sdxl.default_time_ids
# compute and set image prompt embeddings
clip_image_embedding = ip_adapter.compute_clip_image_embedding(ip_adapter.preprocess_image(image_prompt))
ip_adapter.set_clip_image_embedding(clip_image_embedding)
# Spatial dimensions should be divisible by default downscale factor (=16 for T2IAdapter ConditionEncoder)
condition = image_to_tensor(image_depth_condition.convert("RGB").resize((1024, 1024)), device=sdxl.device, dtype=sdxl.dtype)
t2i_adapter.set_condition_features(features=t2i_adapter.compute_condition_features(condition))
# compute and set the T2I features
condition = image_to_tensor(image_depth_condition.convert("RGB"), device=sdxl.device, dtype=sdxl.dtype)
condition = interpolate(condition, torch.Size((1024, 1024)))
t2i_features = t2i_adapter.compute_condition_features(condition)
t2i_adapter.set_condition_features(features=t2i_features)
manual_seed(seed=seed)
x = sdxl.init_latents((1024, 1024)).to(sdxl.device, sdxl.dtype)
# seed the random number generator, for reproducibility
manual_seed(seed)
# Diffusion process
for step in sdxl.steps:
if step % 10 == 0:
print(f"Step {step}")
# SDXL typically generates 1024x1024
x = sdxl.init_latents((1024, 1024))
# diffusion denoising process
for step in tqdm(sdxl.steps):
x = sdxl(
x,
step=step,
@ -594,7 +781,6 @@ with torch.no_grad():
predicted_image = sdxl.lda.latents_to_image(x)
predicted_image.save("scifi_pixel_IP_T2I_sdxl.png")
```
The results look convincing: the depth and proportions of the initial castle are more faithful, while preserving our *futuristic, pixel-art style*!
@ -606,5 +792,3 @@ The results look convincing: the depth and proportions of the initial castle are
## Wrap up
As you can see in this guide, composing Adapters on top of foundation models is pretty seamless in Refiners, allowing practitioners to quickly test out different combinations of Adapters for their needs. We encourage you to try out different ones, and even train some yourselves!
[^1]: Mou, C., Wang, X., Xie, L., Zhang, J., Qi, Z., Shan, Y., & Qie, X. (2023). T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models.


@ -14,7 +14,7 @@ icon: material/water-outline
[![packaging - Hatch](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://github.com/pypa/hatch)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/refiners)](https://pypi.org/project/refiners/)
[![PyPI - Status](https://badge.fury.io/py/refiners.svg)](https://badge.fury.io/py/refiners)
[![license](https://img.shields.io/badge/license-MIT-blue)](/LICENSE) <br>
[![license](https://img.shields.io/badge/license-MIT-blue)](https://github.com/finegrain-ai/refiners/blob/main/LICENSE) <br>
[![code bounties](https://img.shields.io/badge/code-bounties-blue)](https://finegrain.ai/bounties)
[![Discord](https://img.shields.io/discord/1179456777406922913?logo=discord&logoColor=white&color=%235765F2)](https://discord.gg/mCmjNUVV7d)
[![HuggingFace - Refiners](https://img.shields.io/badge/refiners-ffd21e?logo=huggingface&labelColor=555)](https://huggingface.co/refiners)


@ -2,6 +2,6 @@
{% block announce %}
Check out our brand new <a href="https://finegrain.ai/bounties">Bounty Program</a> 💰!
Check out our <a href="https://finegrain.ai/bounties">Bounty Program</a> 💰!
{% endblock %}
{% endblock %}


@ -1,8 +1,7 @@
* <code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Fluxion
* [<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Adapters](fluxion/adapters.md)
* [<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Context](fluxion/context.md)
* [<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Layers](fluxion/layers.md)
* [<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Model Converter](fluxion/model_converter.md)
* [<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Utils](fluxion/utils.md)
* <code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> Foundation Models
* [<code class="doc-symbol doc-symbol-nav doc-symbol-module"></code> CLIP](foundationals/clip.md)


@ -33,7 +33,7 @@ plugins:
python:
import:
- https://docs.python.org/3/objects.inv
- https://pytorch.org/docs/master/objects.inv
- https://pytorch.org/docs/main/objects.inv
- https://docs.kidger.site/jaxtyping/objects.inv
options:
show_bases: true
@ -55,7 +55,7 @@ watch:
extra_css:
- stylesheets/extra.css
nav:
- Home:
- Welcome: index.md
- Manifesto: home/why.md
- Getting started:
@ -75,9 +75,9 @@ extra:
- icon: fontawesome/brands/discord
link: https://discord.gg/mCmjNUVV7d
- icon: fontawesome/brands/github
link: https://github.com/finegrain-ai/refiners
- icon: fontawesome/brands/twitter
link: https://twitter.com/finegrain_ai
- icon: fontawesome/brands/linkedin
link: https://www.linkedin.com/company/finegrain-ai/
markdown_extensions:


@ -464,25 +464,25 @@ class IPAdapter(Generic[T], fl.Chain, Adapter[T]):
Args:
image_prompt: A single image or a list of images to compute embeddings for.
This can be a PIL Image, a list of PIL Images, or a Tensor.
weights: An optional list of scaling factors for the conditional embeddings.
If provided, it must have the same length as the number of images in `image_prompt`.
Each weight scales the corresponding image's conditional embedding, allowing you to
adjust the influence of each image. Defaults to uniform weights of 1.0.
concat_batches: Determines how embeddings are concatenated when multiple images are provided:
- If `True`, embeddings from multiple images are concatenated along the feature
dimension to form a longer sequence of image tokens. This is useful when you want to
treat multiple images as a single combined input.
- If `False`, embeddings are kept separate along the batch dimension, treating each image
independently.
Returns:
A Tensor containing the CLIP image embeddings.
The structure of the returned Tensor depends on the `concat_batches` parameter:
- If `concat_batches` is `True` and multiple images are provided, the embeddings are
concatenated along the feature dimension.
- If `concat_batches` is `False` or a single image is provided, the embeddings are returned
as a batch, with one embedding per image.
"""
if isinstance(image_prompt, Image.Image):
image_prompt = self.preprocess_image(image_prompt)