---
icon: material/tray-plus
---
# Adapter
Adapters are the final and most high-level abstraction in Refiners. They are the concept of adaptation turned into code.
An Adapter is [generally](#higher-level-adapters) a Chain that replaces a Module (the target) in another Chain (the parent). Typically the target will become a child of the adapter.

In code terms, [`Adapter`][refiners.fluxion.adapters.Adapter] is a generic mixin. Adapters subclass `type(parent)` and `Adapter[type(target)]`. For instance, if you adapt a `Conv2d` in a `Sum`, the definition of the Adapter could look like:

```py
class MyAdapter(fl.Sum, fl.Adapter[fl.Conv2d]):
    ...
```
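The dual inheritance can be pictured with a minimal pure-Python sketch. Every name below is a hypothetical stand-in, not the actual Refiners implementation; the point is only that an adapter instance is simultaneously the container type that replaces the target and a generic `Adapter` over the target type.

```python
# Hypothetical stand-ins for the mixin pattern (not the Refiners classes):
# an adapter inherits both from the container type that replaces the target
# in the parent, and from a generic Adapter parameterized by the target type.
from typing import Generic, TypeVar

T = TypeVar("T")

class Module: ...
class Conv2d(Module): ...
class Sum(Module): ...

class Adapter(Generic[T]):
    ...

class MyAdapter(Sum, Adapter[Conv2d]):
    ...

adapter = MyAdapter()
assert isinstance(adapter, Sum)      # behaves like a Sum inside the parent chain
assert isinstance(adapter, Adapter)  # and also exposes the adapter interface
```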
## A simple example: adapting a Linear
Let us take a simple example to see how this works. Consider this model:

![before](linear-before.png)

In code, it could look like this:

```py
my_model = MyModel(fl.Chain(fl.Linear(), fl.Chain(...)))
```
Suppose we want to adapt the Linear to sum its output with the result of another chain. We can define and initialize an adapter like this:
```py
class MyAdapter(fl.Sum, fl.Adapter[fl.Linear]):
    def __init__(self, target: fl.Linear) -> None:
        with self.setup_adapter(target):
            super().__init__(fl.Chain(...), target)

# Find the target and its parent in the chain.
# For simplicity let us assume it is the only Linear.
for target, parent in my_model.walk(fl.Linear):
    break

adapter = MyAdapter(target)
```
The result is now this:

![ejected](linear-ejected.png)

Note that the original chain is unmodified. You can still run inference on it as if the adapter did not exist. To use the adapter, you must inject it into the chain:
```py
adapter.inject(parent)
```
The result will be:

![injected](linear-injected.png)

If you run inference now, it will go through the Adapter. You can revert to the previous situation by calling `adapter.eject()`.
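The inject / eject mechanics can be pictured with a pure-Python stand-in (hypothetical classes, not the Refiners API): injecting swaps the adapter in for its target inside the parent, and ejecting restores the target.

```python
# Hypothetical stand-in for inject / eject (not the Refiners API):
# the parent is modeled as a plain list of children.
class ToyAdapter:
    def __init__(self, target):
        self.target = target
        self.parent = None

    def inject(self, parent):
        # Replace the target with the adapter inside the parent.
        i = parent.index(self.target)
        parent[i] = self
        self.parent = parent

    def eject(self):
        # Put the target back, undoing the injection.
        i = self.parent.index(self)
        self.parent[i] = self.target

parent = ["linear", "chain_b"]
adapter = ToyAdapter("linear")

adapter.inject(parent)
assert parent == [adapter, "chain_b"]   # inference now goes through the adapter

adapter.eject()
assert parent == ["linear", "chain_b"]  # back to the original chain
```

The target itself is never mutated, which mirrors how the original chain above stays usable before injection and after ejection.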
## A more complicated example: adapting a Chain
We are not limited to adapting base modules: we can also adapt Chains.

Starting from the same model as earlier, let us assume we want to:

- invert the order of the Linear and Chain B in Chain A;
- replace the first child block of Chain B with the original Chain A.

This Adapter will perform a [`structural_copy`][refiners.fluxion.layers.Chain.structural_copy] of part of its target, which means it will duplicate all Chain nodes but keep pointers to the same [`WeightedModule`][refiners.fluxion.layers.WeightedModule]s, and hence use no extra GPU memory.
```py
class MyAdapter(fl.Chain, fl.Adapter[fl.Chain]):
    def __init__(self, target: fl.Chain) -> None:
        with self.setup_adapter(target):
            new_b = fl.Chain(target, target.Chain.Chain_2.structural_copy())
            super().__init__(new_b, target.Linear)

adapter = MyAdapter(my_model.Chain_1)  # Chain A in the diagram
```
We end up with this:

![chain-ejected](chain-ejected.png)

We can now inject it into the original graph. This time we do not even need to pass the parent, since Chains know their parents.
```py
adapter.inject()
```
We obtain this:

![chain-injected](chain-injected.png)

Note that the Linear now appears in the Chain twice, but that does not matter as long as you intend both occurrences to be the same Linear layer with the same weights.

As before, we can call `adapter.eject()` to go back to the original model.
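The sharing behavior behind `structural_copy` can be sketched in plain Python. The classes below are hypothetical stand-ins, not the Refiners implementation: the sketch only shows that container nodes are duplicated while leaf modules (and their weights) stay shared.

```python
# Hypothetical stand-ins (not the Refiners API) for the weight sharing
# behind structural_copy: containers are duplicated, leaves are shared.
class Leaf:
    """Stands in for a WeightedModule holding actual weights."""
    def __init__(self, weight):
        self.weight = weight

class Node:
    """Stands in for a Chain: a container of children."""
    def __init__(self, *children):
        self.children = list(children)

    def structural_copy(self):
        # Duplicate every container node, but keep pointers to the same leaves.
        return Node(*(
            child.structural_copy() if isinstance(child, Node) else child
            for child in self.children
        ))

leaf = Leaf(weight=[1.0, 2.0])
tree = Node(Node(leaf), leaf)
copy = tree.structural_copy()

assert copy is not tree          # containers are new objects
assert copy.children[1] is leaf  # leaves (and their weights) are shared
```

This is why the duplicated part of the graph costs no extra GPU memory: only the lightweight container structure is copied, never the weights.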
## A real-world example: [`LoraAdapter`][refiners.fluxion.adapters.LoraAdapter]

A popular example of adaptation is [LoRA](https://arxiv.org/abs/2106.09685). You can check out [how we implement it in Refiners](https://github.com/finegrain-ai/refiners/blob/main/src/refiners/fluxion/adapters/lora.py).
## Higher-level adapters
If you use Refiners, you will come across Adapters that go beyond the simple definition given at the top of this page. Some adapters inject multiple smaller adapters into models, while others implement helper methods to be used by their callers, and so on.

From a bird's eye view, you can simply consider Adapters as things you inject into models to adapt them, and that can be ejected to return the model to its original state. You will get a better feel for what an adapter is and how to leverage one by actually using the framework.