OFT
Orthogonal Finetuning (OFT) is a method developed for adapting text-to-image diffusion models. It works by reparameterizing the pretrained weight matrices with it’s orthogonal matrix to preserve information in the pretrained model. To reduce the number of parameters, OFT introduces a block-diagonal structure in the orthogonal matrix.
The abstract from the paper is:
Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method — Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
OFTConfig
class peft.OFTConfig
< source >( peft_type: Union = None auto_mapping: Optional = None base_model_name_or_path: Optional = None revision: Optional = None task_type: Union = None inference_mode: bool = False rank_pattern: Optional[dict] = <factory> alpha_pattern: Optional[dict] = <factory> r: int = 8 module_dropout: float = 0.0 target_modules: Union = None init_weights: bool = True layers_to_transform: Union = None layers_pattern: Optional = None modules_to_save: Optional = None coft: bool = False eps: float = 6e-05 block_share: bool = False )
Parameters
- r (
int
) — OFT rank. - module_dropout (
int
) — The dropout probability for disabling OFT modules during training. - target_modules (
Optional[Union[List[str], str]]
) — The names of the modules to apply the adapter to. If this is specified, only the modules with the specified names will be replaced. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as ‘all-linear’, then all linear modules are chosen, excluding the output layer. If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually. - init_weights (
bool
) — Whether to perform initialization of OFT weights. - layers_to_transform (
Union[List[int], int]
) — The layer indices to transform. If a list of ints is passed, it will apply the adapter to the layer indices that are specified in this list. If a single integer is passed, it will apply the transformations on the layer at this index. - layers_pattern (
str
) — The layer pattern name, used only iflayers_to_transform
is different fromNone
. - rank_pattern (
dict
) — The mapping from layer names or regexp expression to ranks which are different from the default rank specified byr
. - modules_to_save (
List[str]
) — List of modules apart from adapter layers to be set as trainable and saved in the final checkpoint. - coft (
bool
) — Whether to use the constrained variant of OFT or not, off by default. - eps (
float
) — The control strength of COFT. The freedom of rotation. Only has an effect ifcoft
is set to True. - block_share (
bool
) — Whether to share the OFT parameters between blocks or not. This isFalse
by default.
This is the configuration class to store the configuration of a OFTModel.
OFTModel
class peft.OFTModel
< source >( model config adapter_name low_cpu_mem_usage: bool = False ) → torch.nn.Module
Parameters
- model (
torch.nn.Module
) — The model to which the adapter tuner layers will be attached. - config (OFTConfig) — The configuration of the OFT model.
- adapter_name (
str
) — The name of the adapter, defaults to"default"
. - low_cpu_mem_usage (
bool
,optional
, defaults toFalse
) — Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns
torch.nn.Module
The OFT model.
Creates Orthogonal Finetuning model from a pretrained model. The method is described in https://arxiv.org/abs/2306.07280
Example:
>>> from diffusers import StableDiffusionPipeline
>>> from peft import OFTModel, OFTConfig
>>> config_te = OFTConfig(
... r=8,
... target_modules=["k_proj", "q_proj", "v_proj", "out_proj", "fc1", "fc2"],
... module_dropout=0.0,
... init_weights=True,
... )
>>> config_unet = OFTConfig(
... r=8,
... target_modules=[
... "proj_in",
... "proj_out",
... "to_k",
... "to_q",
... "to_v",
... "to_out.0",
... "ff.net.0.proj",
... "ff.net.2",
... ],
... module_dropout=0.0,
... init_weights=True,
... )
>>> model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> model.text_encoder = OFTModel(model.text_encoder, config_te, "default")
>>> model.unet = OFTModel(model.unet, config_unet, "default")
Attributes:
- model (
~torch.nn.Module
) — The model to be adapted. - peft_config (OFTConfig): The configuration of the OFT model.