---
license: mit
language:
- en
library_name: diffusers
tags:
- lora
- image-generation
- diffusion
- face-generation
- text-conditioned-human-portrait
- synthetic-captions
- diffusers
---

# Text2Face-LoRa

![Python version](https://img.shields.io/badge/python-3.8+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green)

This is a LoRA-finetuned version of the Stable Diffusion 2.1 model, optimized for generating face images. The model was trained on [FFHQ](https://github.com/NVlabs/ffhq-dataset) and [EasyPortrait](https://github.com/hukenovs/easyportrait) using synthetic text captions for both datasets. Details on the dataset format and preparation will be available soon.

## Checkpoints

You can download the pretrained LoRA weights for the diffusion model and text encoder with:

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="michaeltrs/text2face",
    filename="checkpoints/lora30k/pytorch_lora_weights.safetensors",
    local_dir="checkpoints",
)
```

## Inference

Generate images using the `generate.py` script, which loads the SD2.1 foundation model from Hugging Face and applies the LoRA weights. Generation is driven by a prompt and, optionally, a negative prompt.

```python
import torch
from diffusers import StableDiffusionPipeline


class Model:
    def __init__(self, checkpoint="checkpoints/lora30k",
                 weight_name="pytorch_lora_weights.safetensors", device="cuda"):
        self.checkpoint = checkpoint
        # Load the trained LoRA state dict from the training output directory.
        state_dict, network_alphas = StableDiffusionPipeline.lora_state_dict(
            checkpoint, weight_name=weight_name
        )
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
        ).to(device)
        # Apply the LoRA weights to both the UNet and the text encoder.
        self.pipe.load_lora_into_unet(
            state_dict, network_alphas, self.pipe.unet, adapter_name="test_lora"
        )
        self.pipe.load_lora_into_text_encoder(
            state_dict, network_alphas, self.pipe.text_encoder, adapter_name="test_lora"
        )
        self.pipe.set_adapters(["test_lora"], adapter_weights=[1.0])

    def generate(self, prompt, negprompt="", steps=50, savedir=None, seed=1):
        lora_scale = 1.0
        image = self.pipe(
            prompt,
            negative_prompt=negprompt,
            num_inference_steps=steps,
            cross_attention_kwargs={"scale": lora_scale},
            generator=torch.manual_seed(seed),
        ).images[0]
        # Save the image under a filename derived from the prompt.
        outdir = self.checkpoint if savedir is None else savedir
        filename = f"{'_'.join(prompt.replace('.', ' ').split(' '))}.png"
        image.save(f"{outdir}/{filename}")
        return image


if __name__ == "__main__":
    model = Model()
    prompt = "A happy 55 year old male with blond hair and a goatee smiles with visible teeth."
    negprompt = ""
    image = model.generate(prompt, negprompt=negprompt, steps=50, seed=42)
```
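As an alternative to handling the state dict manually, recent diffusers releases (with PEFT installed) can usually attach both the UNet and text-encoder LoRA layers in a single `load_lora_weights` call. The snippet below is a minimal sketch assuming the checkpoint layout from the download step above; the prompt, negative prompt, scale, and seed are illustrative values, not tuned settings.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Attach the LoRA weights (UNet and text encoder) in one call.
pipe.load_lora_weights(
    "checkpoints/lora30k", weight_name="pytorch_lora_weights.safetensors"
)

# Example prompt and settings; adjust to taste.
image = pipe(
    "A 30 year old woman with curly red hair smiles softly.",
    negative_prompt="blurry, deformed, low quality",
    num_inference_steps=50,
    cross_attention_kwargs={"scale": 0.8},
    generator=torch.manual_seed(7),
).images[0]
image.save("example.png")
```

Reducing the cross-attention scale below 1.0 weakens the LoRA's effect relative to the base model, which can be useful if outputs look over-constrained.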
## Limitations

This model, Text2Face-LoRa, is finetuned from Stable Diffusion 2.1 and, as such, inherits the limitations and biases associated with the base model. These biases may manifest as skewed representations across ethnicities and genders due to the nature of the training data originally used for Stable Diffusion 2.1.

### Specific Limitations Include:

- **Ethnic and Gender Biases**: The model may generate images that do not equally represent the diversity of human features across ethnic and gender groups, potentially reinforcing or exacerbating existing stereotypes.
- **Selection Bias in Finetuning Datasets**: The datasets used for finetuning were selected with specific criteria in mind and may not cover a wide enough variety of data points to correct for the biases inherited from the base model.
- **Caption Generation Bias**: The synthetic annotations used to finetune this model were generated by automated face analysis models, which may themselves be biased. This can lead to inaccuracies in how facial features are interpreted and represented, particularly for demographics underrepresented in the training data.

### Ethical Considerations:

Users are encouraged to consider these limitations when deploying the model in real-world applications, especially those involving diverse human subjects. It is advisable to perform additional validation and to seek ways to mitigate these biases in practical use cases.