|
--- |
|
license: openrail++ |
|
language: |
|
- en |
|
thumbnail: "https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/thumbnail.png" |
|
pipeline_tag: text-to-image |
|
tags: |
|
- stable-diffusion |
|
- stable-diffusion-diffusers |
|
inference: true |
|
widget: |
|
- text: >- |
|
masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, |
|
cumulonimbus clouds, lighting, blue sky, falling leaves, garden |
|
example_title: example 1girl |
|
- text: >- |
|
masterpiece, best quality, 1boy, medium hair, blonde hair, blue eyes, |
|
bishounen, colorful, autumn, cumulonimbus clouds, lighting, blue sky, |
|
falling leaves, garden |
|
example_title: example 1boy |
|
library_name: diffusers |
|
--- |
|
|
|
<style> |
|
.title-container { |
|
display: flex; |
|
justify-content: center; |
|
align-items: center; |
|
height: 100vh; /* Adjust this value to position the title vertically */ |
|
} |
|
.title { |
|
font-size: 3em; |
|
text-align: center; |
|
color: #333; |
|
font-family: Arial, sans-serif; |
|
text-transform: uppercase; |
|
letter-spacing: 0.05em; |
|
padding: 0.5em 0; |
|
box-shadow: 0px 0px 20px 0px rgba(0,0,0,0.15); |
|
background: transparent; |
|
} |
|
.title span { |
|
background: -webkit-linear-gradient(45deg, #fe6b8b 30%, #ff8e53 90%); |
|
-webkit-background-clip: text; |
|
-webkit-text-fill-color: transparent; |
|
} |
|
.image-grid { |
|
display: grid; |
|
grid-template-columns: repeat(3, 1fr); |
|
gap: 0.5em; |
|
} |
|
.image-item { |
|
box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15); |
|
padding: 10px; |
|
} |
|
.image-item img { |
|
width: 100%; |
|
height: 100%; |
|
object-fit: cover; |
|
border-radius: 10px; |
|
transition: transform .2s; |
|
} |
|
.image-item img:hover { |
|
transform: scale(1.1); |
|
} |
|
.custom-table { |
|
table-layout: fixed; |
|
width: 100%; |
|
border-collapse: collapse; |
|
} |
|
.custom-table td { |
|
width: 50%; |
|
vertical-align: top; |
|
padding: 10px; |
|
box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15); |
|
} |
|
.custom-image { |
|
width: 100%; |
|
height: 100%; |
|
object-fit: cover; |
|
border-radius: 10px; |
|
transition: transform .2s; |
|
} |
|
.custom-image:hover { |
|
transform: scale(1.1); |
|
} |
|
</style> |
|
|
|
<h1 class="title"><span>Hermitage XL</span></h1> |
|
|
|
<div class="image-grid"> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample1.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample1.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample2.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample2.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample3.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample3.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample4.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample4.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample5.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample5.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample6.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample6.png"> |
|
</a> |
|
</div> |
|
</div> |
|
|
|
<hr> |
|
|
|
## Overview |
|
|
|
Hermitage XL is a high-resolution, latent text-to-image diffusion model. It was fine-tuned from Stable Diffusion XL 1.0 with a learning rate of 4e-7 for 5,000 steps at a batch size of 16 on a curated dataset of high-quality anime-style images.
|
|
|
Example prompt: **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**
|
|
|
- Use it with the [`Stable Diffusion Webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) |
|
- Use it with 🧨 [`diffusers`](https://huggingface.co/docs/diffusers/index) |
|
- Use it with [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI)
|
|
|
<hr> |
|
|
|
## Features |
|
|
|
1. High-Resolution Images: The model was trained at a base resolution of 1024x1024 using the [NovelAI Aspect Ratio Bucketing Tool](https://github.com/NovelAI/novelai-aspect-ratio-bucketing), which allows training at non-square resolutions as well (a minimal sketch of the bucketing idea follows this list).

2. Anime-Styled Generation: The model produces high-quality anime-styled images from text prompts.

3. Fine-Tuned Diffusion Process: The model uses a fine-tuned diffusion process to produce high-quality, distinctive images.
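
For readers curious how bucketing works, here is a minimal sketch of the general idea only, not the NovelAI tool itself: candidate resolutions are enumerated in steps of 64 pixels under a fixed pixel budget (1024x1024 here), and each training image is later assigned to the bucket closest to its aspect ratio. The function name `make_buckets` and the exact limits are illustrative assumptions.

```py
def make_buckets(max_pixels=1024 * 1024, step=64, max_ratio=2.0):
    """Enumerate (width, height) pairs whose area stays within the pixel budget."""
    buckets = set()
    for width in range(step, 2048 + step, step):
        # Largest height (a multiple of `step`) that keeps width * height <= max_pixels.
        height = (max_pixels // width) // step * step
        ratio = width / height
        if 1 / max_ratio <= ratio <= max_ratio:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored portrait/landscape bucket
    return sorted(buckets)

print(make_buckets()[:4])  # first buckets, e.g. (704, 1408), (768, 1280), ...
```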
|
|
|
<hr> |
|
|
|
## Model Details |
|
|
|
- **Developed by:** [Linaqruf](https://github.com/Linaqruf) |
|
- **Model type:** Diffusion-based text-to-image generative model |
|
- **Model Description:** This is a model that can be used to generate and modify anime-themed images based on text prompts. |
|
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL) |
|
- **Finetuned from model:** [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) |
|
<hr> |
|
|
|
## How to Use
|
- Download `Hermitage XL` [here](https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/hermitage-xl.safetensors); the model is in `.safetensors` format.
|
- Use Danbooru-style tags as prompts instead of natural language; otherwise you will get realistic results instead of anime-style images.
|
- You can use any generic negative prompt, or use the following suggested negative prompt to guide the model toward high-aesthetic generations:
|
``` |
|
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry |
|
``` |
|
- In addition, prepend the following tags to your prompts to get high-aesthetic results (a small helper sketch follows these snippets):
|
``` |
|
masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details |
|
``` |
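
As a small illustration of the two tips above, the snippet below prepends the quality tags and pairs the result with the suggested negative prompt. The helper name `build_prompts` is just an assumption for this sketch, not part of any library:

```py
QUALITY_PREFIX = "masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details"
NEGATIVE_PROMPT = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

def build_prompts(tags: str) -> tuple[str, str]:
    """Prepend the quality tags to a Danbooru-style tag string and pair it with the negative prompt."""
    return f"{QUALITY_PREFIX}, {tags}", NEGATIVE_PROMPT

prompt, negative_prompt = build_prompts("1girl, white hair, golden eyes, flower meadow, blue sky")
```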
|
<hr> |
|
|
|
## 🧨 Diffusers |
|
|
|
Make sure to upgrade diffusers to >= 0.18.2: |
|
``` |
|
pip install diffusers --upgrade |
|
``` |
|
|
|
In addition, make sure to install `transformers`, `safetensors`, `accelerate`, as well as `invisible_watermark`:
|
``` |
|
pip install invisible_watermark transformers accelerate safetensors |
|
``` |
|
|
|
Running the pipeline (if you don't swap the scheduler, it will run with the default **EulerDiscreteScheduler**; in this example we swap it to **EulerAncestralDiscreteScheduler**):
|
```py |
|
import torch
|
from diffusers.models import AutoencoderKL |
|
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler |
|
|
|
model = "Linaqruf/hermitage-xl" |
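
# The VAE below is loaded without torch_dtype, so it stays in fp32; keeping the
# VAE in full precision is a common workaround for fp16 VAE numerical issues.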
|
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae") |
|
|
|
pipe = StableDiffusionXLPipeline.from_pretrained( |
|
model, |
|
torch_dtype=torch.float16, |
|
use_safetensors=True, |
|
variant="fp16", |
|
vae=vae |
|
) |
|
|
|
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config) |
|
pipe.to('cuda') |
|
|
|
prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck" |
|
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry" |
|
|
|
image = pipe( |
|
prompt, |
|
negative_prompt=negative_prompt, |
|
width=1024, |
|
height=1024, |
|
guidance_scale=12, |
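
    # SDXL micro-conditioning: target_size is the intended output resolution, while an
    # original_size larger than the target nudges the model toward sharper, more detailed images.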
|
target_size=(1024,1024), |
|
original_size=(4096,4096), |
|
num_inference_steps=50 |
|
).images[0] |
|
|
|
image.save("anime_girl.png") |
|
``` |
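
If the full fp16 pipeline does not fit in VRAM, diffusers provides standard offloading helpers that can be applied to the pipeline created above; this is a general diffusers tip rather than something specific to Hermitage XL:

```py
# Instead of pipe.to('cuda'), offload submodules to CPU between forward passes
# (requires accelerate, installed above) and decode latents in slices to cut peak VRAM.
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
```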
|
<hr> |
|
|
|
## Limitations

1. This model inherits the [limitations](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0#limitations) of Stable Diffusion XL 1.0.

2. The model is overfitted and does not follow prompts well, because it was fine-tuned for only 5,000 steps on a small dataset.

3. It is only a preview model, intended to find good hyperparameters and a training configuration for Stable Diffusion XL 1.0.
|
|
|
<hr> |
|
|
|
## Example |
|
|
|
Here are some cherry-picked samples and a comparison between available models:
|
|
|
<table class="custom-table"> |
|
<tr> |
|
<td> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image1.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image1.png" alt="sample1"> |
|
</a> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image3.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image3.png" alt="sample3"> |
|
</a> |
|
</td> |
|
<td> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image2.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image2.png" alt="sample2"> |
|
</a> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image4.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image4.png" alt="sample4"> |
|
</a> |
|
</td> |
|
</tr> |
|
</table> |
|
|