|
--- |
|
license: openrail++ |
|
language: |
|
- en |
|
thumbnail: "https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/thumbnail.png" |
|
pipeline_tag: text-to-image |
|
tags: |
|
- stable-diffusion |
|
- stable-diffusion-diffusers |
|
inference: true |
|
widget: |
|
- text: >- |
|
masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, |
|
cumulonimbus clouds, lighting, blue sky, falling leaves, garden |
|
example_title: example 1girl |
|
- text: >- |
|
masterpiece, best quality, 1boy, medium hair, blonde hair, blue eyes, |
|
bishounen, colorful, autumn, cumulonimbus clouds, lighting, blue sky, |
|
falling leaves, garden |
|
example_title: example 1boy |
|
library_name: diffusers |
|
--- |
|
|
|
<style> |
|
.title-container { |
|
display: flex; |
|
justify-content: center; |
|
align-items: center; |
|
height: 100vh; /* Adjust this value to position the title vertically */ |
|
} |
|
.title { |
|
font-size: 3em; |
|
text-align: center; |
|
color: #333; |
|
font-family: Arial, sans-serif; |
|
text-transform: uppercase; |
|
letter-spacing: 0.05em; |
|
padding: 0.5em 0; |
|
box-shadow: 0px 0px 20px 0px rgba(0,0,0,0.15); |
|
background: transparent; |
|
} |
|
.title span { |
|
background: -webkit-linear-gradient(45deg, #fe6b8b 30%, #ff8e53 90%); |
|
-webkit-background-clip: text; |
|
-webkit-text-fill-color: transparent; |
|
} |
|
.image-grid { |
|
display: grid; |
|
grid-template-columns: repeat(3, 1fr); |
|
gap: 0.5em; |
|
} |
|
.image-item { |
|
box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15); |
|
padding: 10px; |
|
} |
|
.image-item img { |
|
width: 100%; |
|
height: 100%; |
|
object-fit: cover; |
|
border-radius: 10px; |
|
transition: transform .2s; |
|
} |
|
.image-item img:hover { |
|
transform: scale(1.1); |
|
} |
|
.custom-table { |
|
table-layout: fixed; |
|
width: 100%; |
|
border-collapse: collapse; |
|
} |
|
.custom-table td { |
|
width: 50%; |
|
vertical-align: top; |
|
padding: 10px; |
|
box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15); |
|
} |
|
.custom-image { |
|
width: 100%; |
|
height: 100%; |
|
object-fit: cover; |
|
border-radius: 10px; |
|
transition: transform .2s; |
|
} |
|
.custom-image:hover { |
|
transform: scale(1.1); |
|
} |
|
</style> |
|
|
|
<h1 class="title"><span>Hermitage XL</span></h1> |
|
|
|
<div class="image-grid"> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample1.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample1.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample2.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample2.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample3.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample3.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample4.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample4.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample5.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample5.png"> |
|
</a> |
|
</div> |
|
<div class="image-item"> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample6.png"> |
|
<img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample6.png"> |
|
</a> |
|
</div> |
|
</div> |
|
|
|
<hr> |
|
|
|
## Overview |
|
|
|
Hermitage XL is a high-resolution, latent text-to-image diffusion model. It was fine-tuned from Stable Diffusion XL 1.0 with a learning rate of 4e-7 for 5,000 steps at a batch size of 16 on a curated dataset of high-quality anime-style images.
|
|
|
Example prompt: **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**
|
|
|
- Use it with the [`Stable Diffusion Webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) |
|
- Use it with 🧨 [`diffusers`](https://huggingface.co/docs/diffusers/index) |
|
- Use it with [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI)
|
|
|
<hr> |
|
|
|
## Features |
|
|
|
1. High-Resolution Images: The model was trained at a base resolution of 1024x1024 using the [NovelAI Aspect Ratio Bucketing Tool](https://github.com/NovelAI/novelai-aspect-ratio-bucketing), which allows training at non-square resolutions as well (a minimal sketch of the bucketing idea follows this list).

2. Anime-Styled Generation: The model produces high-quality anime-styled images from text prompts.

3. Fine-Tuned Diffusion Process: The model uses a fine-tuned diffusion process to produce high-quality, distinctive images.
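
For readers curious how bucketing works, here is a minimal sketch of the general idea only, not the NovelAI tool itself: candidate resolutions are enumerated in steps of 64 pixels under a fixed pixel budget (1024x1024 here), and each training image is later assigned to the bucket closest to its aspect ratio. The function name `make_buckets` and the exact limits are illustrative assumptions.

```py
def make_buckets(max_pixels=1024 * 1024, step=64, max_ratio=2.0):
    """Enumerate (width, height) pairs whose area stays within the pixel budget."""
    buckets = set()
    for width in range(step, 2048 + step, step):
        # Largest height (a multiple of `step`) that keeps width * height <= max_pixels.
        height = (max_pixels // width) // step * step
        ratio = width / height
        if 1 / max_ratio <= ratio <= max_ratio:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored portrait/landscape bucket
    return sorted(buckets)

print(make_buckets()[:4])  # first buckets, e.g. (704, 1408), (768, 1280), ...
```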
|
|
|
<hr> |
|
|
|
## Model Details |
|
|
|
- **Developed by:** [Linaqruf](https://github.com/Linaqruf) |
|
- **Model type:** Diffusion-based text-to-image generative model |
|
- **Model Description:** This is a model that can be used to generate and modify anime-themed images based on text prompts. |
|
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL) |
|
- **Finetuned from model:** [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) |
|
<hr> |
|
|
|
## How to Use
|
- Download `Hermitage XL` [here](https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/hermitage-xl.safetensors); the model is in `.safetensors` format.
|
- Use Danbooru-style tags as prompts instead of natural language; otherwise you will get realistic results instead of anime-style images.
|
- You can use any generic negative prompt, or use the following suggested negative prompt to guide the model toward high-aesthetic generations:
|
``` |
|
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry |
|
``` |
|
- In addition, prepend the following tags to your prompts to get high-aesthetic results (a small helper sketch follows these snippets):
|
``` |
|
masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details |
|
``` |
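
As a small illustration of the two tips above, the snippet below prepends the quality tags and pairs the result with the suggested negative prompt. The helper name `build_prompts` is just an assumption for this sketch, not part of any library:

```py
QUALITY_PREFIX = "masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details"
NEGATIVE_PROMPT = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

def build_prompts(tags: str) -> tuple[str, str]:
    """Prepend the quality tags to a Danbooru-style tag string and pair it with the negative prompt."""
    return f"{QUALITY_PREFIX}, {tags}", NEGATIVE_PROMPT

prompt, negative_prompt = build_prompts("1girl, white hair, golden eyes, flower meadow, blue sky")
```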
|
<hr> |
|
|
|
## 🧨 Diffusers |
|
|
|
Make sure to upgrade diffusers to >= 0.18.2: |
|
``` |
|
pip install diffusers --upgrade |
|
``` |
|
|
|
In addition, make sure to install `transformers`, `safetensors`, `accelerate`, as well as `invisible_watermark`:
|
``` |
|
pip install invisible_watermark transformers accelerate safetensors |
|
``` |
|
|
|
Running the pipeline (if you don't swap the scheduler, it will run with the default **EulerDiscreteScheduler**; in this example we swap it to **EulerAncestralDiscreteScheduler**):
|
```py |
|
import torch
|
from diffusers.models import AutoencoderKL |
|
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler |
|
|
|
model = "Linaqruf/hermitage-xl" |
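
# The VAE below is loaded without torch_dtype, so it stays in fp32; keeping the
# VAE in full precision is a common workaround for fp16 VAE numerical issues.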
|
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae") |
|
|
|
pipe = StableDiffusionXLPipeline.from_pretrained( |
|
model, |
|
torch_dtype=torch.float16, |
|
use_safetensors=True, |
|
variant="fp16", |
|
vae=vae |
|
) |
|
|
|
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config) |
|
pipe.to('cuda') |
|
|
|
prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck" |
|
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry" |
|
|
|
image = pipe( |
|
prompt, |
|
negative_prompt=negative_prompt, |
|
width=1024, |
|
height=1024, |
|
guidance_scale=12, |
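
    # SDXL micro-conditioning: target_size is the intended output resolution, while an
    # original_size larger than the target nudges the model toward sharper, more detailed images.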
|
target_size=(1024,1024), |
|
original_size=(4096,4096), |
|
num_inference_steps=50 |
|
).images[0] |
|
|
|
image.save("anime_girl.png") |
|
``` |
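
If the full fp16 pipeline does not fit in VRAM, diffusers provides standard offloading helpers that can be applied to the pipeline created above; this is a general diffusers tip rather than something specific to Hermitage XL:

```py
# Instead of pipe.to('cuda'), offload submodules to CPU between forward passes
# (requires accelerate, installed above) and decode latents in slices to cut peak VRAM.
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
```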
|
<hr> |
|
|
|
## Limitations

1. This model inherits the [limitations](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0#limitations) of Stable Diffusion XL 1.0.

2. The model is overfitted and does not follow prompts well, because it was fine-tuned for only 5,000 steps on a small dataset.

3. It is only a preview model, intended to find good hyperparameters and a training configuration for Stable Diffusion XL 1.0.
|
|
|
<hr> |
|
|
|
## Example |
|
|
|
Here are some cherry-picked samples and a comparison between available models:
|
|
|
<table class="custom-table"> |
|
<tr> |
|
<td> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image1.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image1.png" alt="sample1"> |
|
</a> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image3.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image3.png" alt="sample3"> |
|
</a> |
|
</td> |
|
<td> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image2.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image2.png" alt="sample2"> |
|
</a> |
|
<a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image4.png"> |
|
<img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image4.png" alt="sample4"> |
|
</a> |
|
</td> |
|
</tr> |
|
</table> |
|
|