# Tagging methodology for Kazusa (blue archive)

## README / Intro
Since I've seen a few people share this already I'll provide this disclaimer.

This is not really intended to be a guide, it's just a log/checklist of my process, for my own benefit, since I repeat this for a lot of LoRAs and I got tired of winging it every single time.  I've put only the slightest amount of effort into making it accessible to others.

I don't claim that any or all of these are optimal, nor can I confidently put them forth as recommendations. They're literally just a record of the steps I follow while tagging, gradually developed after ~16 characters using some version of the below process.

Still, I can at least point to my pre-Koharu LoRAs (which used pure WD1.4 tags) and the ones that came after (where I started heavily editing tags) and see a steady progression in quality and prompting flexibility despite using mostly the same training settings for each one.

Yes, it takes forever to do all of this shit. No, I don't recommend it unless you're extremely autistic; raw WD1.4 tags are probably good enough for most people. If you intend to do this for more than a few characters, I strongly recommend learning [Hydrus](https://hydrusnetwork.github.io/hydrus/introduction.html); it makes all of this way, way less tedious compared to doing it with crappier tools.

---

## Prep

- Scraped `1girl kazusa_(blue_archive) order:popularity` from sancom, then curated for quality.
  - Kazusa has a shitload of good art so I had to be very picky to get down to 280 images, which is still a lot. In hindsight I think huge datasets aren't really a problem; they let you train for longer without overfitting.
  - Gelbooru is probably fine too. Danbooru sucks for ロリ unless you have Gold.
  - I also got a few newer images from pixiv, don't remember which ones.
- Exported final images from Hydrus to feed into WD1.4 Tagger
- Auto-tagged with WD1.4 Swinv2 at 0.25 confidence
- Reimported images+tags into Hydrus using the .txt sidecar feature. I strongly recommend putting WD1.4 tags in a separate tag domain so they aren't mixed in with shit scraped from boorus.
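
For anyone who wants to inspect the auto-tagger output before reimporting it, here is a minimal sketch of reading the sidecars. It assumes each image has a same-named `.txt` file holding one comma-separated line of tags; that layout, and the helper names `parse_caption` and `load_sidecars`, are assumptions for illustration and not part of Hydrus or the tagger.

```python
from pathlib import Path


def parse_caption(text: str) -> list[str]:
    """Split a comma-separated caption into trimmed, de-duplicated tags."""
    tags: list[str] = []
    for raw in text.split(","):
        tag = raw.strip()
        # Skip empty fragments (e.g. from a trailing comma) and repeats.
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def load_sidecars(folder: Path) -> dict[str, list[str]]:
    """Map each sidecar's file stem to its tag list (assumed layout)."""
    return {
        path.stem: parse_caption(path.read_text(encoding="utf-8"))
        for path in sorted(folder.glob("*.txt"))
    }
```

Tags are left exactly as written (no underscore replacement), so emoticon tags such as `o_o` and `@_@` survive intact.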

## Tagging

- Tag unique features
  - `halo` / `demon horns` / `low wings`
  - Remove when not present or out of view.  WD1.4 likes putting `halo` even on images where no halo is visible.
  - **Kazusa**: `halo` / `animal ears`
    - Pruned `extra ears` as it seems redundant and intrinsic to the character.
- Tag outfit variants with a single master tag
  - **Kazusa**:
    - Uniform: `school uniform` / `black jacket`
      - Sometimes the jacket appears without the rest of the uniform; I did not tag those images `school uniform`
    - Non-canon costumes
      - Add `alternate costume`
  - Nudity (WD1.4 usually does this accurately)
    - `nude` / `completely nude`
- Prune eye colors
  - Keep tags which describe unusual eye features (`multicolored eyes`, `heterochromia`, `slit pupils`) as they can otherwise be too subtle and inconsistently drawn for the AI to notice
- Prune hair colors
  - This includes `two-toned hair`, `gradient hair`, etc.  The AI learns all of these very consistently without the tags, likely because artists tend to draw them consistently
- Partially prune hair styles
  - Leave key, defining style tags like `twintails`, `ponytail`, `short hair with long locks`, `twin braids`, etc.
  - Prune exceedingly common tags like `bangs` / `sidelocks` / `eyebrows visible through hair` / `hair between eyes`, etc.
    - Somewhat arbitrary, but I just don't think there's much value in them because they're ubiquitous and caption space is limited
  - Prune length, except for images which differ from the character's usual length
    - If you leave the length tags in, the AI is more likely to get the hair length wrong when it isn't prompted, which isn't a huge deal.
    - Add `alternate hairstyle` and/or `alternate hair length` on applicable images, which can be used to more easily change styles while prompting
  - **Kazusa**: `short hair, colored inner hair` -- while I would usually prune these, they're really her only defining hairstyle traits
- Fixup hair ornaments
  - Prune generic `hair ornament` in favor of more specificity
    - `hairclip` / `black headband` / `hair flower` / `hair ribbon`, etc.
  - Consolidate tags that have color variants (`headband` >> `black headband`)
  - **Kazusa**: `hairclip`
- Consolidate outfits
  - Only tag an item when it is actually visible. If it is only barely visible along the edge of an image, keep in mind it may be cropped during bucketing.
  - Danbooru's wiki entry for a character often provides a good list of tags for a character's entire outfit.
  - **Kazusa outfits**:
    - School Uniform
      - `black choker`
      - `hooded jacket`
      - `black jacket`
      - `green sailor collar`
      - `pink neckerchief`
      - `miniskirt`
      - `pleated skirt`
      - `white skirt`
      - `black pantyhose`
      - `sneakers`
- Fixup sleeves
  - e.g. `long sleeves` / `puffy long sleeves` / `detached sleeves`
  - You only need one, but pick one and be consistent. If sleeves aren't tagged the AI tends to add them inappropriately (such as when prompting for sleeveless outfits or nudity)
- Fixup collars
  - e.g. `detached collar` / `collared shirt` / `choker` / etc.
  - Same deal as sleeves, they tend to appear when unwanted if not consistently tagged according to actual visibility
- Fixup clothing state
  - e.g. `open jacket` / `open shirt` / `partially undressed` / `off shoulder`
  - The tagger is generally good at this but it can help to double-check for weird outfits
- Tag expressions
  - This is tedious and the autotagger doesn't help you out much, but tagging these can really help the AI nail multiple iconic expressions for a character
  - Start by searching for images without one of these, and add them.
    - `open mouth`
    - `closed mouth`
    - `parted lips`
      - Sometimes applies with `open mouth`
  - Then proceed through each image and add one of these
    - `smile` / `light smile` / `:d` / `grin` (exposed teeth only)
    - `:o` / `:<` / `expressionless` / `serious`
    - `wavy mouth` / `embarrassed`
    - `pout` / `:t` / `tsundere`
    - `nervous` / `nervous smile`
    - `flustered` / `swirly eyes` / `@_@`
    - `surprised` / `o_o` / `wide-eyed`
    - `upset` / `annoyed` / `frustrated` / `v-shaped eyebrows`
    - `naughty face` / `seductive smile`
    - `smug` / `:3` / `smirk`
    - `yelling` / `frown`
    - `closed eyes` / `one eye closed`
      - WD1.4 almost always gets these two
- Tag camera angles/composition
  - Most of these aren't very high value, but `from x` can be helpful.
  - `cowboy shot`
  - `upper body`
  - `full body`
  - `portrait`
  - `feet out of frame`
  - `cropped torso` / `cropped legs`
  - `from side` / `from above` / `from below` / `from behind`
- Tag iconic poses, actions, or props
  - Props need to show up often in training data for this to be worth it.
  - `v` / `peace sign` / `standing on one leg` 
  - `holding dango` / `weapon case` / `fashion magazine`
  - **Kazusa**
    - `mouth hold`
    - `eating`
    - `macaron`
- Flip through each image and use Hydrus's "related tags" feature to quickly identify important tags that might be missing.
  - This feature looks at other images with similar tags to provide suggestions.  Good for spotting things you or the tagger might have missed.
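
I do all of the above by hand in Hydrus, but a few of the mechanical rules (pruning, consolidating color variants, finding images with no mouth-state tag) can be roughly sketched in code. The following is only an illustration under stated assumptions: the tag sets are short, incomplete examples taken from the notes above, and every name in it (`PRUNE`, `COLORS`, `CONSOLIDATE`, `clean_tags`, `missing_mouth_state`) is hypothetical rather than part of any tool.

```python
# Tags pruned outright (examples from the notes above; not exhaustive).
PRUNE = {
    "bangs", "sidelocks", "eyebrows visible through hair",
    "hair between eyes", "hair ornament", "extra ears",
    "two-toned hair", "gradient hair",
}
# Color words used to spot "<color> eyes" / "<color> hair" (incomplete list).
COLORS = {"black", "blue", "brown", "green", "grey", "pink", "purple",
          "red", "white", "yellow", "blonde"}
# Generic tag -> the specific color variant used for this character.
CONSOLIDATE = {"headband": "black headband"}
MOUTH_STATES = {"open mouth", "closed mouth", "parted lips"}


def is_color_tag(tag: str) -> bool:
    """True for plain eye/hair color tags such as 'red eyes' or 'black hair'."""
    color, _, feature = tag.rpartition(" ")
    return feature in {"eyes", "hair"} and color in COLORS


def clean_tags(tags: list[str]) -> list[str]:
    """Consolidate variants, prune unwanted tags, keep order, drop repeats."""
    cleaned: list[str] = []
    for tag in tags:
        tag = CONSOLIDATE.get(tag, tag)
        if tag in PRUNE or is_color_tag(tag):
            continue
        if tag not in cleaned:
            cleaned.append(tag)
    return cleaned


def missing_mouth_state(tags: list[str]) -> bool:
    """True when an image has none of the three mouth-state tags."""
    return not MOUTH_STATES.intersection(tags)
```

Because `is_color_tag` only matches a single color word, tags like `multicolored eyes`, `short hair`, and `colored inner hair` pass through untouched, which matches the exceptions described above. Anything that depends on what is actually visible in the image (halo, sleeves, collars, expressions) still needs a manual pass.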