metadata

license: apache-2.0
tags:
  - generated_from_trainer
  - siglip
metrics:
  - accuracy
  - f1
base_model: google/siglip-base-patch16-512
model-index:
  - name: siglip-tagger-test-2
    results: []
pipeline_tag: image-classification

siglip-tagger-test-2

This model is a fine-tuned version of google/siglip-base-patch16-512 on a set of 5,000 danbooru images (see Training and evaluation data). It achieves the following results on the evaluation set:

Loss: 364.7850
Accuracy: 0.2539
F1: 0.9967

Model description

This is an experimental multi-label image classifier that predicts danbooru tags for an image.

Example

from PIL import Image

import torch
from transformers import (
    AutoModelForImageClassification,
    AutoImageProcessor,
)

MODEL_NAME = "p1atdev/siglip-tagger-test-2"

# trust_remote_code=True lets transformers run the model code bundled with the repository
model = AutoModelForImageClassification.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model.eval()
processor = AutoImageProcessor.from_pretrained(MODEL_NAME)

image = Image.open("sample.jpg").convert("RGB")  # load your image
inputs = processor(image, return_tensors="pt").to(model.device, model.dtype)

# Inference only, so skip gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits.cpu().float()[0]
# Clip the raw outputs to [0, 1] so they can be read as scores
scores = torch.clamp(logits, 0.0, 1.0)

# Keep only tags with a positive score, highest first
results = {
    model.config.id2label[i]: score.item() for i, score in enumerate(scores) if score > 0
}
results = sorted(results.items(), key=lambda x: x[1], reverse=True)

for tag, score in results:
    print(f"{tag}: {score*100:.2f}%")
# 1girl: 100.00%
# outdoors: 100.00%
# sky: 100.00%
# solo: 100.00%
# school uniform: 96.88%
# skirt: 92.97%
# day: 89.06%
# ...
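
The example above prints every tag whose score is greater than zero. When only confident tags are wanted, the scores can be filtered further. The following is a minimal sketch, not part of the model's API: `select_tags`, the 0.5 default threshold, and the `top_k` option are hypothetical, and `example_scores` simply reuses the sample output shown above.

```python
def select_tags(scores, threshold=0.5, top_k=None):
    """Return (tag, score) pairs at or above `threshold`, highest score first.

    `scores` maps tag name -> score in [0, 1]. `threshold` and `top_k`
    are illustrative knobs, not values recommended by the model author.
    """
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    # sort() is stable, so tags with equal scores keep their input order
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if top_k is None else kept[:top_k]


# Scores copied from the sample output above (percentages divided by 100)
example_scores = {
    "1girl": 1.0,
    "outdoors": 1.0,
    "sky": 1.0,
    "solo": 1.0,
    "school uniform": 0.9688,
    "skirt": 0.9297,
    "day": 0.8906,
}

confident = select_tags(example_scores, threshold=0.95)
print(", ".join(tag for tag, _ in confident))
# 1girl, outdoors, sky, solo, school uniform
```

With the real model, `dict(results)` from the example above can be passed in place of `example_scores`.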

Intended uses & limitations

This model is for research use only and is not recommended for production.

For practical use, prefer an established tagger such as the wd-v1-4-tagger series by SmilingWolf.

Training and evaluation data

5,000 high-quality images from danbooru. They were shuffled and split into 4,500 training images and 500 evaluation images.
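
The card does not say how the shuffle and split were implemented. One way to get a reproducible 4500:500 split is sketched below; `shuffle_and_split` is a hypothetical helper, the integer IDs stand in for the real image records, and reusing the training seed 42 for the shuffle is an assumption.

```python
import random


def shuffle_and_split(items, n_train, seed=42):
    """Shuffle a copy of `items` with a fixed seed and split it in two."""
    shuffled = list(items)
    # A dedicated Random instance avoids touching the global RNG state
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


image_ids = range(5000)  # placeholder for the 5,000 image records
train_ids, eval_ids = shuffle_and_split(image_ids, n_train=4500)

print(len(train_ids), len(eval_ids))
# 4500 500
```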

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 32
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 20
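
To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes a linear warmup followed by a single half-cosine decay to zero (the usual shape of a cosine schedule with warmup), one training device, and no gradient accumulation. Under those assumptions 4,500 images at batch size 32 give 141 steps per epoch and 2,820 steps in total, which agrees with the Step column in the training results. `lr_at` is a hypothetical helper, not a library function.

```python
import math

BASE_LR = 1e-4       # learning_rate
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps
TRAIN_SIZE = 4500    # training images
BATCH_SIZE = 32      # train_batch_size
NUM_EPOCHS = 20      # num_epochs

# The last, partial batch of each epoch still counts as one step
steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def lr_at(step):
    """Learning rate after `step` optimizer steps (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, from 0 to 1
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps)
# 141 2820
```

Under this schedule the learning rate peaks at step 100 and is close to zero in the final epochs, which is consistent with the validation loss flattening out near the end of training.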

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
1496.9876	1.0	141	691.3267	0.1242	0.9957
860.0218	2.0	282	433.5286	0.1626	0.9965
775.4277	3.0	423	409.0374	0.1827	0.9966
697.2465	4.0	564	396.5604	0.2025	0.9966
582.6023	5.0	705	388.3294	0.2065	0.9966
617.5087	6.0	846	382.2605	0.2213	0.9966
627.5330	7.0	987	377.6726	0.2269	0.9967
595.4033	8.0	1128	374.3268	0.2327	0.9967
593.3854	9.0	1269	371.4181	0.2409	0.9967
537.9777	10.0	1410	369.5010	0.2421	0.9967
552.3083	11.0	1551	368.0743	0.2468	0.9967
570.5438	12.0	1692	366.8302	0.2498	0.9967
507.5343	13.0	1833	366.1787	0.2499	0.9967
515.5528	14.0	1974	365.5653	0.2525	0.9967
458.5096	15.0	2115	365.1838	0.2528	0.9967
515.6953	16.0	2256	364.9844	0.2535	0.9967
533.7929	17.0	2397	364.8577	0.2538	0.9967
520.3728	18.0	2538	364.8066	0.2537	0.9967
525.1097	19.0	2679	364.7850	0.2539	0.9967
482.0612	20.0	2820	364.7876	0.2539	0.9967

Framework versions

Transformers 4.37.2
PyTorch 2.1.2+cu118
Datasets 2.16.1
Tokenizers 0.15.0