Vision Transformer (ViT) for Music Genre Classification
Model Overview
Model Name: ghermoso/vit-eGTZANplus
Task: Image Classification
Dataset: egtzan_plus
Model Architecture: Vision Transformer (ViT)
Finetuned from model: This model is a fine-tuned version of google/vit-base-patch16-224-in21k on the egtzan_plus dataset.
This model achieves the following results on the evaluation set:
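The base checkpoint's name encodes its input geometry: 224x224 images cut into non-overlapping 16x16 patches. The sketch below computes the resulting token-sequence length. It assumes the standard ViT layout, in which each patch becomes one token and a single [CLS] token is prepended. The helper `vit_sequence_length` is hypothetical and is not part of any library.

```python
# Sketch: sequence length implied by the checkpoint name "vit-base-patch16-224".
# Assumption: standard ViT layout (one token per patch plus one [CLS] token).
IMAGE_SIZE = 224  # input resolution in pixels per side, taken from the checkpoint name
PATCH_SIZE = 16   # patch side length in pixels, taken from the checkpoint name


def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Hypothetical helper: number of tokens the ViT encoder processes."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return patches_per_side ** 2 + 1  # 14 * 14 patches, plus the prepended [CLS] token


num_tokens = vit_sequence_length(IMAGE_SIZE, PATCH_SIZE)
print(num_tokens)
```

Under these assumptions, each input image is processed as a sequence of 197 tokens. Classification is then made from the [CLS] token's final representation.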
- Loss: 0.8358
- Accuracy: 0.7460
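One way to read the loss alongside the accuracy is sketched below. It assumes that the reported loss is the mean cross-entropy (negative log-likelihood) of the true genre label, which is the usual objective for image-classification fine-tuning. The card does not state this explicitly. Under that assumption, exp(-loss) is the geometric mean of the probability the model assigns to the correct class.

```python
import math

# Evaluation metrics as reported on this model card.
EVAL_LOSS = 0.8358
EVAL_ACCURACY = 0.7460

# Assumption: EVAL_LOSS is the mean cross-entropy of the true class.
# exp(-mean NLL) = geometric mean of p(true class) over the evaluation set.
geo_mean_true_prob = math.exp(-EVAL_LOSS)

print(f"accuracy: {EVAL_ACCURACY:.1%}")
print(f"geometric-mean probability on the true class: {geo_mean_true_prob:.3f}")
```

If that assumption holds, the model places roughly 0.43 probability on the correct genre in the geometric-mean sense, even though its top prediction is correct about 75% of the time. This suggests its predictions are often correct but not highly confident.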