Model Card for Fine-Tuned gemma-2-2b-it on Custom Korean Sentiment Dataset
Model Summary
This model is a fine-tuned version of google/gemma-2-2b-it, trained to classify sentiment in Korean text into four categories: 무감정 (neutral), 슬픔 (sadness), 기쁨 (joy), and 분노 (anger). The model uses LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning and 4-bit NF4 quantization via BitsAndBytes for memory efficiency. A custom weighted loss function was applied to handle class imbalance within the dataset.
The model is suitable for multi-class sentiment classification in Korean and is optimized for environments with limited computational resources due to the quantization.
Model Details
Developed By:
This model was fine-tuned by [Your Name or Organization] using Hugging Face's peft and transformers libraries with a custom Korean sentiment dataset.
Model Type:
This is a transformer-based model for multi-class sentiment classification in the Korean language.
Language:
- Language(s): Korean
License:
[Add relevant license here]
Finetuned From:
- Base Model: google/gemma-2-2b-it
Framework Versions:
- Transformers: 4.44.2
- PEFT: 0.12.0
- Datasets: 3.0.1
- PyTorch: 2.4.1+cu121
Intended Uses & Limitations
Intended Use:
This model is suitable for applications requiring multi-class sentiment classification in Korean, such as chatbots, social media monitoring, or customer feedback analysis.
Out-of-Scope Use:
The model is not intended for multilingual input, for sentiment schemes with classes other than the four listed above, or for text outside the domain of the Korean-language training data.
Limitations:
- Bias: As the model is trained on a custom dataset, it may reflect specific biases inherent in that data.
- Generalization: Performance may degrade on data outside the scope of the training set, such as other domains or different sentiment taxonomies.
Model Architecture
Quantization:
The model uses 4-bit quantization via BitsAndBytes for efficient memory usage, which enables it to run on lower-resource hardware.
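For reference, a 4-bit NF4 setup of this kind is usually configured with a BitsAndBytesConfig roughly like the sketch below; the compute dtype and double-quantization flag are assumptions, since the exact values are not recorded in this card:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

# Assumed 4-bit NF4 configuration; compute dtype and double quantization
# are illustrative, not documented for this model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Base model with a 4-way classification head on top
model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-2-2b-it",
    num_labels=4,
    quantization_config=bnb_config,
    device_map="auto",
)
```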
LoRA Configuration:
LoRA (Low-Rank Adaptation) was applied to specific transformer layers, allowing for parameter-efficient fine-tuning. The target modules are down_proj, gate_proj, q_proj, o_proj, up_proj, v_proj, and k_proj, and the LoRA parameters are r = 16, lora_alpha = 32, and lora_dropout = 0.05.
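The corresponding peft configuration would look roughly like the following sketch; the task_type and the prepare_model_for_kbit_training step are assumptions based on the setup described above, not the exact training code:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit base model for training (assumed step for k-bit fine-tuning).
model = prepare_model_for_kbit_training(model)

# LoRA settings as listed above; target modules cover attention and MLP projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="SEQ_CLS",  # sequence classification (assumed)
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```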
Custom Weighted Loss:
A custom weighted loss function was implemented to handle class imbalance, using the following weights:
\[ \text{weights} = [0.2032, 0.2704, 0.2529, 0.2735] \]
These weights correspond to the classes: 무감정, 슬픔, 기쁨, 분노, respectively.
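The exact training code is not included here; the sketch below shows how such a weighted loss is commonly wired into a custom transformers Trainer. The class name WeightedLossTrainer and the compute_loss override are assumptions, not the author's code:

```python
import torch
import torch.nn as nn
from transformers import Trainer

# Class weights for 무감정, 슬픔, 기쁨, 분노 (in that order), as listed above.
CLASS_WEIGHTS = torch.tensor([0.2032, 0.2704, 0.2529, 0.2735])

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Cross-entropy with per-class weights to counter class imbalance.
        loss_fct = nn.CrossEntropyLoss(weight=CLASS_WEIGHTS.to(logits.device))
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```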
Training Details
Dataset:
The model was trained on a custom Korean sentiment analysis dataset. This dataset consists of text samples labeled with one of four sentiment classes: 무감정, 슬픔, 기쁨, and 분노.
- Train Set Size: not specified (custom dataset)
- Test Set Size: not specified (custom dataset)
- Classes: 4 (무감정, 슬픔, 기쁨, 분노)
Preprocessing:
Data was tokenized using the google/gemma-2-2b-it tokenizer with a maximum sequence length of 128. The preprocessing steps included padding and truncation to ensure consistent input lengths.
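A rough sketch of this preprocessing, assuming the dataset exposes a "text" column and is handled with the datasets library:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")

def preprocess(batch):
    # Pad/truncate to a fixed length of 128 tokens, as described above.
    return tokenizer(
        batch["text"],          # assumed text column name
        padding="max_length",
        truncation=True,
        max_length=128,
    )

# With a Hugging Face `datasets` Dataset, e.g.:
# tokenized = dataset.map(preprocess, batched=True)
```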
Hyperparameters:
- Learning Rate: 2e-4
- Batch Size (train): 8
- Batch Size (eval): 8
- Epochs: 4
- Optimizer: AdamW (with 8-bit optimization)
- Weight Decay: 0.01
- Gradient Accumulation Steps: 2
- Evaluation Steps: 500
- Logging Steps: 500
- Metric for Best Model: F1 (weighted)
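Expressed as transformers TrainingArguments, these settings would correspond roughly to the sketch below; the output directory, save strategy, and optimizer name string are assumptions filled in for completeness:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-2b-it-korean-sentiment",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=0.01,
    gradient_accumulation_steps=2,
    eval_strategy="steps",
    eval_steps=500,
    logging_steps=500,
    save_strategy="steps",          # assumed, to allow best-model selection
    save_steps=500,
    optim="adamw_bnb_8bit",         # 8-bit AdamW via bitsandbytes
    metric_for_best_model="f1",
    load_best_model_at_end=True,
)
```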
Evaluation
Metrics:
The model was evaluated using the following metrics:
- Accuracy
- F1 Score (weighted)
- Precision (weighted)
- Recall (weighted)
The evaluation provides a detailed view of the model's performance across multiple metrics, which helps in understanding its strengths and areas for improvement.
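These metrics are typically produced by a compute_metrics callback passed to the Trainer; the following sketch, using scikit-learn, is an assumed implementation rather than the exact code used:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging, matching the metrics reported above.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```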
Code Example:
You can load the fine-tuned model and use it for inference on your own data as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("your-model-directory")
tokenizer = AutoTokenizer.from_pretrained("your-model-directory")
model.eval()

# Tokenize input text
text = "이 영화는 정말 슬퍼요."  # "This movie is really sad."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Get predictions (no gradients needed for inference)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(-1).item()

# Map prediction to label
id2label = {0: "무감정", 1: "슬픔", 2: "기쁨", 3: "분노"}
print(f"Predicted sentiment: {id2label[predicted_class]}")
```