---
license: other
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- argilla/dpo-mix-7k
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: Columbia-NLP/gemma-2b-zephyr-sft
model-index:
- name: gemma-2b-zephyr-dpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 52.22
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 73.11
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 42.55
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 42.64
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.4
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 19.94
      name: accuracy
---

# Model Card for Gemma 2B Zephyr DPO

We trained [google/gemma-2b](https://huggingface.co/google/gemma-2b) with DPO on data from `argilla/dpo-mix-7k`.
We carefully selected the hyperparameters to achieve the best DPO performance.
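
For reference, the sketch below shows what a DPO run of this kind looks like with TRL's `DPOTrainer`. It is illustrative only: the exact alignment-handbook recipe and hyperparameters used for this model are not reproduced here, the keyword arguments vary across `trl` versions, and the preprocessing assumes `argilla/dpo-mix-7k` stores `chosen`/`rejected` as chat message lists.

```python
# Illustrative DPO sketch (assumptions: trl ~0.7/0.8 API; placeholder hyperparameters).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "Columbia-NLP/gemma-2b-zephyr-sft"  # DPO starts from the SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

raw = load_dataset("argilla/dpo-mix-7k", split="train")

def to_pairs(example):
    # Render the shared conversation prefix with the chat template and keep the
    # final assistant turns as the chosen/rejected completions.
    prompt = tokenizer.apply_chat_template(
        example["chosen"][:-1], tokenize=False, add_generation_prompt=True
    )
    return {
        "prompt": prompt,
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

train_dataset = raw.map(to_pairs, remove_columns=raw.column_names)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl builds a frozen copy of the policy as the reference
    args=TrainingArguments(
        output_dir="gemma-2b-zephyr-dpo",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
        bf16=True,
    ),
    beta=0.1,  # placeholder; the card does not state the value actually used
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```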

## Model description

- **Model type:** A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Gemma Terms of Use
- **Finetuned from model:** [google/gemma-2b](https://huggingface.co/google/gemma-2b)
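
A minimal inference sketch with Transformers is shown below. This is our own example rather than an official snippet; it assumes the tokenizer ships a chat template, so adjust the prompt handling and sampling parameters to taste.

```python
# Illustrative usage sketch: load the model and chat via the tokenizer's chat template.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Columbia-NLP/gemma-2b-zephyr-dpo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(out[0]["generated_text"][len(prompt):])
```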

## License

This model has the same license as the [original Gemma model collection](https://ai.google.dev/gemma/terms).

## Open LLM Leaderboard Performance

| Models | Avg. | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k |
|--------|------|-----|-----------|------|------------|------------|-------|
| google/gemma-2b | 46.37 | 48.38 | 71.77 | 41.77 | 33.08 | 66.77 | 16.91 |
| google/gemma-2b-it | 42.75 | 43.94 | 62.70 | 37.65 | 45.82 | 60.93 | 5.46 |
| wandb/gemma-2b-zephyr-sft | 47.18 | 49.74 | 72.38 | 41.37 | 34.42 | **66.93** | 18.27 |
| wandb/gemma-2b-zephyr-dpo | 46.92 | 49.66 | 72.23 | 41.13 | 34.47 | 66.54 | 17.51 |
| Columbia-NLP/gemma-2b-zephyr-sft | 48.75 | 51.80 | 72.63 | 42.20 | 41.96 | 63.85 | **20.09** |
| **Columbia-NLP/gemma-2b-zephyr-dpo** | **49.14** | **52.22** | **73.11** | **42.55** | **42.64** | 64.40 | 19.94 |
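
These are Open LLM Leaderboard-style metrics (acc_norm for ARC and HellaSwag, acc for MMLU, Winogrande, and GSM8k, mc2 for TruthfulQA). A rough local reproduction with EleutherAI's lm-evaluation-harness could look like the sketch below; this assumes the harness's Python API (v0.4+), and since the leaderboard pins its own harness revision and prompts, local scores may differ slightly.

```python
# Rough reproduction sketch for one task (25-shot ARC-Challenge, as in the table above).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Columbia-NLP/gemma-2b-zephyr-dpo,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```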

## MT-Bench

We evaluate our model with `GPT-4-0125-preview` as the judge.

| Model | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
|-------|-------|--------|------------|------------|------|-----------|----------|------|---------|
| google/gemma-2b-it | 4.71 | 2.95 | **4.35** | 6.15 | 2.90 | 3.50 | 5.60 | **5.50** | **6.70** |
| wandb/gemma-2b-zephyr-sft | 4.03 | 3.10 | 3.15 | 5.00 | 2.70 | 2.65 | 5.10 | 4.80 | 5.75 |
| wandb/gemma-2b-zephyr-dpo | 4.06 | 2.80 | 2.90 | 5.55 | 2.65 | 2.70 | 5.20 | 4.80 | 5.85 |
| anakin87_gemma-2b-orpo | 4.14 | 3.00 | 3.70 | 6.30 | 2.70 | 2.35 | 5.68 | 4.75 | 4.75 |
| Columbia-NLP/gemma-2b-zephyr-sft | 4.34 | 3.10 | 3.70 | 6.25 | 2.65 | 2.70 | 5.55 | 5.25 | 5.50 |
| **Columbia-NLP/gemma-2b-zephyr-dpo** | **4.75** | **3.50** | 4.05 | **6.75** | **3.30** | **3.70** | **5.85** | 5.40 | 5.53 |