metadata

license: other
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - argilla/dpo-mix-7k
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: Columbia-NLP/gemma-2b-zephyr-sft
model-index:
  - name: gemma-2b-zephyr-dpo
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 52.22
            name: normalized accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 73.11
            name: normalized accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 42.55
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 42.64
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.4
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 19.94
            name: accuracy

Model Card for Gemma 2B Zephyr DPO

We trained the google/gemma-2b with DPO and data from argilla/dpo-mix-7k. We carefully selected the hyper-parameters to achieve the best DPO performance.

Model description

Model type: A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
Language(s) (NLP): Primarily English
License: Gemma Terms of Use
Finetuned from model: google/gemma-2b

License

This model has the same license as the original Gemma model collection

OpenLLM Leaderboard Performance

Models	Avg.	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8k
google/gemma-2b	46.37	48.38	71.77	41.77	33.08	66.77	16.91
google/gemma-2b-it	42.75	43.94	62.70	37.65	45.82	60.93	5.46
wandb/gemma-2b-zephyr-sft	47.18	49.74	72.38	41.37	34.42	66.93	18.27
wandb/gemma-2b-zephyr-dpo	46.92	49.66	72.23	41.13	34.47	66.54	17.51
Columbia-NLP/gemma-2b-zephyr-sft	48.75	51.80	72.63	42.20	41.96	63.85	20.09
Columbia-NLP/gemma-2b-zephyr-dpo	49.14	52.22	73.11	42.55	42.64	64.40	19.94

MT-Bench

We evaluate our model with GPT-4-0125-preview as the judge.

Model	Total	Coding	Extraction	Humanities	Math	Reasoning	Roleplay	STEM	Writing
google/gemma-2b-it	4.71	2.95	4.35	6.15	2.90	3.50	5.60	5.50	6.70
wandb/gemma-2b-zephyr-sft	4.03	3.10	3.15	5.00	2.70	2.65	5.10	4.80	5.75
wandb/gemma-2b-zephyr-dpo	4.06	2.80	2.90	5.55	2.65	2.70	5.20	4.80	5.85
anakin87_gemma-2b-orpo	4.14	3.00	3.70	6.30	2.70	2.35	5.68	4.75	4.75
Columbia-NLP/gemma-2b-zephyr-sft	4.34	3.10	3.70	6.25	2.65	2.70	5.55	5.25	5.50
Columbia-NLP/gemma-2b-zephyr-dpo	4.75	3.50	4.05	6.75	3.30	3.70	5.85	5.40	5.53