metadata
license: apache-2.0
library_name: transformers
tags:
  - mergekit
  - merge
  - qwen2
  - qwen2.5
  - dpo
base_model:
  - v000000/Qwen2.5-14B-Gutenberg-1e-Delta
  - Qwen/Qwen2.5-14B-Instruct
datasets:
  - jondurbin/gutenberg-dpo-v0.1
model-index:
  - name: Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 48.55
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 49.74
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 19.71
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 15.21
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 18.43
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 48.68
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno
          name: Open LLM Leaderboard

Qwen2.5-14B-Gutenberg-Instruct-Slerpeno

GGUF from mradermacher!

GGUF from QuantFactory!

merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the SLERP merge method with a sophosympatheia-style gradient.

Models Merged

The following models were included in the merge:

- v000000/Qwen2.5-14B-Gutenberg-1e-Delta (base model)
- Qwen/Qwen2.5-14B-Instruct

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: Qwen/Qwen2.5-14B-Instruct
merge_method: slerp
base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta
parameters:
  t:
    - value: [0, 0, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.3, 0, 0]
dtype: bfloat16

The idea here is that the Gutenberg DPO model is kept at 100% in the input and output layers (t = 0 at both ends of the gradient), while the middle layers blend smoothly with Qwen2.5-14B-Instruct, peaking at t = 0.6, to heal loss and increase intelligence.
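To make the gradient easier to picture, here is a minimal sketch of how the `t` list in the configuration could map onto individual layers and how a SLERP blend behaves at each value. It is an illustration, not mergekit's actual code: the helper names `layer_t` and `slerp`, the even spacing of the gradient anchors, and the layer count of 48 are all assumptions made for this example.

```python
import numpy as np

# Gradient copied from the configuration above.
T_GRADIENT = [0, 0, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.3, 0, 0]


def layer_t(layer_idx, num_layers, gradient=T_GRADIENT):
    """Linearly interpolate the gradient at a layer's relative depth.

    Assumption: the gradient anchors are spread evenly from the first
    layer (depth 0.0) to the last layer (depth 1.0).
    """
    depth = layer_idx / max(num_layers - 1, 1)
    anchors = np.linspace(0.0, 1.0, len(gradient))
    return float(np.interp(depth, anchors, gradient))


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation: t=0 returns v0, t=1 returns v1."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Measure the angle between the two tensors using normalized copies.
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    dot = float(np.clip(np.sum(u0 * u1), -1.0, 1.0))
    # Nearly parallel tensors: plain linear interpolation is numerically safer.
    if abs(dot) > 0.9995:
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)


if __name__ == "__main__":
    num_layers = 48  # assumed layer count, for illustration only
    for idx in (0, 12, 24, 36, 47):
        print(idx, round(layer_t(idx, num_layers), 3))
```

With `v0` standing in for a Gutenberg-1e-Delta tensor (the `base_model`) and `v1` for the matching Qwen2.5-14B-Instruct tensor, `t = 0` at the first and last layers leaves the Gutenberg weights untouched, and the Instruct contribution grows toward the middle of the stack.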

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 33.39 |
| IFEval (0-Shot)     | 48.55 |
| BBH (3-Shot)        | 49.74 |
| MATH Lvl 5 (4-Shot) | 19.71 |
| GPQA (0-shot)       | 15.21 |
| MuSR (0-shot)       | 18.43 |
| MMLU-PRO (5-shot)   | 48.68 |
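
As a quick sanity check, the reported average appears to be the unweighted mean of the six benchmark scores. The short sketch below recomputes it; the `scores` dictionary is just the values listed above, and treating the average as a plain mean is an assumption about how the leaderboard aggregates.

```python
# Benchmark scores copied from the results listed above.
scores = {
    "IFEval (0-Shot)": 48.55,
    "BBH (3-Shot)": 49.74,
    "MATH Lvl 5 (4-Shot)": 19.71,
    "GPQA (0-shot)": 15.21,
    "MuSR (0-shot)": 18.43,
    "MMLU-PRO (5-shot)": 48.68,
}

# Assumption: the leaderboard average is an unweighted mean.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```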