Just an experiment in extending the context length of SUS, a 4K-context Yi model, and DPO Bagel, which breaks down beyond 4K context. Yi 4K was used as the base (even for Bagel, which is technically a Yi 200K model), and Yi 200K was merged in with a density of 1.
I wanted to include Hermes 34B, but something funky about its tokenizer breaks mergekit.
This model is a component of another merge. The auto-generated mergekit description follows:
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the DARE TIES merge method using /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama as a base.
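For readers unfamiliar with the `density` parameter, here is a minimal sketch of the DARE drop-and-rescale step on a toy list of weights. It is an illustration only, not mergekit's implementation: the function name `dare_sparsify` and the toy values are hypothetical, and the TIES sign-election step is left out entirely.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density.

    Hypothetical helper for illustration; not part of mergekit.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    if density == 1.0:
        # Nothing is dropped, so the delta passes through unchanged.
        return list(delta)
    return [d / density if rng.random() < density else 0.0 for d in delta]


rng = random.Random(0)

# Toy "base" and "fine-tuned" weights (made-up numbers).
base = [0.10, -0.20, 0.30, 0.05]
finetuned = [0.12, -0.25, 0.30, 0.09]

# The delta (task vector) is what DARE sparsifies.
delta = [f - b for f, b in zip(finetuned, base)]

full = dare_sparsify(delta, 1.0, rng)     # density 1: every delta entry is kept
sparse = dare_sparsify(delta, 0.12, rng)  # density 0.12: most entries are zeroed

print("delta :", delta)
print("full  :", full)
print("sparse:", sparse)
```

With a density of 1 the delta is carried over in full, while a low density such as 0.12 keeps only a small random subset of entries and scales them up to compensate.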
Models Merged
The following models were included in the merge:
- /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
- /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
- /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
- /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2
Configuration
The following YAML configuration was used to produce this model:
models:
  - model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    parameters:
      weight: 0.5
      density: 1
  - model: /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
    parameters:
      weight: 0.2
      density: 0.12
  - model: /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2
    parameters:
      weight: 0.2
      density: 0.15
  - model: /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
    parameters:
      weight: 0.1
      density: 0.12
merge_method: dare_ties
tokenizer_source: union
base_model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
parameters:
  int8_mask: true
dtype: bfloat16
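As a rough reading aid for the configuration above, the sketch below tabulates each merged model's weight, the approximate fraction of its delta parameters that survive the random drop, and the 1/density rescale factor. The weights and densities are copied from the configuration; the shortened model names are labels only, and the rescale factor assumes DARE scales the kept deltas by 1/density.

```python
import math

# (weight, density) pairs copied from the configuration; names shortened for display.
merged = {
    "Yi-34B-200K-Llama": (0.5, 1.0),
    "SUS-Chat-34B": (0.2, 0.12),
    "bagel-dpo-34b-v0.2": (0.2, 0.15),
    "bagel-34b-v0.2": (0.1, 0.12),
}

total_weight = sum(weight for weight, _ in merged.values())

# Assumption: kept deltas are multiplied by 1/density after the random drop.
rescale = {name: 1.0 / density for name, (_, density) in merged.items()}

for name, (weight, density) in merged.items():
    print(
        f"{name}: weight={weight}, keeps ~{density:.0%} of deltas, "
        f"rescale x{rescale[name]:.2f}"
    )

print("weights sum to 1:", math.isclose(total_weight, 1.0))
```

Read this way, Yi 200K contributes its full delta at half the total weight, while the SUS and Bagel models each contribute a sparse, rescaled subset of theirs.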