Update README.md

README.md

---
base_model:
- elinas/Llama-3-15B-Instruct-zeroed
library_name: transformers
tags:
- mergekit
- merge
license: llama3
---

# Llama-3-15B-Instruct-ft-zeroed

This is a QLoRA **finetune** of a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The model is based on a "zeroed" passthrough merge of [Llama-3-15B-Instruct-zeroed](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed).

This was primarily an experiment to see how a passthrough merge would respond to further finetuning, though it was done on a small dataset.

The goal was to make a "mid"-sized model like those Meta has released in the past; the merge method was inspired by [mlabonne's Llama-3-120B](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct).
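
For readers unfamiliar with the technique, a "zeroed" passthrough self-merge is typically expressed as a mergekit YAML config along the lines of the sketch below. The source model name, the layer ranges, and the choice of which projections to zero are illustrative assumptions, not the published recipe for this model.

```yaml
# Minimal sketch of a passthrough self-merge that duplicates a span of layers.
# Zeroing o_proj and down_proj in the duplicated slice makes the copied layers
# start out as a near no-op, which is the "zeroed" idea referenced above.
# The layer ranges and source model here are placeholders, not the actual recipe.
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 24]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [8, 32]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
merge_method: passthrough
dtype: bfloat16
```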

The model was finetuned at a context length of **8192** tokens and is likely reliable with RoPE scaling up to 32k.

Further finetuning this model, or finetuning the [base model](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed) on more samples, is encouraged.

## Datasets

* [Chat-Error/Pure-dove-sharegpt](https://huggingface.co/datasets/Chat-Error/Pure-dove-sharegpt)

A small, high-quality dataset was used as a proof of concept / validation for stabilizing the model after finetuning.
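
As a rough illustration of how this dataset could be wired into a finetuning run, the stanza below uses axolotl-style syntax; the framework choice is an assumption, since the card does not say which training stack was used.

```yaml
# Hypothetical axolotl-style dataset entry; the training framework is an assumption.
datasets:
  - path: Chat-Error/Pure-dove-sharegpt
    type: sharegpt   # the dataset is distributed in ShareGPT conversation format
```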

## Finetuning details
This is a QLoRA model and all modules were targeted.
```yaml
lora_target_modules:
# ...
- o_proj
```
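
For context on what targeting all modules usually means in practice, the sketch below shows hedged, axolotl-style QLoRA settings for a Llama-architecture model. The rank, alpha, dropout, and exact module list are assumptions for illustration; only the base model, the 8192 sequence length, and the 1e-05 learning rate come from this card.

```yaml
# Hypothetical QLoRA adapter settings (axolotl-style keys); values marked as
# assumptions are not taken from this model card.
base_model: elinas/Llama-3-15B-Instruct-zeroed
load_in_4bit: true     # QLoRA keeps the base weights quantized to 4-bit
adapter: qlora
sequence_len: 8192     # matches the finetuning context length stated above
lora_r: 32             # assumption
lora_alpha: 16         # assumption
lora_dropout: 0.05     # assumption
lora_target_modules:   # a typical "all linear modules" set for Llama models
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
learning_rate: 1e-05   # from the hyperparameters listed below
```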

The model is coherent even with training the "zeroed" layers and can write well. In the next experiment, all layers will be finetuned, as this was the recommendation from Charles Goddard - thank you for the method of merging!

The following hyperparameters were used during training:
```yaml
- learning_rate: 1e-05