elinas committed
Commit bdbfdc7
1 Parent(s): 666d08a

Update README.md

Files changed (1):
  1. README.md +17 -4
README.md CHANGED
@@ -1,22 +1,32 @@
 ---
 base_model:
-- elinas/Llama-3-15B-Instruct
+- elinas/Llama-3-15B-Instruct-zeroed
 library_name: transformers
 tags:
 - mergekit
 - merge
 license: llama3
 ---
-# Llama-3-15B-Instruct-ft
+# Llama-3-15B-Instruct-ft-zeroed
 
-This is a QLoRA **finetune** of a merge of pre-trained language models created using...
+This is a QLoRA **finetune** of a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
-TODO
+The model is based on a "zeroed" passthrough merge of [Llama-3-15B-Instruct](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed).
+
+This was primarily an experiment to see how a passthrough merge responds to further finetuning, though it was done on a small dataset.
+
+The goal was to make a "mid"-sized model like Meta has released in the past, and the merge method was inspired by [mlabonne's Llama-3-120B](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct).
+
+The model was finetuned at **8192 context length** and is likely reliable using RoPE scaling up to 32k.
+
+Further finetuning this model, or finetuning the [base model](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed) on more samples, is encouraged.
 
 ## Datasets
 
 * [Chat-Error/Pure-dove-sharegpt](https://huggingface.co/datasets/Chat-Error/Pure-dove-sharegpt)
 
+A small, high-quality dataset was used as a PoC / validation for stabilizing the model after finetuning.
+
 ## Finetuning details
 This is a QLoRA model and all modules were targeted.
 ```yaml
@@ -25,6 +35,9 @@ lora_target_modules:
 - o_proj
 ```
 
+The model is coherent even after training the "zeroed" layers and can write well. In the next experiment, all layers will be finetuned, as this was
+the recommendation from [Charles Goddard] - thank you for the method of merging!
+
 ```yaml
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
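
For context on the merge method referenced in the README, below is a minimal sketch of what a "zeroed" passthrough merge looks like in mergekit's config format. This is an illustration only, not the config used for Llama-3-15B-Instruct-zeroed: the source model, slice layout, and layer ranges are placeholders, and the actual config is the one published in the base model repository.

```yaml
# Illustrative sketch of a "zeroed" passthrough self-merge - NOT the actual
# config for Llama-3-15B-Instruct-zeroed. Layer ranges and slice layout are
# placeholders; the real config lives in the base model repo.
merge_method: passthrough
dtype: bfloat16
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 24]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [8, 32]
        parameters:
          scale:
            # Zero out o_proj / down_proj in the duplicated slice so the
            # repeated layers initially contribute nothing - the "zeroed" trick.
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
```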
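The diff truncates the `lora_target_modules` list after `o_proj`. As a rough sketch of what "all modules were targeted" typically expands to for a Llama-architecture model, an axolotl-style QLoRA section is shown below; the tool choice and the r/alpha/dropout values are assumptions for illustration, not taken from this commit. Only `sequence_len: 8192` reflects a value stated in the README.

```yaml
# Sketch of an axolotl-style QLoRA adapter section; numeric values are
# placeholders. The module list is the usual "all modules" set for Llama models.
adapter: qlora
load_in_4bit: true
sequence_len: 8192        # matches the 8192 training context mentioned in the README
lora_r: 32                # placeholder
lora_alpha: 16            # placeholder
lora_dropout: 0.05        # placeholder
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```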