Update README.md

README.md

---
base_model:
- elinas/Llama-3-15B-Instruct-zeroed
library_name: transformers
tags:
- mergekit
- merge
license: llama3
---

# Llama-3-15B-Instruct-ft-zeroed

This is a QLoRA **finetune** of a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The model is based on a "zeroed" passthrough merge of [Llama-3-15B-Instruct-zeroed](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed).

This was primarily an experiment to see how a passthrough merge would respond to further finetuning, though it was done on a small dataset.

The goal was to make a "mid"-sized model like those Meta has released in the past; the merge method was inspired by [mlabonne's Llama-3-120B](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct).
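
For readers unfamiliar with the technique, a "zeroed" passthrough self-merge is typically expressed as a mergekit YAML config along the lines of the sketch below. The source model name, the layer ranges, and the choice of which projections to zero are illustrative assumptions, not the published recipe for this model.

```yaml
# Minimal sketch of a passthrough self-merge that duplicates a span of layers.
# Zeroing o_proj and down_proj in the duplicated slice makes the copied layers
# start out as a near no-op, which is the "zeroed" idea referenced above.
# The layer ranges and source model here are placeholders, not the actual recipe.
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 24]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [8, 32]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
merge_method: passthrough
dtype: bfloat16
```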

The model was finetuned at a context length of **8192** tokens and is likely reliable with RoPE scaling up to 32k.

Further finetuning this model, or finetuning the [base model](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed) on more samples, is encouraged.

## Datasets

* [Chat-Error/Pure-dove-sharegpt](https://huggingface.co/datasets/Chat-Error/Pure-dove-sharegpt)

A small, high-quality dataset was used as a proof of concept / validation for stabilizing the model after finetuning.
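
As a rough illustration of how this dataset could be wired into a finetuning run, the stanza below uses axolotl-style syntax; the framework choice is an assumption, since the card does not say which training stack was used.

```yaml
# Hypothetical axolotl-style dataset entry; the training framework is an assumption.
datasets:
  - path: Chat-Error/Pure-dove-sharegpt
    type: sharegpt   # the dataset is distributed in ShareGPT conversation format
```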

## Finetuning details
This is a QLoRA model and all modules were targeted.
```yaml
lora_target_modules:
# ...
- o_proj
```
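
For context on what targeting all modules usually means in practice, the sketch below shows hedged, axolotl-style QLoRA settings for a Llama-architecture model. The rank, alpha, dropout, and exact module list are assumptions for illustration; only the base model, the 8192 sequence length, and the 1e-05 learning rate come from this card.

```yaml
# Hypothetical QLoRA adapter settings (axolotl-style keys); values marked as
# assumptions are not taken from this model card.
base_model: elinas/Llama-3-15B-Instruct-zeroed
load_in_4bit: true     # QLoRA keeps the base weights quantized to 4-bit
adapter: qlora
sequence_len: 8192     # matches the finetuning context length stated above
lora_r: 32             # assumption
lora_alpha: 16         # assumption
lora_dropout: 0.05     # assumption
lora_target_modules:   # a typical "all linear modules" set for Llama models
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
learning_rate: 1e-05   # from the hyperparameters listed below
```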

The model is coherent even with training the "zeroed" layers and can write well. In the next experiment, all layers will be finetuned, as this was the recommendation from Charles Goddard - thank you for the method of merging!

The following hyperparameters were used during training:
```yaml
- learning_rate: 1e-05