Create README.md
README.md
ADDED
@@ -0,0 +1,70 @@
---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
language:
- en
pipeline_tag: conversational
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/PWn9I-0XH7kSP_YXcyxIg.png" width="400"/>
</p>

---

# SG Raccoon 55B

The first 55B auto-regressive causal LM, created by merging two fine-tuned, llamafied [Yi 34B](https://huggingface.co/01-ai/Yi-34B) models with *200K context* into one.

# Prompting Format

```
SYSTEM: <ANY SYSTEM CONTEXT>
USER:
ASSISTANT:
```

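
As a small illustration of the format above, the helper below assembles a prompt string in Python. The function name `build_prompt`, the single space after each role label, and the newline separators are assumptions made for this sketch; the format block only fixes the three role labels and their order.

```python
def build_prompt(system_context: str, user_message: str) -> str:
    """Assemble a prompt in the SYSTEM / USER / ASSISTANT layout."""
    # Assumption: one space after each label, one newline between turns.
    lines = [
        f"SYSTEM: {system_context}",
        f"USER: {user_message}",
        # The assistant turn is left open so the model writes the reply.
        "ASSISTANT:",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "Name three raccoon facts."))
```

The generated reply is whatever the model emits after the trailing `ASSISTANT:` label.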
# Merge process

The models used in the merge are [Tess-M-v1.3](https://huggingface.co/migtissera/Tess-M-v1.3/) and [airoboros-3_1-yi-34b-200k](https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k).

The layer ranges used are as follows:

```yaml
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [0, 14]
- model: migtissera/Tess-M-v1.3
  layer_range: [7, 21]
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [15, 29]
- model: migtissera/Tess-M-v1.3
  layer_range: [22, 36]
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [30, 44]
- model: migtissera/Tess-M-v1.3
  layer_range: [37, 51]
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [45, 59]
```

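
To see how the slices interleave, this sketch tallies the ranges listed above. The README does not state whether the end index of `layer_range` is inclusive, so the helper (a hypothetical name, `count_layers`) reports both readings rather than picking one.

```python
from collections import Counter

# Slices copied from the YAML above: (source model, (start, end)).
SLICES = [
    ("bhenrym14/airoboros-3_1-yi-34b-200k", (0, 14)),
    ("migtissera/Tess-M-v1.3", (7, 21)),
    ("bhenrym14/airoboros-3_1-yi-34b-200k", (15, 29)),
    ("migtissera/Tess-M-v1.3", (22, 36)),
    ("bhenrym14/airoboros-3_1-yi-34b-200k", (30, 44)),
    ("migtissera/Tess-M-v1.3", (37, 51)),
    ("bhenrym14/airoboros-3_1-yi-34b-200k", (45, 59)),
]


def count_layers(slices, end_inclusive: bool) -> int:
    """Total layers in the stacked model under one reading of the end index."""
    extra = 1 if end_inclusive else 0
    return sum(end - start + extra for _, (start, end) in slices)


def slices_per_model(slices) -> Counter:
    """How many slices each source model contributes."""
    return Counter(model for model, _ in slices)


if __name__ == "__main__":
    print("end-exclusive total:", count_layers(SLICES, end_inclusive=False))
    print("end-inclusive total:", count_layers(SLICES, end_inclusive=True))
    for model, n in slices_per_model(SLICES).items():
        print(f"{model}: {n} slices")
```

The merged stack alternates between the two source models, with four slices taken from airoboros and three from Tess.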
# Tips

Because this is a Yi-based model, try disabling the BOS token and/or running a lower temperature with MinP (and no other samplers) if the output doesn't seem right. Yi tends to run "hot" by default.

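
The sampler advice above can be sketched in plain Python: scale the logits by a temperature below 1, then keep only tokens whose probability is at least `min_p` times that of the most likely token. The values `0.7` and `0.05` are illustrative assumptions, not settings recommended by this card, and applying temperature before the MinP cutoff is one possible ordering; inference backends differ on this.

```python
import math
import random


def min_p_filter(logits, temperature=0.7, min_p=0.05):
    """Return renormalised probabilities after temperature scaling and a MinP cutoff."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # MinP: drop tokens below min_p times the top token's probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_token(logits, temperature=0.7, min_p=0.05, rng=random):
    """Draw one token index from the filtered distribution."""
    probs = min_p_filter(logits, temperature, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, -5.0]  # made-up values for illustration
    print(min_p_filter(toy_logits))
    print(sample_token(toy_logits))
```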

Sometimes the model "spells out" the stop token as `</s>`, as Capybara does, so you may need to add `</s>` as an additional stopping condition.

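
If the backend in use has no stop-string option, the generated text can be trimmed after the fact. A minimal sketch, assuming the raw completion is available as a Python string; `trim_at_stop` is a hypothetical helper name:

```python
def trim_at_stop(text: str, stop_strings=("</s>",)) -> str:
    """Cut the text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    raw = "Raccoons are nocturnal.</s> USER: and then?"
    print(trim_at_stop(raw))
```

Further stop strings, such as the `USER:` label, can be passed in the same tuple if the model keeps writing the next turn.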
# Benchmarks
Coming soon.

# Acknowledgements
- Special thanks to [MSS](https://milanosamplesale.com/) for sponsoring this project.

- [@chargoddard](https://huggingface.co/chargoddard) for developing [mergekit](https://github.com/cg123/mergekit), the framework used to merge the model.

- Many thanks to [@Undi95](https://huggingface.co/Undi95) for helping figure out the model merge options.

- Credit also goes to the [01-ai](https://huggingface.co/01-ai) team for their amazing models.

- This merged model is inspired by [Goliath 120B](https://huggingface.co/alpindale/goliath-120b).