mpasila committed
Commit 16a915a
Parent: 73844d1

Added weights

README.md ADDED
@@ -0,0 +1,89 @@
+ ---
+ license: other
+ license_name: freeuse
+ license_link: LICENSE
+ tags:
+ - not-for-all-audiences
+ ---
+ This is an EXL2 quantization of [TheDrummer/Moistral-11B-v1](https://huggingface.co/TheDrummer/Moistral-11B-v1) at 4.0 bits per weight (6-bit output head), made with exllamav2's default calibration dataset.
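+
+ To run the quant you need [exllamav2](https://github.com/turboderp/exllamav2) (this repo was quantized with version 0.0.16, per `config.json`). Below is a minimal loading sketch against its Python API; the local directory path is a placeholder for wherever you downloaded the weights:
+
+ ```python
+ from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
+ from exllamav2.generator import ExLlamaV2BaseGenerator
+
+ # Placeholder path - point this at the downloaded repo
+ model_dir = "./Moistral-11B-v1-exl2-4bpw"
+
+ config = ExLlamaV2Config()
+ config.model_dir = model_dir
+ config.prepare()
+
+ model = ExLlamaV2(config)
+ cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache as layers load
+ model.load_autosplit(cache)               # split across available GPUs
+
+ tokenizer = ExLlamaV2Tokenizer(config)
+ generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
+ ```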
+
+ # Original Model card:
+
+ # Moistral 11B v1 💦
+
+ *An eRP model with a rich vocabulary of moist. Finetuned by yours truly.*
+
+ GGUF: https://huggingface.co/TheDrummer/Moistral-11B-v1-GGUF/tree/main
+
+ ![image/gif](https://cdn-uploads.huggingface.co/production/uploads/65f2fd1c25b848bd061b5c2e/JLzCB9m6U-8rICy7ZRQPx.gif)
+
+ ## Description
+
+ Moistral 11B is an *aggressive* **finetune** of the legendary [Fimbulvetr v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2) model.
+
+ If you haven't heard of that model, then your poor-GPU ass is missing out!
+ It is an amazing substitute for those heavyweight models, and it won't disappoint.
+ *(That's coming from someone who daily drives a Venus / Goliath 120B)*
+
+ ...and if you're looking for something *more moist*, then you've come to the right place.
+
+ You can expect the same sort of performance here! It's the Honda Civic of eRP models!
+
+ ### Pimp my ride
+ Turn any old story into a **Moist**ral masterpiece.
+ In fact, we recommend that you leave out any hints of *moist* in your story to _let Moistral cook_.
+
+ ### Drive it like it's hot
+ When in Instruct Mode, Moistral will treat you like the director of your own fantasy ride.
+ Be a controlling navigator or a passenger princess, it won't matter - you're in for a good ride!
+
+ <details>
+ <summary>Moistral Sample</summary>
+ <b>ASSISTANT:</b><br/>
+ When the doors to the throne room finally opened, she saw him there - Dik, the sorcerer prince sitting on his throne. He looked majestic, just as Princess Lilia expected. In a robe embroidered with silver yarn and covered with magical runes. His courtiers stood on either side of him, all looking at the princess with interest. Princess Lilia felt her determination leaving her, but pulled herself together and once again reminded herself that she should not be afraid. She approached the throne intending to show the dignity of all Iliac by personal example.<br/>
+ <b>USER:</b><br/>
+ She bows slightly<br/>
+ <b>ASSISTANT:</b><br/>
+ Princess Lilia bowed slightly in front of Dik, the sorcerer prince. Her eyes fell to the floor out of respect, but soon rose again. She would show herself to be a dignified woman!<br/>
+ "I greet you, your highness," Princess Lilia said. "I am here on behalf of my father, King Edmund Ruhr, to serve as a hostage and envoy."<br/>
+ Dik nodded, but didn't say anything.<br/>
+ </details>
+
+ ### It's got the mileage
+ Moistral has been trained on many long-form texts, a nice chunk of which are 8K tokens in length.
+ It is capable of going far and long without passing the turn back to you. This is not your typical chibi RP model.
+
+ ### Parameters
+ If Moistral starts to underperform and spit out broken tokens, I've noticed that lowering the sampler parameters makes it coherent again. Here's what worked for me:
+ ```yaml
+ temperature: 0.66
+ repetition_penalty: 1.1
+ top_p: 0.64
+ rp_slp: 1
+ ```
+ I encourage you to play around with the parameters yourself to see what works for you.
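+
+ For reference, here is one way those values could map onto exllamav2's sampler settings (a sketch, not the author's own setup; `rp_slp` is presumably a repetition-penalty slope, which has no direct equivalent in this API, so it is omitted):
+
+ ```python
+ from exllamav2.generator import ExLlamaV2Sampler
+
+ settings = ExLlamaV2Sampler.Settings()
+ settings.temperature = 0.66
+ settings.top_p = 0.64
+ settings.token_repetition_penalty = 1.1
+
+ # Assumes the `generator` built in the loading sketch near the top of this card
+ output = generator.generate_simple("The tavern door creaked open", settings, num_tokens=200)
+ print(output)
+ ```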
+
+ ## What's next?
+ Moistral 11B is my first attempt at finetuning a capable model (sorry, CreamPhi-2).
+ It's coherent and creative enough to let me understand the impact of my dataset & training.
+ Playing around with it has already given me a better idea of the do's and don'ts.
+ I will most likely make a version 2 with some improvements:
+ 1. Remove any glitchy text that comes from my dataset. Sanitize, sanitize, sanitize!
+ 2. Balance out the themes in the dataset for a richer, more diverse experience.
+ 3. Consider extending the context window.
+ 4. Add a 'monologue' dataset that forces the model to keep talking without much interaction from the `user`.
+ 5. Maybe, just maybe, expose it to dry stuff to let Moistral cook.
+
+ GGUF: https://huggingface.co/TheDrummer/Moistral-11B-v1-GGUF/tree/main
+
+ I have to acknowledge that I'm standing on the shoulders of giants.
+ Thank you Sao for sharing your finetune config along with tips on getting started.
+ Thanks to everyone in the Finetuning channel for entertaining my every question.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65f2fd1c25b848bd061b5c2e/Ll8CA5RR7ugTi72P2HBb8.png)
config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "_name_or_path": "Sao10K/Fimbulvetr-11B-v2",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 8192,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 48,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.38.2",
+   "use_cache": false,
+   "vocab_size": 32000,
+   "quantization_config": {
+     "quant_method": "exl2",
+     "version": "0.0.16",
+     "bits": 4.0,
+     "head_bits": 6,
+     "calibration": {
+       "rows": 100,
+       "length": 2048,
+       "dataset": "(default)"
+     }
+   }
+ }
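
The `quantization_config` block records how this quant was made: linear layers at 4.0 bits per weight, the output head at 6 bits, calibrated on 100 rows of 2048 tokens from exllamav2's default dataset. As a back-of-the-envelope sanity check (my arithmetic, not anything stated in the repo), the weight file should land near params × bpw / 8:

```python
# Rough size estimate for output.safetensors. The ~10.7B parameter count is an
# assumption based on the SOLAR-class config above (48 layers, 4096 hidden).
params = 10.7e9
bpw = 4.0                                   # "bits" from quantization_config
print(f"{params * bpw / 8 / 1e9:.2f} GB")   # ~5.35 GB vs. the actual ~5.60 GB
# The gap is mostly the 6-bit head plus tensors stored at higher precision.
```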
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "do_sample": true,
+   "eos_token_id": 2,
+   "transformers_version": "4.38.2",
+   "use_cache": false
+ }
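
For completeness, `generation_config.json` mainly tells transformers-based loaders to sample rather than greedy-decode by default; exllamav2 does not read this file and uses whatever sampler settings you pass explicitly. A stdlib sketch of what those defaults assert:

```python
import json

# Defaults a transformers-based loader would apply at generate() time
with open("generation_config.json") as f:
    gen = json.load(f)

assert gen["do_sample"] is True                          # sample, not greedy
assert gen["bos_token_id"] == 1 and gen["eos_token_id"] == 2
```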
job_new.json ADDED
The diff for this file is too large to render. See raw diff
 
measurement.json ADDED
The diff for this file is too large to render. See raw diff
 
output.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee4fb164e5134201d499ceb4708f85943d13713bba344117483054929ba8bc36
+ size 5601411052
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true,
+   "use_fast": true
+ }
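
Worth noting in the tokenizer config: `add_bos_token` is true while `add_eos_token` is false, so every encoded prompt gets `<s>` (id 1) prepended and nothing appended. A quick check with transformers (a sketch; the local path is a placeholder for the downloaded repo):

```python
from transformers import AutoTokenizer

# Placeholder path - point this at the downloaded repo
tok = AutoTokenizer.from_pretrained("./Moistral-11B-v1-exl2-4bpw")

ids = tok("Hello there").input_ids
print(ids[0] == tok.bos_token_id)   # True: BOS (id 1) is prepended
print(ids[-1] == tok.eos_token_id)  # False: no EOS is appended
```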