Mayfull committed
Commit f7b548b
1 Parent(s): 7eb7884

upload model

README.md DELETED
@@ -1,121 +0,0 @@
- ---
- tags:
- - transformers
- license: cc-by-nc-4.0
- pipeline_tag: feature-extraction
- language:
- - en
- ---
-
- <h1 align="center">Linq-AI-Research/Linq-Embed-Mistral</h1>
-
- **Linq-Embed-Mistral**
-
- Linq-Embed-Mistral has been developed by building upon the foundations of the [E5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) models. We focus on improving text retrieval using advanced data refinement methods, including sophisticated data crafting, data filtering, and negative mining techniques. These methods are applied both to existing benchmark datasets and to highly tailored synthetic datasets generated via LLMs. To enhance the quality of the synthetic data, we employ extensive prompt engineering and guidance from teacher models, ensuring these methods are specifically tailored to each task. Our efforts primarily aim to create high-quality triplet datasets (query, positive example, negative example), which significantly improve text retrieval performance.
-
- Linq-Embed-Mistral performs exceptionally well on the MTEB benchmark, achieving an average score of 68.1 across 56 datasets. This performance ranks it 1st among publicly accessible models on the MTEB leaderboard and 3rd overall among all evaluated models. The model excels at retrieval, ranking 1st among all models listed on the MTEB leaderboard with a retrieval score of 60.0.
-
- This project is for research purposes only. Third-party datasets may be subject to additional terms and conditions under their associated licenses. Please refer to the specific papers for more details:
-
- - [MTEB benchmark](https://arxiv.org/abs/2210.07316)
- - [Mistral](https://arxiv.org/abs/2310.06825)
- - [E5-mistral-7b-instruct](https://arxiv.org/pdf/2401.00368.pdf)
-
- For more details, refer to [this blog post](https://getlinq.com/blog/linq-embed-mistral/).
-
- ## How to use
-
- ### Transformers
- Here is an example of how to encode queries and passages from the Mr. TyDi training dataset.
- ```python
- import torch
- import torch.nn.functional as F
- from torch import Tensor
- from transformers import AutoTokenizer, AutoModel
-
- def last_token_pool(last_hidden_states: Tensor,
-                     attention_mask: Tensor) -> Tensor:
-     # Pool by taking the hidden state of the last non-padding token,
-     # handling both left- and right-padded batches.
-     left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
-     if left_padding:
-         return last_hidden_states[:, -1]
-     else:
-         sequence_lengths = attention_mask.sum(dim=1) - 1
-         batch_size = last_hidden_states.shape[0]
-         return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
-
- def get_detailed_instruct(task_description: str, query: str) -> str:
-     return f'Instruct: {task_description}\nQuery: {query}'
-
- # Each query must come with a one-sentence instruction that describes the task
- task = 'Given a question, retrieve Wikipedia passages that answer the question'
- queries = [
-     get_detailed_instruct(task, '최초의 원자력 발전소는 무엇인가?'),  # "What was the first nuclear power plant?"
-     get_detailed_instruct(task, 'Who invented Hangul?')
- ]
- # No instruction is needed for the retrieval documents.
- # The first (Korean) passage answers the first query; the second answers the second.
- passages = [
-     "현재 사용되는 핵분열 방식을 이용한 전력생산은 1948년 9월 미국 테네시주 오크리지에 설치된 X-10 흑연원자로에서 전구의 불을 밝히는 데 사용되면서 시작되었다. 그리고 1954년 6월에 구소련의 오브닌스크에 건설된 흑연감속 비등경수 압력관형 원자로를 사용한 오브닌스크 원자력 발전소가 시험적으로 전력생산을 시작하였고, 최초의 상업용 원자력 엉더이로를 사용한 영국 셀라필드 원자력 단지에 위치한 콜더 홀(Calder Hall) 원자력 발전소로, 1956년 10월 17일 상업 운전을 시작하였다.",
-     "Hangul was personally created and promulgated by the fourth king of the Joseon dynasty, Sejong the Great.[1][2] Sejong's scholarly institute, the Hall of Worthies, is often credited with the work, and at least one of its scholars was heavily involved in its creation, but it appears to have also been a personal project of Sejong."
- ]
-
- # Load model and tokenizer
- tokenizer = AutoTokenizer.from_pretrained('Linq-AI-Research/Linq-Embed-Mistral')
- model = AutoModel.from_pretrained('Linq-AI-Research/Linq-Embed-Mistral')
- input_texts = [*queries, *passages]
- max_length = 4096
- # Tokenize the input texts
- batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors="pt")
- outputs = model(**batch_dict)
- embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
-
- # Normalize embeddings
- embeddings = F.normalize(embeddings, p=2, dim=1)
- scores = (embeddings[:2] @ embeddings[2:].T) * 100
- print(scores.tolist())
- # Matching (query, passage) pairs should score markedly higher than mismatched pairs
- ```
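If a Sentence Transformers configuration is published for this checkpoint (none is part of this commit, so this is an assumption: the repository would need pooling/normalization modules that reproduce last-token pooling), the same embeddings could be produced more compactly. A minimal sketch, with the instruction prefix applied to queries by hand:

```python
from sentence_transformers import SentenceTransformer

# Assumption: the repo ships a Sentence Transformers config matching last-token pooling.
model = SentenceTransformer("Linq-AI-Research/Linq-Embed-Mistral")

task = "Given a question, retrieve Wikipedia passages that answer the question"
queries = [f"Instruct: {task}\nQuery: Who invented Hangul?"]
passages = ["Hangul was personally created and promulgated by the fourth king of "
            "the Joseon dynasty, Sejong the Great."]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)  # cosine similarities, since both sides are unit-normalized
```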
-
- ### MTEB Benchmark Evaluation
- Check out [unilm/e5](https://github.com/microsoft/unilm/tree/master/e5) to reproduce evaluation results on the [BEIR](https://arxiv.org/abs/2104.08663) and [MTEB](https://arxiv.org/abs/2210.07316) benchmarks.
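The unilm/e5 scripts are the reference way to reproduce the reported numbers. As a lighter-weight alternative, a single task can also be run with the `mteb` Python package; the sketch below is illustrative only (task name and output folder are arbitrary choices, and the Sentence Transformers loading caveat above applies):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Linq-AI-Research/Linq-Embed-Mistral")
evaluation = MTEB(tasks=["SciFact"])  # any single MTEB/BEIR retrieval task
evaluation.run(model, output_folder="results/Linq-Embed-Mistral")
```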
-
- ## Evaluation Results
-
- ### MTEB
-
- | Model Name | Retrieval (15 datasets) | Average (56 datasets) |
- | :------------------------------------------------------------------------------: | :------------: | :----------: |
- | [Linq-Embed-Mistral](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral) | **60.0** | 68.1 |
- | [NV-Embed-v1](https://huggingface.co/nvidia/NV-Embed-v1) | 59.4 | 69.3 |
- | [SFR-Embedding-Mistral](https://huggingface.co/Salesforce/SFR-Embedding-Mistral) | 59.0 | 67.6 |
- | voyage-large-2-instruct | 58.3 | 68.3 |
- | GritLM-7B | 57.4 | 66.8 |
- | voyage-lite-02-instruct | 56.6 | 67.1 |
- | [gte-Qwen1.5-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct) | 56.2 | 67.3 |
- | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | 56.9 | 66.6 |
- | google-gecko.text-embedding-preview-0409 | 55.7 | 66.3 |
- | text-embedding-3-large | 55.4 | 64.6 |
- | Cohere-embed-english-v3.0 | 55.0 | 64.5 |
-
- Linq Research Team:
-
- - Junseong Kim
- - Seolhwa Lee
- - Jihoon Kwon
- - Sangmo Gu
- - Yejin Kim
- - Minkyung Cho
- - Jy-yong Sohn
- - Chanyeol Choi
-
- ### Citation
-
- ```bibtex
- @misc{LinqAIResearch2024,
-   title={Linq-Embed-Mistral: Elevating Text Retrieval with Improved GPT Data Through Task-Specific Control and Quality Refinement},
-   author={Junseong Kim and Seolhwa Lee and Jihoon Kwon and Sangmo Gu and Yejin Kim and Minkyung Cho and Jy-yong Sohn and Chanyeol Choi},
-   howpublished={Linq AI Research Blog},
-   year={2024},
-   url={https://getlinq.com/blog/linq-embed-mistral/}
- }
- ```
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "intfloat/e5-mistral-7b-instruct",
+   "architectures": [
+     "MistralModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "pad_token_id": 2,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.39.0",
+   "use_cache": false,
+   "vocab_size": 32000
+ }
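For reference, the `hidden_size` above (4096) is also the dimensionality of the embeddings produced by last-token pooling. A small sketch to check this without downloading the weights (repository id taken from the model card above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Linq-AI-Research/Linq-Embed-Mistral")
print(config.model_type)   # "mistral"
print(config.hidden_size)  # 4096 -> dimensionality of the pooled embeddings
```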
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:72a2e78bd39844eb2c4041525f8742ffddcceaf594082dadb7bd4085a62b3e3a
+ size 4943161664
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff209caf16ced9cd45770a8cdeed50137a778569d6ce2908dc59e1c8aad26fb8
+ size 4999818600
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5d64289b8791d2a13c4a5bbfb5246996c7f0f10ddaeaa8ae070ace37d68f33a
+ size 4278371624
model.safetensors.index.json ADDED
@@ -0,0 +1,297 @@
+ {
+   "metadata": {
+     "total_size": 14221320192
+   },
+   "weight_map": {
+     "embed_tokens.weight": "model-00001-of-00003.safetensors",
+     "layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "norm.weight": "model-00003-of-00003.safetensors"
+   }
+ }
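The three shards above (about 14.2 GB in total, per the `total_size` metadata) are resolved from this index automatically by `from_pretrained`; nothing needs to be merged by hand. A minimal loading sketch, assuming sufficient GPU memory and the `accelerate` package for `device_map="auto"`:

```python
import torch
from transformers import AutoModel

# The safetensors index maps each weight to its shard; transformers follows it
# automatically when downloading and loading the checkpoint.
model = AutoModel.from_pretrained(
    "Linq-AI-Research/Linq-Embed-Mistral",
    torch_dtype=torch.float16,  # matches "torch_dtype": "float16" in config.json
    device_map="auto",          # requires the accelerate package
)
```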
special_tokens_map.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
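Because `add_bos_token` and `add_eos_token` are both true, every encoded text ends with `</s>`, which is the position read out by the last-token pooling shown in the usage example above. A quick check (sketch):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Linq-AI-Research/Linq-Embed-Mistral")
tokens = tokenizer.convert_ids_to_tokens(tokenizer("Who invented Hangul?").input_ids)
print(tokens[0], tokens[-1])
# '<s>' '</s>' -> BOS prepended and EOS appended, per add_bos_token / add_eos_token
```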