xezpeleta committed on
Commit
4088a0a
1 Parent(s): 507e49c

Added script to convert the checkpoint to ggml model

Whisper_finetuned_checkpoint_to_GGML.ipynb ADDED
@@ -0,0 +1,1381 @@
+ {
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Convert a HF finetuned Whisper model to GGML\n",
+ "\n",
+ "Reference: https://github.com/ggerganov/whisper.cpp/tree/master/models#fine-tuned-models"
+ ],
+ "metadata": {
+ "id": "nZPl81t1Ruvk"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "jzgovx6mRpHc",
+ "outputId": "d95a18f3-579e-427a-d904-3976ecd6d896"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Reading package lists... Done\n",
+ "Building dependency tree \n",
+ "Reading state information... Done\n",
+ "git-lfs is already the newest version (2.9.2-1).\n",
+ "0 upgraded, 0 newly installed, 0 to remove and 23 not upgraded.\n",
+ "fatal: destination path 'whisper' already exists and is not an empty directory.\n",
+ "fatal: destination path 'whisper.cpp' already exists and is not an empty directory.\n",
+ "fatal: destination path 'whisper-small-eu-v2' already exists and is not an empty directory.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Download the repos\n",
+ "!git clone https://github.com/openai/whisper\n",
+ "!git clone https://github.com/ggerganov/whisper.cpp\n",
+ "\n",
+ "# clone HF fine-tuned model (this is just an example)\n",
+ "!git clone https://huggingface.co/xezpeleta/whisper-small-eu-v2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Install required packages\n",
+ "!pip install transformers"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "lncO4nydT0xI",
+ "outputId": "f81184f4-7168-42a5-97df-d29b3ee7ac0c"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
+ "Collecting transformers\n",
+ " Downloading transformers-4.27.4-py3-none-any.whl (6.8 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.8/6.8 MB\u001b[0m \u001b[31m84.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (23.0)\n",
+ "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (1.22.4)\n",
+ "Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers) (2.27.1)\n",
+ "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n",
+ " Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.6/7.6 MB\u001b[0m \u001b[31m88.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers) (3.10.7)\n",
+ "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers) (4.65.0)\n",
+ "Collecting huggingface-hub<1.0,>=0.11.0\n",
+ " Downloading huggingface_hub-0.13.3-py3-none-any.whl (199 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m199.8/199.8 KB\u001b[0m \u001b[31m21.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (2022.10.31)\n",
+ "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (6.0)\n",
+ "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (1.26.15)\n",
+ "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2.0.12)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2022.12.7)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (3.4)\n",
+ "Installing collected packages: tokenizers, huggingface-hub, transformers\n",
+ "Successfully installed huggingface-hub-0.13.3 tokenizers-0.13.2 transformers-4.27.4\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Convert the model to ggml\n",
+ "!python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-small-eu-v2/ ./whisper ."
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "uIkTQr8yTfWP",
+ "outputId": "ce904702-5317-48a5-9f3b-2f0c2ba126ef"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
129
+ "model.encoder.conv1.weight -> encoder.conv1.weight\n",
130
+ "encoder.conv1.weight 3 (768, 80, 3)\n",
131
+ "model.encoder.conv1.bias -> encoder.conv1.bias\n",
132
+ " Reshaped variable: encoder.conv1.bias to shape: (768, 1)\n",
133
+ "encoder.conv1.bias 2 (768, 1)\n",
134
+ " Converting to float32\n",
135
+ "model.encoder.conv2.weight -> encoder.conv2.weight\n",
136
+ "encoder.conv2.weight 3 (768, 768, 3)\n",
137
+ "model.encoder.conv2.bias -> encoder.conv2.bias\n",
138
+ " Reshaped variable: encoder.conv2.bias to shape: (768, 1)\n",
139
+ "encoder.conv2.bias 2 (768, 1)\n",
140
+ " Converting to float32\n",
141
+ "model.encoder.embed_positions.weight -> encoder.positional_embedding\n",
142
+ "encoder.positional_embedding 2 (1500, 768)\n",
143
+ " Converting to float32\n",
144
+ "model.encoder.layers.0.self_attn.k_proj.weight -> encoder.blocks.0.attn.key.weight\n",
145
+ "encoder.blocks.0.attn.key.weight 2 (768, 768)\n",
146
+ "model.encoder.layers.0.self_attn.v_proj.weight -> encoder.blocks.0.attn.value.weight\n",
147
+ "encoder.blocks.0.attn.value.weight 2 (768, 768)\n",
148
+ "model.encoder.layers.0.self_attn.v_proj.bias -> encoder.blocks.0.attn.value.bias\n",
149
+ "encoder.blocks.0.attn.value.bias 1 (768,)\n",
150
+ " Converting to float32\n",
151
+ "model.encoder.layers.0.self_attn.q_proj.weight -> encoder.blocks.0.attn.query.weight\n",
152
+ "encoder.blocks.0.attn.query.weight 2 (768, 768)\n",
153
+ "model.encoder.layers.0.self_attn.q_proj.bias -> encoder.blocks.0.attn.query.bias\n",
154
+ "encoder.blocks.0.attn.query.bias 1 (768,)\n",
155
+ " Converting to float32\n",
156
+ "model.encoder.layers.0.self_attn.out_proj.weight -> encoder.blocks.0.attn.out.weight\n",
157
+ "encoder.blocks.0.attn.out.weight 2 (768, 768)\n",
158
+ "model.encoder.layers.0.self_attn.out_proj.bias -> encoder.blocks.0.attn.out.bias\n",
159
+ "encoder.blocks.0.attn.out.bias 1 (768,)\n",
160
+ " Converting to float32\n",
161
+ "model.encoder.layers.0.self_attn_layer_norm.weight -> encoder.blocks.0.attn_ln.weight\n",
162
+ "encoder.blocks.0.attn_ln.weight 1 (768,)\n",
163
+ " Converting to float32\n",
164
+ "model.encoder.layers.0.self_attn_layer_norm.bias -> encoder.blocks.0.attn_ln.bias\n",
165
+ "encoder.blocks.0.attn_ln.bias 1 (768,)\n",
166
+ " Converting to float32\n",
167
+ "model.encoder.layers.0.fc1.weight -> encoder.blocks.0.mlp.0.weight\n",
168
+ "encoder.blocks.0.mlp.0.weight 2 (3072, 768)\n",
169
+ "model.encoder.layers.0.fc1.bias -> encoder.blocks.0.mlp.0.bias\n",
170
+ "encoder.blocks.0.mlp.0.bias 1 (3072,)\n",
171
+ " Converting to float32\n",
172
+ "model.encoder.layers.0.fc2.weight -> encoder.blocks.0.mlp.2.weight\n",
173
+ "encoder.blocks.0.mlp.2.weight 2 (768, 3072)\n",
174
+ "model.encoder.layers.0.fc2.bias -> encoder.blocks.0.mlp.2.bias\n",
175
+ "encoder.blocks.0.mlp.2.bias 1 (768,)\n",
176
+ " Converting to float32\n",
177
+ "model.encoder.layers.0.final_layer_norm.weight -> encoder.blocks.0.mlp_ln.weight\n",
178
+ "encoder.blocks.0.mlp_ln.weight 1 (768,)\n",
179
+ " Converting to float32\n",
180
+ "model.encoder.layers.0.final_layer_norm.bias -> encoder.blocks.0.mlp_ln.bias\n",
181
+ "encoder.blocks.0.mlp_ln.bias 1 (768,)\n",
182
+ " Converting to float32\n",
183
+ "model.encoder.layers.1.self_attn.k_proj.weight -> encoder.blocks.1.attn.key.weight\n",
184
+ "encoder.blocks.1.attn.key.weight 2 (768, 768)\n",
185
+ "model.encoder.layers.1.self_attn.v_proj.weight -> encoder.blocks.1.attn.value.weight\n",
186
+ "encoder.blocks.1.attn.value.weight 2 (768, 768)\n",
187
+ "model.encoder.layers.1.self_attn.v_proj.bias -> encoder.blocks.1.attn.value.bias\n",
188
+ "encoder.blocks.1.attn.value.bias 1 (768,)\n",
189
+ " Converting to float32\n",
190
+ "model.encoder.layers.1.self_attn.q_proj.weight -> encoder.blocks.1.attn.query.weight\n",
191
+ "encoder.blocks.1.attn.query.weight 2 (768, 768)\n",
192
+ "model.encoder.layers.1.self_attn.q_proj.bias -> encoder.blocks.1.attn.query.bias\n",
193
+ "encoder.blocks.1.attn.query.bias 1 (768,)\n",
194
+ " Converting to float32\n",
195
+ "model.encoder.layers.1.self_attn.out_proj.weight -> encoder.blocks.1.attn.out.weight\n",
196
+ "encoder.blocks.1.attn.out.weight 2 (768, 768)\n",
197
+ "model.encoder.layers.1.self_attn.out_proj.bias -> encoder.blocks.1.attn.out.bias\n",
198
+ "encoder.blocks.1.attn.out.bias 1 (768,)\n",
199
+ " Converting to float32\n",
200
+ "model.encoder.layers.1.self_attn_layer_norm.weight -> encoder.blocks.1.attn_ln.weight\n",
201
+ "encoder.blocks.1.attn_ln.weight 1 (768,)\n",
202
+ " Converting to float32\n",
203
+ "model.encoder.layers.1.self_attn_layer_norm.bias -> encoder.blocks.1.attn_ln.bias\n",
204
+ "encoder.blocks.1.attn_ln.bias 1 (768,)\n",
205
+ " Converting to float32\n",
206
+ "model.encoder.layers.1.fc1.weight -> encoder.blocks.1.mlp.0.weight\n",
207
+ "encoder.blocks.1.mlp.0.weight 2 (3072, 768)\n",
208
+ "model.encoder.layers.1.fc1.bias -> encoder.blocks.1.mlp.0.bias\n",
209
+ "encoder.blocks.1.mlp.0.bias 1 (3072,)\n",
210
+ " Converting to float32\n",
211
+ "model.encoder.layers.1.fc2.weight -> encoder.blocks.1.mlp.2.weight\n",
212
+ "encoder.blocks.1.mlp.2.weight 2 (768, 3072)\n",
213
+ "model.encoder.layers.1.fc2.bias -> encoder.blocks.1.mlp.2.bias\n",
214
+ "encoder.blocks.1.mlp.2.bias 1 (768,)\n",
215
+ " Converting to float32\n",
216
+ "model.encoder.layers.1.final_layer_norm.weight -> encoder.blocks.1.mlp_ln.weight\n",
217
+ "encoder.blocks.1.mlp_ln.weight 1 (768,)\n",
218
+ " Converting to float32\n",
219
+ "model.encoder.layers.1.final_layer_norm.bias -> encoder.blocks.1.mlp_ln.bias\n",
220
+ "encoder.blocks.1.mlp_ln.bias 1 (768,)\n",
221
+ " Converting to float32\n",
222
+ "model.encoder.layers.2.self_attn.k_proj.weight -> encoder.blocks.2.attn.key.weight\n",
223
+ "encoder.blocks.2.attn.key.weight 2 (768, 768)\n",
224
+ "model.encoder.layers.2.self_attn.v_proj.weight -> encoder.blocks.2.attn.value.weight\n",
225
+ "encoder.blocks.2.attn.value.weight 2 (768, 768)\n",
226
+ "model.encoder.layers.2.self_attn.v_proj.bias -> encoder.blocks.2.attn.value.bias\n",
227
+ "encoder.blocks.2.attn.value.bias 1 (768,)\n",
228
+ " Converting to float32\n",
229
+ "model.encoder.layers.2.self_attn.q_proj.weight -> encoder.blocks.2.attn.query.weight\n",
230
+ "encoder.blocks.2.attn.query.weight 2 (768, 768)\n",
231
+ "model.encoder.layers.2.self_attn.q_proj.bias -> encoder.blocks.2.attn.query.bias\n",
232
+ "encoder.blocks.2.attn.query.bias 1 (768,)\n",
233
+ " Converting to float32\n",
234
+ "model.encoder.layers.2.self_attn.out_proj.weight -> encoder.blocks.2.attn.out.weight\n",
235
+ "encoder.blocks.2.attn.out.weight 2 (768, 768)\n",
236
+ "model.encoder.layers.2.self_attn.out_proj.bias -> encoder.blocks.2.attn.out.bias\n",
237
+ "encoder.blocks.2.attn.out.bias 1 (768,)\n",
238
+ " Converting to float32\n",
239
+ "model.encoder.layers.2.self_attn_layer_norm.weight -> encoder.blocks.2.attn_ln.weight\n",
240
+ "encoder.blocks.2.attn_ln.weight 1 (768,)\n",
241
+ " Converting to float32\n",
242
+ "model.encoder.layers.2.self_attn_layer_norm.bias -> encoder.blocks.2.attn_ln.bias\n",
243
+ "encoder.blocks.2.attn_ln.bias 1 (768,)\n",
244
+ " Converting to float32\n",
245
+ "model.encoder.layers.2.fc1.weight -> encoder.blocks.2.mlp.0.weight\n",
246
+ "encoder.blocks.2.mlp.0.weight 2 (3072, 768)\n",
247
+ "model.encoder.layers.2.fc1.bias -> encoder.blocks.2.mlp.0.bias\n",
248
+ "encoder.blocks.2.mlp.0.bias 1 (3072,)\n",
249
+ " Converting to float32\n",
250
+ "model.encoder.layers.2.fc2.weight -> encoder.blocks.2.mlp.2.weight\n",
251
+ "encoder.blocks.2.mlp.2.weight 2 (768, 3072)\n",
252
+ "model.encoder.layers.2.fc2.bias -> encoder.blocks.2.mlp.2.bias\n",
253
+ "encoder.blocks.2.mlp.2.bias 1 (768,)\n",
254
+ " Converting to float32\n",
255
+ "model.encoder.layers.2.final_layer_norm.weight -> encoder.blocks.2.mlp_ln.weight\n",
256
+ "encoder.blocks.2.mlp_ln.weight 1 (768,)\n",
257
+ " Converting to float32\n",
258
+ "model.encoder.layers.2.final_layer_norm.bias -> encoder.blocks.2.mlp_ln.bias\n",
259
+ "encoder.blocks.2.mlp_ln.bias 1 (768,)\n",
260
+ " Converting to float32\n",
261
+ "model.encoder.layers.3.self_attn.k_proj.weight -> encoder.blocks.3.attn.key.weight\n",
262
+ "encoder.blocks.3.attn.key.weight 2 (768, 768)\n",
263
+ "model.encoder.layers.3.self_attn.v_proj.weight -> encoder.blocks.3.attn.value.weight\n",
264
+ "encoder.blocks.3.attn.value.weight 2 (768, 768)\n",
265
+ "model.encoder.layers.3.self_attn.v_proj.bias -> encoder.blocks.3.attn.value.bias\n",
266
+ "encoder.blocks.3.attn.value.bias 1 (768,)\n",
267
+ " Converting to float32\n",
268
+ "model.encoder.layers.3.self_attn.q_proj.weight -> encoder.blocks.3.attn.query.weight\n",
269
+ "encoder.blocks.3.attn.query.weight 2 (768, 768)\n",
270
+ "model.encoder.layers.3.self_attn.q_proj.bias -> encoder.blocks.3.attn.query.bias\n",
271
+ "encoder.blocks.3.attn.query.bias 1 (768,)\n",
272
+ " Converting to float32\n",
273
+ "model.encoder.layers.3.self_attn.out_proj.weight -> encoder.blocks.3.attn.out.weight\n",
274
+ "encoder.blocks.3.attn.out.weight 2 (768, 768)\n",
275
+ "model.encoder.layers.3.self_attn.out_proj.bias -> encoder.blocks.3.attn.out.bias\n",
276
+ "encoder.blocks.3.attn.out.bias 1 (768,)\n",
277
+ " Converting to float32\n",
278
+ "model.encoder.layers.3.self_attn_layer_norm.weight -> encoder.blocks.3.attn_ln.weight\n",
279
+ "encoder.blocks.3.attn_ln.weight 1 (768,)\n",
280
+ " Converting to float32\n",
281
+ "model.encoder.layers.3.self_attn_layer_norm.bias -> encoder.blocks.3.attn_ln.bias\n",
282
+ "encoder.blocks.3.attn_ln.bias 1 (768,)\n",
283
+ " Converting to float32\n",
284
+ "model.encoder.layers.3.fc1.weight -> encoder.blocks.3.mlp.0.weight\n",
285
+ "encoder.blocks.3.mlp.0.weight 2 (3072, 768)\n",
286
+ "model.encoder.layers.3.fc1.bias -> encoder.blocks.3.mlp.0.bias\n",
287
+ "encoder.blocks.3.mlp.0.bias 1 (3072,)\n",
288
+ " Converting to float32\n",
289
+ "model.encoder.layers.3.fc2.weight -> encoder.blocks.3.mlp.2.weight\n",
290
+ "encoder.blocks.3.mlp.2.weight 2 (768, 3072)\n",
291
+ "model.encoder.layers.3.fc2.bias -> encoder.blocks.3.mlp.2.bias\n",
292
+ "encoder.blocks.3.mlp.2.bias 1 (768,)\n",
293
+ " Converting to float32\n",
294
+ "model.encoder.layers.3.final_layer_norm.weight -> encoder.blocks.3.mlp_ln.weight\n",
295
+ "encoder.blocks.3.mlp_ln.weight 1 (768,)\n",
296
+ " Converting to float32\n",
297
+ "model.encoder.layers.3.final_layer_norm.bias -> encoder.blocks.3.mlp_ln.bias\n",
298
+ "encoder.blocks.3.mlp_ln.bias 1 (768,)\n",
299
+ " Converting to float32\n",
300
+ "model.encoder.layers.4.self_attn.k_proj.weight -> encoder.blocks.4.attn.key.weight\n",
301
+ "encoder.blocks.4.attn.key.weight 2 (768, 768)\n",
302
+ "model.encoder.layers.4.self_attn.v_proj.weight -> encoder.blocks.4.attn.value.weight\n",
303
+ "encoder.blocks.4.attn.value.weight 2 (768, 768)\n",
304
+ "model.encoder.layers.4.self_attn.v_proj.bias -> encoder.blocks.4.attn.value.bias\n",
305
+ "encoder.blocks.4.attn.value.bias 1 (768,)\n",
306
+ " Converting to float32\n",
307
+ "model.encoder.layers.4.self_attn.q_proj.weight -> encoder.blocks.4.attn.query.weight\n",
308
+ "encoder.blocks.4.attn.query.weight 2 (768, 768)\n",
309
+ "model.encoder.layers.4.self_attn.q_proj.bias -> encoder.blocks.4.attn.query.bias\n",
310
+ "encoder.blocks.4.attn.query.bias 1 (768,)\n",
311
+ " Converting to float32\n",
312
+ "model.encoder.layers.4.self_attn.out_proj.weight -> encoder.blocks.4.attn.out.weight\n",
313
+ "encoder.blocks.4.attn.out.weight 2 (768, 768)\n",
314
+ "model.encoder.layers.4.self_attn.out_proj.bias -> encoder.blocks.4.attn.out.bias\n",
315
+ "encoder.blocks.4.attn.out.bias 1 (768,)\n",
316
+ " Converting to float32\n",
317
+ "model.encoder.layers.4.self_attn_layer_norm.weight -> encoder.blocks.4.attn_ln.weight\n",
318
+ "encoder.blocks.4.attn_ln.weight 1 (768,)\n",
319
+ " Converting to float32\n",
320
+ "model.encoder.layers.4.self_attn_layer_norm.bias -> encoder.blocks.4.attn_ln.bias\n",
321
+ "encoder.blocks.4.attn_ln.bias 1 (768,)\n",
322
+ " Converting to float32\n",
323
+ "model.encoder.layers.4.fc1.weight -> encoder.blocks.4.mlp.0.weight\n",
324
+ "encoder.blocks.4.mlp.0.weight 2 (3072, 768)\n",
325
+ "model.encoder.layers.4.fc1.bias -> encoder.blocks.4.mlp.0.bias\n",
326
+ "encoder.blocks.4.mlp.0.bias 1 (3072,)\n",
327
+ " Converting to float32\n",
328
+ "model.encoder.layers.4.fc2.weight -> encoder.blocks.4.mlp.2.weight\n",
329
+ "encoder.blocks.4.mlp.2.weight 2 (768, 3072)\n",
330
+ "model.encoder.layers.4.fc2.bias -> encoder.blocks.4.mlp.2.bias\n",
331
+ "encoder.blocks.4.mlp.2.bias 1 (768,)\n",
332
+ " Converting to float32\n",
333
+ "model.encoder.layers.4.final_layer_norm.weight -> encoder.blocks.4.mlp_ln.weight\n",
334
+ "encoder.blocks.4.mlp_ln.weight 1 (768,)\n",
335
+ " Converting to float32\n",
336
+ "model.encoder.layers.4.final_layer_norm.bias -> encoder.blocks.4.mlp_ln.bias\n",
337
+ "encoder.blocks.4.mlp_ln.bias 1 (768,)\n",
338
+ " Converting to float32\n",
339
+ "model.encoder.layers.5.self_attn.k_proj.weight -> encoder.blocks.5.attn.key.weight\n",
340
+ "encoder.blocks.5.attn.key.weight 2 (768, 768)\n",
341
+ "model.encoder.layers.5.self_attn.v_proj.weight -> encoder.blocks.5.attn.value.weight\n",
342
+ "encoder.blocks.5.attn.value.weight 2 (768, 768)\n",
343
+ "model.encoder.layers.5.self_attn.v_proj.bias -> encoder.blocks.5.attn.value.bias\n",
344
+ "encoder.blocks.5.attn.value.bias 1 (768,)\n",
345
+ " Converting to float32\n",
346
+ "model.encoder.layers.5.self_attn.q_proj.weight -> encoder.blocks.5.attn.query.weight\n",
347
+ "encoder.blocks.5.attn.query.weight 2 (768, 768)\n",
348
+ "model.encoder.layers.5.self_attn.q_proj.bias -> encoder.blocks.5.attn.query.bias\n",
349
+ "encoder.blocks.5.attn.query.bias 1 (768,)\n",
350
+ " Converting to float32\n",
351
+ "model.encoder.layers.5.self_attn.out_proj.weight -> encoder.blocks.5.attn.out.weight\n",
352
+ "encoder.blocks.5.attn.out.weight 2 (768, 768)\n",
353
+ "model.encoder.layers.5.self_attn.out_proj.bias -> encoder.blocks.5.attn.out.bias\n",
354
+ "encoder.blocks.5.attn.out.bias 1 (768,)\n",
355
+ " Converting to float32\n",
356
+ "model.encoder.layers.5.self_attn_layer_norm.weight -> encoder.blocks.5.attn_ln.weight\n",
357
+ "encoder.blocks.5.attn_ln.weight 1 (768,)\n",
358
+ " Converting to float32\n",
359
+ "model.encoder.layers.5.self_attn_layer_norm.bias -> encoder.blocks.5.attn_ln.bias\n",
360
+ "encoder.blocks.5.attn_ln.bias 1 (768,)\n",
361
+ " Converting to float32\n",
362
+ "model.encoder.layers.5.fc1.weight -> encoder.blocks.5.mlp.0.weight\n",
363
+ "encoder.blocks.5.mlp.0.weight 2 (3072, 768)\n",
364
+ "model.encoder.layers.5.fc1.bias -> encoder.blocks.5.mlp.0.bias\n",
365
+ "encoder.blocks.5.mlp.0.bias 1 (3072,)\n",
366
+ " Converting to float32\n",
367
+ "model.encoder.layers.5.fc2.weight -> encoder.blocks.5.mlp.2.weight\n",
368
+ "encoder.blocks.5.mlp.2.weight 2 (768, 3072)\n",
369
+ "model.encoder.layers.5.fc2.bias -> encoder.blocks.5.mlp.2.bias\n",
370
+ "encoder.blocks.5.mlp.2.bias 1 (768,)\n",
371
+ " Converting to float32\n",
372
+ "model.encoder.layers.5.final_layer_norm.weight -> encoder.blocks.5.mlp_ln.weight\n",
373
+ "encoder.blocks.5.mlp_ln.weight 1 (768,)\n",
374
+ " Converting to float32\n",
375
+ "model.encoder.layers.5.final_layer_norm.bias -> encoder.blocks.5.mlp_ln.bias\n",
376
+ "encoder.blocks.5.mlp_ln.bias 1 (768,)\n",
377
+ " Converting to float32\n",
378
+ "model.encoder.layers.6.self_attn.k_proj.weight -> encoder.blocks.6.attn.key.weight\n",
379
+ "encoder.blocks.6.attn.key.weight 2 (768, 768)\n",
380
+ "model.encoder.layers.6.self_attn.v_proj.weight -> encoder.blocks.6.attn.value.weight\n",
381
+ "encoder.blocks.6.attn.value.weight 2 (768, 768)\n",
382
+ "model.encoder.layers.6.self_attn.v_proj.bias -> encoder.blocks.6.attn.value.bias\n",
383
+ "encoder.blocks.6.attn.value.bias 1 (768,)\n",
384
+ " Converting to float32\n",
385
+ "model.encoder.layers.6.self_attn.q_proj.weight -> encoder.blocks.6.attn.query.weight\n",
386
+ "encoder.blocks.6.attn.query.weight 2 (768, 768)\n",
387
+ "model.encoder.layers.6.self_attn.q_proj.bias -> encoder.blocks.6.attn.query.bias\n",
388
+ "encoder.blocks.6.attn.query.bias 1 (768,)\n",
389
+ " Converting to float32\n",
390
+ "model.encoder.layers.6.self_attn.out_proj.weight -> encoder.blocks.6.attn.out.weight\n",
391
+ "encoder.blocks.6.attn.out.weight 2 (768, 768)\n",
392
+ "model.encoder.layers.6.self_attn.out_proj.bias -> encoder.blocks.6.attn.out.bias\n",
393
+ "encoder.blocks.6.attn.out.bias 1 (768,)\n",
394
+ " Converting to float32\n",
395
+ "model.encoder.layers.6.self_attn_layer_norm.weight -> encoder.blocks.6.attn_ln.weight\n",
396
+ "encoder.blocks.6.attn_ln.weight 1 (768,)\n",
397
+ " Converting to float32\n",
398
+ "model.encoder.layers.6.self_attn_layer_norm.bias -> encoder.blocks.6.attn_ln.bias\n",
399
+ "encoder.blocks.6.attn_ln.bias 1 (768,)\n",
400
+ " Converting to float32\n",
401
+ "model.encoder.layers.6.fc1.weight -> encoder.blocks.6.mlp.0.weight\n",
402
+ "encoder.blocks.6.mlp.0.weight 2 (3072, 768)\n",
403
+ "model.encoder.layers.6.fc1.bias -> encoder.blocks.6.mlp.0.bias\n",
404
+ "encoder.blocks.6.mlp.0.bias 1 (3072,)\n",
405
+ " Converting to float32\n",
406
+ "model.encoder.layers.6.fc2.weight -> encoder.blocks.6.mlp.2.weight\n",
407
+ "encoder.blocks.6.mlp.2.weight 2 (768, 3072)\n",
408
+ "model.encoder.layers.6.fc2.bias -> encoder.blocks.6.mlp.2.bias\n",
409
+ "encoder.blocks.6.mlp.2.bias 1 (768,)\n",
410
+ " Converting to float32\n",
411
+ "model.encoder.layers.6.final_layer_norm.weight -> encoder.blocks.6.mlp_ln.weight\n",
412
+ "encoder.blocks.6.mlp_ln.weight 1 (768,)\n",
413
+ " Converting to float32\n",
414
+ "model.encoder.layers.6.final_layer_norm.bias -> encoder.blocks.6.mlp_ln.bias\n",
415
+ "encoder.blocks.6.mlp_ln.bias 1 (768,)\n",
416
+ " Converting to float32\n",
417
+ "model.encoder.layers.7.self_attn.k_proj.weight -> encoder.blocks.7.attn.key.weight\n",
418
+ "encoder.blocks.7.attn.key.weight 2 (768, 768)\n",
419
+ "model.encoder.layers.7.self_attn.v_proj.weight -> encoder.blocks.7.attn.value.weight\n",
420
+ "encoder.blocks.7.attn.value.weight 2 (768, 768)\n",
421
+ "model.encoder.layers.7.self_attn.v_proj.bias -> encoder.blocks.7.attn.value.bias\n",
422
+ "encoder.blocks.7.attn.value.bias 1 (768,)\n",
423
+ " Converting to float32\n",
424
+ "model.encoder.layers.7.self_attn.q_proj.weight -> encoder.blocks.7.attn.query.weight\n",
425
+ "encoder.blocks.7.attn.query.weight 2 (768, 768)\n",
426
+ "model.encoder.layers.7.self_attn.q_proj.bias -> encoder.blocks.7.attn.query.bias\n",
427
+ "encoder.blocks.7.attn.query.bias 1 (768,)\n",
428
+ " Converting to float32\n",
429
+ "model.encoder.layers.7.self_attn.out_proj.weight -> encoder.blocks.7.attn.out.weight\n",
430
+ "encoder.blocks.7.attn.out.weight 2 (768, 768)\n",
431
+ "model.encoder.layers.7.self_attn.out_proj.bias -> encoder.blocks.7.attn.out.bias\n",
432
+ "encoder.blocks.7.attn.out.bias 1 (768,)\n",
433
+ " Converting to float32\n",
434
+ "model.encoder.layers.7.self_attn_layer_norm.weight -> encoder.blocks.7.attn_ln.weight\n",
435
+ "encoder.blocks.7.attn_ln.weight 1 (768,)\n",
436
+ " Converting to float32\n",
437
+ "model.encoder.layers.7.self_attn_layer_norm.bias -> encoder.blocks.7.attn_ln.bias\n",
438
+ "encoder.blocks.7.attn_ln.bias 1 (768,)\n",
439
+ " Converting to float32\n",
440
+ "model.encoder.layers.7.fc1.weight -> encoder.blocks.7.mlp.0.weight\n",
441
+ "encoder.blocks.7.mlp.0.weight 2 (3072, 768)\n",
442
+ "model.encoder.layers.7.fc1.bias -> encoder.blocks.7.mlp.0.bias\n",
443
+ "encoder.blocks.7.mlp.0.bias 1 (3072,)\n",
444
+ " Converting to float32\n",
445
+ "model.encoder.layers.7.fc2.weight -> encoder.blocks.7.mlp.2.weight\n",
446
+ "encoder.blocks.7.mlp.2.weight 2 (768, 3072)\n",
447
+ "model.encoder.layers.7.fc2.bias -> encoder.blocks.7.mlp.2.bias\n",
448
+ "encoder.blocks.7.mlp.2.bias 1 (768,)\n",
449
+ " Converting to float32\n",
450
+ "model.encoder.layers.7.final_layer_norm.weight -> encoder.blocks.7.mlp_ln.weight\n",
451
+ "encoder.blocks.7.mlp_ln.weight 1 (768,)\n",
452
+ " Converting to float32\n",
453
+ "model.encoder.layers.7.final_layer_norm.bias -> encoder.blocks.7.mlp_ln.bias\n",
454
+ "encoder.blocks.7.mlp_ln.bias 1 (768,)\n",
455
+ " Converting to float32\n",
456
+ "model.encoder.layers.8.self_attn.k_proj.weight -> encoder.blocks.8.attn.key.weight\n",
457
+ "encoder.blocks.8.attn.key.weight 2 (768, 768)\n",
458
+ "model.encoder.layers.8.self_attn.v_proj.weight -> encoder.blocks.8.attn.value.weight\n",
459
+ "encoder.blocks.8.attn.value.weight 2 (768, 768)\n",
460
+ "model.encoder.layers.8.self_attn.v_proj.bias -> encoder.blocks.8.attn.value.bias\n",
461
+ "encoder.blocks.8.attn.value.bias 1 (768,)\n",
462
+ " Converting to float32\n",
463
+ "model.encoder.layers.8.self_attn.q_proj.weight -> encoder.blocks.8.attn.query.weight\n",
464
+ "encoder.blocks.8.attn.query.weight 2 (768, 768)\n",
465
+ "model.encoder.layers.8.self_attn.q_proj.bias -> encoder.blocks.8.attn.query.bias\n",
466
+ "encoder.blocks.8.attn.query.bias 1 (768,)\n",
467
+ " Converting to float32\n",
468
+ "model.encoder.layers.8.self_attn.out_proj.weight -> encoder.blocks.8.attn.out.weight\n",
469
+ "encoder.blocks.8.attn.out.weight 2 (768, 768)\n",
470
+ "model.encoder.layers.8.self_attn.out_proj.bias -> encoder.blocks.8.attn.out.bias\n",
471
+ "encoder.blocks.8.attn.out.bias 1 (768,)\n",
472
+ " Converting to float32\n",
473
+ "model.encoder.layers.8.self_attn_layer_norm.weight -> encoder.blocks.8.attn_ln.weight\n",
+ "encoder.blocks.8.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.8.self_attn_layer_norm.bias -> encoder.blocks.8.attn_ln.bias\n",
+ "encoder.blocks.8.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.8.fc1.weight -> encoder.blocks.8.mlp.0.weight\n",
+ "encoder.blocks.8.mlp.0.weight 2 (3072, 768)\n",
+ "model.encoder.layers.8.fc1.bias -> encoder.blocks.8.mlp.0.bias\n",
+ "encoder.blocks.8.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.8.fc2.weight -> encoder.blocks.8.mlp.2.weight\n",
+ "encoder.blocks.8.mlp.2.weight 2 (768, 3072)\n",
+ "model.encoder.layers.8.fc2.bias -> encoder.blocks.8.mlp.2.bias\n",
+ "encoder.blocks.8.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.8.final_layer_norm.weight -> encoder.blocks.8.mlp_ln.weight\n",
+ "encoder.blocks.8.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.8.final_layer_norm.bias -> encoder.blocks.8.mlp_ln.bias\n",
+ "encoder.blocks.8.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.self_attn.k_proj.weight -> encoder.blocks.9.attn.key.weight\n",
+ "encoder.blocks.9.attn.key.weight 2 (768, 768)\n",
+ "model.encoder.layers.9.self_attn.v_proj.weight -> encoder.blocks.9.attn.value.weight\n",
+ "encoder.blocks.9.attn.value.weight 2 (768, 768)\n",
+ "model.encoder.layers.9.self_attn.v_proj.bias -> encoder.blocks.9.attn.value.bias\n",
+ "encoder.blocks.9.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.self_attn.q_proj.weight -> encoder.blocks.9.attn.query.weight\n",
+ "encoder.blocks.9.attn.query.weight 2 (768, 768)\n",
+ "model.encoder.layers.9.self_attn.q_proj.bias -> encoder.blocks.9.attn.query.bias\n",
+ "encoder.blocks.9.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.self_attn.out_proj.weight -> encoder.blocks.9.attn.out.weight\n",
+ "encoder.blocks.9.attn.out.weight 2 (768, 768)\n",
+ "model.encoder.layers.9.self_attn.out_proj.bias -> encoder.blocks.9.attn.out.bias\n",
+ "encoder.blocks.9.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.self_attn_layer_norm.weight -> encoder.blocks.9.attn_ln.weight\n",
+ "encoder.blocks.9.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.self_attn_layer_norm.bias -> encoder.blocks.9.attn_ln.bias\n",
+ "encoder.blocks.9.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.fc1.weight -> encoder.blocks.9.mlp.0.weight\n",
+ "encoder.blocks.9.mlp.0.weight 2 (3072, 768)\n",
+ "model.encoder.layers.9.fc1.bias -> encoder.blocks.9.mlp.0.bias\n",
+ "encoder.blocks.9.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.fc2.weight -> encoder.blocks.9.mlp.2.weight\n",
+ "encoder.blocks.9.mlp.2.weight 2 (768, 3072)\n",
+ "model.encoder.layers.9.fc2.bias -> encoder.blocks.9.mlp.2.bias\n",
+ "encoder.blocks.9.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.final_layer_norm.weight -> encoder.blocks.9.mlp_ln.weight\n",
+ "encoder.blocks.9.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.9.final_layer_norm.bias -> encoder.blocks.9.mlp_ln.bias\n",
+ "encoder.blocks.9.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.self_attn.k_proj.weight -> encoder.blocks.10.attn.key.weight\n",
+ "encoder.blocks.10.attn.key.weight 2 (768, 768)\n",
+ "model.encoder.layers.10.self_attn.v_proj.weight -> encoder.blocks.10.attn.value.weight\n",
+ "encoder.blocks.10.attn.value.weight 2 (768, 768)\n",
+ "model.encoder.layers.10.self_attn.v_proj.bias -> encoder.blocks.10.attn.value.bias\n",
+ "encoder.blocks.10.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.self_attn.q_proj.weight -> encoder.blocks.10.attn.query.weight\n",
+ "encoder.blocks.10.attn.query.weight 2 (768, 768)\n",
+ "model.encoder.layers.10.self_attn.q_proj.bias -> encoder.blocks.10.attn.query.bias\n",
+ "encoder.blocks.10.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.self_attn.out_proj.weight -> encoder.blocks.10.attn.out.weight\n",
+ "encoder.blocks.10.attn.out.weight 2 (768, 768)\n",
+ "model.encoder.layers.10.self_attn.out_proj.bias -> encoder.blocks.10.attn.out.bias\n",
+ "encoder.blocks.10.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.self_attn_layer_norm.weight -> encoder.blocks.10.attn_ln.weight\n",
+ "encoder.blocks.10.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.self_attn_layer_norm.bias -> encoder.blocks.10.attn_ln.bias\n",
+ "encoder.blocks.10.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.fc1.weight -> encoder.blocks.10.mlp.0.weight\n",
+ "encoder.blocks.10.mlp.0.weight 2 (3072, 768)\n",
+ "model.encoder.layers.10.fc1.bias -> encoder.blocks.10.mlp.0.bias\n",
+ "encoder.blocks.10.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.fc2.weight -> encoder.blocks.10.mlp.2.weight\n",
+ "encoder.blocks.10.mlp.2.weight 2 (768, 3072)\n",
+ "model.encoder.layers.10.fc2.bias -> encoder.blocks.10.mlp.2.bias\n",
+ "encoder.blocks.10.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.final_layer_norm.weight -> encoder.blocks.10.mlp_ln.weight\n",
+ "encoder.blocks.10.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.10.final_layer_norm.bias -> encoder.blocks.10.mlp_ln.bias\n",
+ "encoder.blocks.10.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.self_attn.k_proj.weight -> encoder.blocks.11.attn.key.weight\n",
+ "encoder.blocks.11.attn.key.weight 2 (768, 768)\n",
+ "model.encoder.layers.11.self_attn.v_proj.weight -> encoder.blocks.11.attn.value.weight\n",
+ "encoder.blocks.11.attn.value.weight 2 (768, 768)\n",
+ "model.encoder.layers.11.self_attn.v_proj.bias -> encoder.blocks.11.attn.value.bias\n",
+ "encoder.blocks.11.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.self_attn.q_proj.weight -> encoder.blocks.11.attn.query.weight\n",
+ "encoder.blocks.11.attn.query.weight 2 (768, 768)\n",
+ "model.encoder.layers.11.self_attn.q_proj.bias -> encoder.blocks.11.attn.query.bias\n",
+ "encoder.blocks.11.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.self_attn.out_proj.weight -> encoder.blocks.11.attn.out.weight\n",
+ "encoder.blocks.11.attn.out.weight 2 (768, 768)\n",
+ "model.encoder.layers.11.self_attn.out_proj.bias -> encoder.blocks.11.attn.out.bias\n",
+ "encoder.blocks.11.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.self_attn_layer_norm.weight -> encoder.blocks.11.attn_ln.weight\n",
+ "encoder.blocks.11.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.self_attn_layer_norm.bias -> encoder.blocks.11.attn_ln.bias\n",
+ "encoder.blocks.11.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.fc1.weight -> encoder.blocks.11.mlp.0.weight\n",
+ "encoder.blocks.11.mlp.0.weight 2 (3072, 768)\n",
+ "model.encoder.layers.11.fc1.bias -> encoder.blocks.11.mlp.0.bias\n",
+ "encoder.blocks.11.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.fc2.weight -> encoder.blocks.11.mlp.2.weight\n",
+ "encoder.blocks.11.mlp.2.weight 2 (768, 3072)\n",
+ "model.encoder.layers.11.fc2.bias -> encoder.blocks.11.mlp.2.bias\n",
+ "encoder.blocks.11.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.final_layer_norm.weight -> encoder.blocks.11.mlp_ln.weight\n",
+ "encoder.blocks.11.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layers.11.final_layer_norm.bias -> encoder.blocks.11.mlp_ln.bias\n",
+ "encoder.blocks.11.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layer_norm.weight -> encoder.ln_post.weight\n",
+ "encoder.ln_post.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.encoder.layer_norm.bias -> encoder.ln_post.bias\n",
+ "encoder.ln_post.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.embed_tokens.weight -> decoder.token_embedding.weight\n",
+ "decoder.token_embedding.weight 2 (51865, 768)\n",
+ "model.decoder.embed_positions.weight -> decoder.positional_embedding\n",
+ "decoder.positional_embedding 2 (448, 768)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.self_attn.k_proj.weight -> decoder.blocks.0.attn.key.weight\n",
+ "decoder.blocks.0.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.self_attn.v_proj.weight -> decoder.blocks.0.attn.value.weight\n",
+ "decoder.blocks.0.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.self_attn.v_proj.bias -> decoder.blocks.0.attn.value.bias\n",
+ "decoder.blocks.0.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.self_attn.q_proj.weight -> decoder.blocks.0.attn.query.weight\n",
+ "decoder.blocks.0.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.self_attn.q_proj.bias -> decoder.blocks.0.attn.query.bias\n",
+ "decoder.blocks.0.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.self_attn.out_proj.weight -> decoder.blocks.0.attn.out.weight\n",
+ "decoder.blocks.0.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.self_attn.out_proj.bias -> decoder.blocks.0.attn.out.bias\n",
+ "decoder.blocks.0.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.self_attn_layer_norm.weight -> decoder.blocks.0.attn_ln.weight\n",
+ "decoder.blocks.0.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.self_attn_layer_norm.bias -> decoder.blocks.0.attn_ln.bias\n",
+ "decoder.blocks.0.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.encoder_attn.k_proj.weight -> decoder.blocks.0.cross_attn.key.weight\n",
+ "decoder.blocks.0.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.encoder_attn.v_proj.weight -> decoder.blocks.0.cross_attn.value.weight\n",
+ "decoder.blocks.0.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.encoder_attn.v_proj.bias -> decoder.blocks.0.cross_attn.value.bias\n",
+ "decoder.blocks.0.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.encoder_attn.q_proj.weight -> decoder.blocks.0.cross_attn.query.weight\n",
+ "decoder.blocks.0.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.encoder_attn.q_proj.bias -> decoder.blocks.0.cross_attn.query.bias\n",
+ "decoder.blocks.0.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.encoder_attn.out_proj.weight -> decoder.blocks.0.cross_attn.out.weight\n",
+ "decoder.blocks.0.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.0.encoder_attn.out_proj.bias -> decoder.blocks.0.cross_attn.out.bias\n",
+ "decoder.blocks.0.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.encoder_attn_layer_norm.weight -> decoder.blocks.0.cross_attn_ln.weight\n",
+ "decoder.blocks.0.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.encoder_attn_layer_norm.bias -> decoder.blocks.0.cross_attn_ln.bias\n",
+ "decoder.blocks.0.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.fc1.weight -> decoder.blocks.0.mlp.0.weight\n",
+ "decoder.blocks.0.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.0.fc1.bias -> decoder.blocks.0.mlp.0.bias\n",
+ "decoder.blocks.0.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.fc2.weight -> decoder.blocks.0.mlp.2.weight\n",
+ "decoder.blocks.0.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.0.fc2.bias -> decoder.blocks.0.mlp.2.bias\n",
+ "decoder.blocks.0.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.final_layer_norm.weight -> decoder.blocks.0.mlp_ln.weight\n",
+ "decoder.blocks.0.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.0.final_layer_norm.bias -> decoder.blocks.0.mlp_ln.bias\n",
+ "decoder.blocks.0.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.self_attn.k_proj.weight -> decoder.blocks.1.attn.key.weight\n",
+ "decoder.blocks.1.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.self_attn.v_proj.weight -> decoder.blocks.1.attn.value.weight\n",
+ "decoder.blocks.1.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.self_attn.v_proj.bias -> decoder.blocks.1.attn.value.bias\n",
+ "decoder.blocks.1.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.self_attn.q_proj.weight -> decoder.blocks.1.attn.query.weight\n",
+ "decoder.blocks.1.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.self_attn.q_proj.bias -> decoder.blocks.1.attn.query.bias\n",
+ "decoder.blocks.1.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.self_attn.out_proj.weight -> decoder.blocks.1.attn.out.weight\n",
+ "decoder.blocks.1.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.self_attn.out_proj.bias -> decoder.blocks.1.attn.out.bias\n",
+ "decoder.blocks.1.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.self_attn_layer_norm.weight -> decoder.blocks.1.attn_ln.weight\n",
+ "decoder.blocks.1.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.self_attn_layer_norm.bias -> decoder.blocks.1.attn_ln.bias\n",
+ "decoder.blocks.1.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.encoder_attn.k_proj.weight -> decoder.blocks.1.cross_attn.key.weight\n",
+ "decoder.blocks.1.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.encoder_attn.v_proj.weight -> decoder.blocks.1.cross_attn.value.weight\n",
+ "decoder.blocks.1.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.encoder_attn.v_proj.bias -> decoder.blocks.1.cross_attn.value.bias\n",
+ "decoder.blocks.1.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.encoder_attn.q_proj.weight -> decoder.blocks.1.cross_attn.query.weight\n",
+ "decoder.blocks.1.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.encoder_attn.q_proj.bias -> decoder.blocks.1.cross_attn.query.bias\n",
+ "decoder.blocks.1.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.encoder_attn.out_proj.weight -> decoder.blocks.1.cross_attn.out.weight\n",
+ "decoder.blocks.1.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.1.encoder_attn.out_proj.bias -> decoder.blocks.1.cross_attn.out.bias\n",
+ "decoder.blocks.1.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.encoder_attn_layer_norm.weight -> decoder.blocks.1.cross_attn_ln.weight\n",
+ "decoder.blocks.1.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.encoder_attn_layer_norm.bias -> decoder.blocks.1.cross_attn_ln.bias\n",
+ "decoder.blocks.1.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.fc1.weight -> decoder.blocks.1.mlp.0.weight\n",
+ "decoder.blocks.1.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.1.fc1.bias -> decoder.blocks.1.mlp.0.bias\n",
+ "decoder.blocks.1.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.fc2.weight -> decoder.blocks.1.mlp.2.weight\n",
+ "decoder.blocks.1.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.1.fc2.bias -> decoder.blocks.1.mlp.2.bias\n",
+ "decoder.blocks.1.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.final_layer_norm.weight -> decoder.blocks.1.mlp_ln.weight\n",
+ "decoder.blocks.1.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.1.final_layer_norm.bias -> decoder.blocks.1.mlp_ln.bias\n",
+ "decoder.blocks.1.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.self_attn.k_proj.weight -> decoder.blocks.2.attn.key.weight\n",
+ "decoder.blocks.2.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.self_attn.v_proj.weight -> decoder.blocks.2.attn.value.weight\n",
+ "decoder.blocks.2.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.self_attn.v_proj.bias -> decoder.blocks.2.attn.value.bias\n",
+ "decoder.blocks.2.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.self_attn.q_proj.weight -> decoder.blocks.2.attn.query.weight\n",
+ "decoder.blocks.2.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.self_attn.q_proj.bias -> decoder.blocks.2.attn.query.bias\n",
+ "decoder.blocks.2.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.self_attn.out_proj.weight -> decoder.blocks.2.attn.out.weight\n",
+ "decoder.blocks.2.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.self_attn.out_proj.bias -> decoder.blocks.2.attn.out.bias\n",
+ "decoder.blocks.2.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.self_attn_layer_norm.weight -> decoder.blocks.2.attn_ln.weight\n",
+ "decoder.blocks.2.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.self_attn_layer_norm.bias -> decoder.blocks.2.attn_ln.bias\n",
+ "decoder.blocks.2.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.encoder_attn.k_proj.weight -> decoder.blocks.2.cross_attn.key.weight\n",
+ "decoder.blocks.2.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.encoder_attn.v_proj.weight -> decoder.blocks.2.cross_attn.value.weight\n",
+ "decoder.blocks.2.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.encoder_attn.v_proj.bias -> decoder.blocks.2.cross_attn.value.bias\n",
+ "decoder.blocks.2.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.encoder_attn.q_proj.weight -> decoder.blocks.2.cross_attn.query.weight\n",
+ "decoder.blocks.2.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.encoder_attn.q_proj.bias -> decoder.blocks.2.cross_attn.query.bias\n",
+ "decoder.blocks.2.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.encoder_attn.out_proj.weight -> decoder.blocks.2.cross_attn.out.weight\n",
+ "decoder.blocks.2.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.2.encoder_attn.out_proj.bias -> decoder.blocks.2.cross_attn.out.bias\n",
+ "decoder.blocks.2.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.encoder_attn_layer_norm.weight -> decoder.blocks.2.cross_attn_ln.weight\n",
+ "decoder.blocks.2.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.encoder_attn_layer_norm.bias -> decoder.blocks.2.cross_attn_ln.bias\n",
+ "decoder.blocks.2.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.fc1.weight -> decoder.blocks.2.mlp.0.weight\n",
+ "decoder.blocks.2.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.2.fc1.bias -> decoder.blocks.2.mlp.0.bias\n",
+ "decoder.blocks.2.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.fc2.weight -> decoder.blocks.2.mlp.2.weight\n",
+ "decoder.blocks.2.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.2.fc2.bias -> decoder.blocks.2.mlp.2.bias\n",
+ "decoder.blocks.2.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.final_layer_norm.weight -> decoder.blocks.2.mlp_ln.weight\n",
+ "decoder.blocks.2.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.2.final_layer_norm.bias -> decoder.blocks.2.mlp_ln.bias\n",
+ "decoder.blocks.2.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.self_attn.k_proj.weight -> decoder.blocks.3.attn.key.weight\n",
+ "decoder.blocks.3.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.self_attn.v_proj.weight -> decoder.blocks.3.attn.value.weight\n",
+ "decoder.blocks.3.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.self_attn.v_proj.bias -> decoder.blocks.3.attn.value.bias\n",
+ "decoder.blocks.3.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.self_attn.q_proj.weight -> decoder.blocks.3.attn.query.weight\n",
+ "decoder.blocks.3.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.self_attn.q_proj.bias -> decoder.blocks.3.attn.query.bias\n",
+ "decoder.blocks.3.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.self_attn.out_proj.weight -> decoder.blocks.3.attn.out.weight\n",
+ "decoder.blocks.3.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.self_attn.out_proj.bias -> decoder.blocks.3.attn.out.bias\n",
+ "decoder.blocks.3.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.self_attn_layer_norm.weight -> decoder.blocks.3.attn_ln.weight\n",
+ "decoder.blocks.3.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.self_attn_layer_norm.bias -> decoder.blocks.3.attn_ln.bias\n",
+ "decoder.blocks.3.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.encoder_attn.k_proj.weight -> decoder.blocks.3.cross_attn.key.weight\n",
+ "decoder.blocks.3.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.encoder_attn.v_proj.weight -> decoder.blocks.3.cross_attn.value.weight\n",
+ "decoder.blocks.3.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.encoder_attn.v_proj.bias -> decoder.blocks.3.cross_attn.value.bias\n",
+ "decoder.blocks.3.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.encoder_attn.q_proj.weight -> decoder.blocks.3.cross_attn.query.weight\n",
+ "decoder.blocks.3.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.encoder_attn.q_proj.bias -> decoder.blocks.3.cross_attn.query.bias\n",
+ "decoder.blocks.3.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.encoder_attn.out_proj.weight -> decoder.blocks.3.cross_attn.out.weight\n",
+ "decoder.blocks.3.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.3.encoder_attn.out_proj.bias -> decoder.blocks.3.cross_attn.out.bias\n",
+ "decoder.blocks.3.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.encoder_attn_layer_norm.weight -> decoder.blocks.3.cross_attn_ln.weight\n",
+ "decoder.blocks.3.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.encoder_attn_layer_norm.bias -> decoder.blocks.3.cross_attn_ln.bias\n",
+ "decoder.blocks.3.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.fc1.weight -> decoder.blocks.3.mlp.0.weight\n",
+ "decoder.blocks.3.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.3.fc1.bias -> decoder.blocks.3.mlp.0.bias\n",
+ "decoder.blocks.3.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.fc2.weight -> decoder.blocks.3.mlp.2.weight\n",
+ "decoder.blocks.3.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.3.fc2.bias -> decoder.blocks.3.mlp.2.bias\n",
+ "decoder.blocks.3.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.final_layer_norm.weight -> decoder.blocks.3.mlp_ln.weight\n",
+ "decoder.blocks.3.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.3.final_layer_norm.bias -> decoder.blocks.3.mlp_ln.bias\n",
+ "decoder.blocks.3.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.self_attn.k_proj.weight -> decoder.blocks.4.attn.key.weight\n",
+ "decoder.blocks.4.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.self_attn.v_proj.weight -> decoder.blocks.4.attn.value.weight\n",
+ "decoder.blocks.4.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.self_attn.v_proj.bias -> decoder.blocks.4.attn.value.bias\n",
+ "decoder.blocks.4.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.self_attn.q_proj.weight -> decoder.blocks.4.attn.query.weight\n",
+ "decoder.blocks.4.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.self_attn.q_proj.bias -> decoder.blocks.4.attn.query.bias\n",
+ "decoder.blocks.4.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.self_attn.out_proj.weight -> decoder.blocks.4.attn.out.weight\n",
+ "decoder.blocks.4.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.self_attn.out_proj.bias -> decoder.blocks.4.attn.out.bias\n",
+ "decoder.blocks.4.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.self_attn_layer_norm.weight -> decoder.blocks.4.attn_ln.weight\n",
+ "decoder.blocks.4.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.self_attn_layer_norm.bias -> decoder.blocks.4.attn_ln.bias\n",
+ "decoder.blocks.4.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.encoder_attn.k_proj.weight -> decoder.blocks.4.cross_attn.key.weight\n",
+ "decoder.blocks.4.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.encoder_attn.v_proj.weight -> decoder.blocks.4.cross_attn.value.weight\n",
+ "decoder.blocks.4.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.encoder_attn.v_proj.bias -> decoder.blocks.4.cross_attn.value.bias\n",
+ "decoder.blocks.4.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.encoder_attn.q_proj.weight -> decoder.blocks.4.cross_attn.query.weight\n",
+ "decoder.blocks.4.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.encoder_attn.q_proj.bias -> decoder.blocks.4.cross_attn.query.bias\n",
+ "decoder.blocks.4.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.encoder_attn.out_proj.weight -> decoder.blocks.4.cross_attn.out.weight\n",
+ "decoder.blocks.4.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.4.encoder_attn.out_proj.bias -> decoder.blocks.4.cross_attn.out.bias\n",
+ "decoder.blocks.4.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.encoder_attn_layer_norm.weight -> decoder.blocks.4.cross_attn_ln.weight\n",
+ "decoder.blocks.4.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.encoder_attn_layer_norm.bias -> decoder.blocks.4.cross_attn_ln.bias\n",
+ "decoder.blocks.4.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.fc1.weight -> decoder.blocks.4.mlp.0.weight\n",
+ "decoder.blocks.4.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.4.fc1.bias -> decoder.blocks.4.mlp.0.bias\n",
+ "decoder.blocks.4.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.fc2.weight -> decoder.blocks.4.mlp.2.weight\n",
+ "decoder.blocks.4.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.4.fc2.bias -> decoder.blocks.4.mlp.2.bias\n",
+ "decoder.blocks.4.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.final_layer_norm.weight -> decoder.blocks.4.mlp_ln.weight\n",
+ "decoder.blocks.4.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.4.final_layer_norm.bias -> decoder.blocks.4.mlp_ln.bias\n",
+ "decoder.blocks.4.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.self_attn.k_proj.weight -> decoder.blocks.5.attn.key.weight\n",
+ "decoder.blocks.5.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.self_attn.v_proj.weight -> decoder.blocks.5.attn.value.weight\n",
+ "decoder.blocks.5.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.self_attn.v_proj.bias -> decoder.blocks.5.attn.value.bias\n",
+ "decoder.blocks.5.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.self_attn.q_proj.weight -> decoder.blocks.5.attn.query.weight\n",
+ "decoder.blocks.5.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.self_attn.q_proj.bias -> decoder.blocks.5.attn.query.bias\n",
+ "decoder.blocks.5.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.self_attn.out_proj.weight -> decoder.blocks.5.attn.out.weight\n",
+ "decoder.blocks.5.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.self_attn.out_proj.bias -> decoder.blocks.5.attn.out.bias\n",
+ "decoder.blocks.5.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.self_attn_layer_norm.weight -> decoder.blocks.5.attn_ln.weight\n",
+ "decoder.blocks.5.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.self_attn_layer_norm.bias -> decoder.blocks.5.attn_ln.bias\n",
+ "decoder.blocks.5.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.encoder_attn.k_proj.weight -> decoder.blocks.5.cross_attn.key.weight\n",
+ "decoder.blocks.5.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.encoder_attn.v_proj.weight -> decoder.blocks.5.cross_attn.value.weight\n",
+ "decoder.blocks.5.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.encoder_attn.v_proj.bias -> decoder.blocks.5.cross_attn.value.bias\n",
+ "decoder.blocks.5.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.encoder_attn.q_proj.weight -> decoder.blocks.5.cross_attn.query.weight\n",
+ "decoder.blocks.5.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.encoder_attn.q_proj.bias -> decoder.blocks.5.cross_attn.query.bias\n",
+ "decoder.blocks.5.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.encoder_attn.out_proj.weight -> decoder.blocks.5.cross_attn.out.weight\n",
+ "decoder.blocks.5.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.5.encoder_attn.out_proj.bias -> decoder.blocks.5.cross_attn.out.bias\n",
+ "decoder.blocks.5.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.encoder_attn_layer_norm.weight -> decoder.blocks.5.cross_attn_ln.weight\n",
+ "decoder.blocks.5.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.encoder_attn_layer_norm.bias -> decoder.blocks.5.cross_attn_ln.bias\n",
+ "decoder.blocks.5.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.fc1.weight -> decoder.blocks.5.mlp.0.weight\n",
+ "decoder.blocks.5.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.5.fc1.bias -> decoder.blocks.5.mlp.0.bias\n",
+ "decoder.blocks.5.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.fc2.weight -> decoder.blocks.5.mlp.2.weight\n",
+ "decoder.blocks.5.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.5.fc2.bias -> decoder.blocks.5.mlp.2.bias\n",
+ "decoder.blocks.5.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.final_layer_norm.weight -> decoder.blocks.5.mlp_ln.weight\n",
+ "decoder.blocks.5.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.5.final_layer_norm.bias -> decoder.blocks.5.mlp_ln.bias\n",
+ "decoder.blocks.5.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.self_attn.k_proj.weight -> decoder.blocks.6.attn.key.weight\n",
+ "decoder.blocks.6.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.self_attn.v_proj.weight -> decoder.blocks.6.attn.value.weight\n",
+ "decoder.blocks.6.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.self_attn.v_proj.bias -> decoder.blocks.6.attn.value.bias\n",
+ "decoder.blocks.6.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.self_attn.q_proj.weight -> decoder.blocks.6.attn.query.weight\n",
+ "decoder.blocks.6.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.self_attn.q_proj.bias -> decoder.blocks.6.attn.query.bias\n",
+ "decoder.blocks.6.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.self_attn.out_proj.weight -> decoder.blocks.6.attn.out.weight\n",
+ "decoder.blocks.6.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.self_attn.out_proj.bias -> decoder.blocks.6.attn.out.bias\n",
+ "decoder.blocks.6.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.self_attn_layer_norm.weight -> decoder.blocks.6.attn_ln.weight\n",
+ "decoder.blocks.6.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.self_attn_layer_norm.bias -> decoder.blocks.6.attn_ln.bias\n",
+ "decoder.blocks.6.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.encoder_attn.k_proj.weight -> decoder.blocks.6.cross_attn.key.weight\n",
+ "decoder.blocks.6.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.encoder_attn.v_proj.weight -> decoder.blocks.6.cross_attn.value.weight\n",
+ "decoder.blocks.6.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.encoder_attn.v_proj.bias -> decoder.blocks.6.cross_attn.value.bias\n",
+ "decoder.blocks.6.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.encoder_attn.q_proj.weight -> decoder.blocks.6.cross_attn.query.weight\n",
+ "decoder.blocks.6.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.encoder_attn.q_proj.bias -> decoder.blocks.6.cross_attn.query.bias\n",
+ "decoder.blocks.6.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.encoder_attn.out_proj.weight -> decoder.blocks.6.cross_attn.out.weight\n",
+ "decoder.blocks.6.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.6.encoder_attn.out_proj.bias -> decoder.blocks.6.cross_attn.out.bias\n",
+ "decoder.blocks.6.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.encoder_attn_layer_norm.weight -> decoder.blocks.6.cross_attn_ln.weight\n",
+ "decoder.blocks.6.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.encoder_attn_layer_norm.bias -> decoder.blocks.6.cross_attn_ln.bias\n",
+ "decoder.blocks.6.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.fc1.weight -> decoder.blocks.6.mlp.0.weight\n",
+ "decoder.blocks.6.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.6.fc1.bias -> decoder.blocks.6.mlp.0.bias\n",
+ "decoder.blocks.6.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.fc2.weight -> decoder.blocks.6.mlp.2.weight\n",
+ "decoder.blocks.6.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.6.fc2.bias -> decoder.blocks.6.mlp.2.bias\n",
+ "decoder.blocks.6.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.final_layer_norm.weight -> decoder.blocks.6.mlp_ln.weight\n",
+ "decoder.blocks.6.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.6.final_layer_norm.bias -> decoder.blocks.6.mlp_ln.bias\n",
+ "decoder.blocks.6.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.self_attn.k_proj.weight -> decoder.blocks.7.attn.key.weight\n",
+ "decoder.blocks.7.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.self_attn.v_proj.weight -> decoder.blocks.7.attn.value.weight\n",
+ "decoder.blocks.7.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.self_attn.v_proj.bias -> decoder.blocks.7.attn.value.bias\n",
+ "decoder.blocks.7.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.self_attn.q_proj.weight -> decoder.blocks.7.attn.query.weight\n",
+ "decoder.blocks.7.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.self_attn.q_proj.bias -> decoder.blocks.7.attn.query.bias\n",
+ "decoder.blocks.7.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.self_attn.out_proj.weight -> decoder.blocks.7.attn.out.weight\n",
+ "decoder.blocks.7.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.self_attn.out_proj.bias -> decoder.blocks.7.attn.out.bias\n",
+ "decoder.blocks.7.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.self_attn_layer_norm.weight -> decoder.blocks.7.attn_ln.weight\n",
+ "decoder.blocks.7.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.self_attn_layer_norm.bias -> decoder.blocks.7.attn_ln.bias\n",
+ "decoder.blocks.7.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.encoder_attn.k_proj.weight -> decoder.blocks.7.cross_attn.key.weight\n",
+ "decoder.blocks.7.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.encoder_attn.v_proj.weight -> decoder.blocks.7.cross_attn.value.weight\n",
+ "decoder.blocks.7.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.encoder_attn.v_proj.bias -> decoder.blocks.7.cross_attn.value.bias\n",
+ "decoder.blocks.7.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.encoder_attn.q_proj.weight -> decoder.blocks.7.cross_attn.query.weight\n",
+ "decoder.blocks.7.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.encoder_attn.q_proj.bias -> decoder.blocks.7.cross_attn.query.bias\n",
+ "decoder.blocks.7.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.encoder_attn.out_proj.weight -> decoder.blocks.7.cross_attn.out.weight\n",
+ "decoder.blocks.7.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.7.encoder_attn.out_proj.bias -> decoder.blocks.7.cross_attn.out.bias\n",
+ "decoder.blocks.7.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.encoder_attn_layer_norm.weight -> decoder.blocks.7.cross_attn_ln.weight\n",
+ "decoder.blocks.7.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.encoder_attn_layer_norm.bias -> decoder.blocks.7.cross_attn_ln.bias\n",
+ "decoder.blocks.7.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.fc1.weight -> decoder.blocks.7.mlp.0.weight\n",
+ "decoder.blocks.7.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.7.fc1.bias -> decoder.blocks.7.mlp.0.bias\n",
+ "decoder.blocks.7.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.fc2.weight -> decoder.blocks.7.mlp.2.weight\n",
+ "decoder.blocks.7.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.7.fc2.bias -> decoder.blocks.7.mlp.2.bias\n",
+ "decoder.blocks.7.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.final_layer_norm.weight -> decoder.blocks.7.mlp_ln.weight\n",
+ "decoder.blocks.7.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.7.final_layer_norm.bias -> decoder.blocks.7.mlp_ln.bias\n",
+ "decoder.blocks.7.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.self_attn.k_proj.weight -> decoder.blocks.8.attn.key.weight\n",
+ "decoder.blocks.8.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.self_attn.v_proj.weight -> decoder.blocks.8.attn.value.weight\n",
+ "decoder.blocks.8.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.self_attn.v_proj.bias -> decoder.blocks.8.attn.value.bias\n",
+ "decoder.blocks.8.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.self_attn.q_proj.weight -> decoder.blocks.8.attn.query.weight\n",
+ "decoder.blocks.8.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.self_attn.q_proj.bias -> decoder.blocks.8.attn.query.bias\n",
+ "decoder.blocks.8.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.self_attn.out_proj.weight -> decoder.blocks.8.attn.out.weight\n",
+ "decoder.blocks.8.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.self_attn.out_proj.bias -> decoder.blocks.8.attn.out.bias\n",
+ "decoder.blocks.8.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.self_attn_layer_norm.weight -> decoder.blocks.8.attn_ln.weight\n",
+ "decoder.blocks.8.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.self_attn_layer_norm.bias -> decoder.blocks.8.attn_ln.bias\n",
+ "decoder.blocks.8.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.encoder_attn.k_proj.weight -> decoder.blocks.8.cross_attn.key.weight\n",
+ "decoder.blocks.8.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.encoder_attn.v_proj.weight -> decoder.blocks.8.cross_attn.value.weight\n",
+ "decoder.blocks.8.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.encoder_attn.v_proj.bias -> decoder.blocks.8.cross_attn.value.bias\n",
+ "decoder.blocks.8.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.encoder_attn.q_proj.weight -> decoder.blocks.8.cross_attn.query.weight\n",
+ "decoder.blocks.8.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.encoder_attn.q_proj.bias -> decoder.blocks.8.cross_attn.query.bias\n",
+ "decoder.blocks.8.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.encoder_attn.out_proj.weight -> decoder.blocks.8.cross_attn.out.weight\n",
+ "decoder.blocks.8.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.8.encoder_attn.out_proj.bias -> decoder.blocks.8.cross_attn.out.bias\n",
+ "decoder.blocks.8.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.encoder_attn_layer_norm.weight -> decoder.blocks.8.cross_attn_ln.weight\n",
+ "decoder.blocks.8.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.encoder_attn_layer_norm.bias -> decoder.blocks.8.cross_attn_ln.bias\n",
+ "decoder.blocks.8.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.fc1.weight -> decoder.blocks.8.mlp.0.weight\n",
+ "decoder.blocks.8.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.8.fc1.bias -> decoder.blocks.8.mlp.0.bias\n",
+ "decoder.blocks.8.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.fc2.weight -> decoder.blocks.8.mlp.2.weight\n",
+ "decoder.blocks.8.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.8.fc2.bias -> decoder.blocks.8.mlp.2.bias\n",
+ "decoder.blocks.8.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.final_layer_norm.weight -> decoder.blocks.8.mlp_ln.weight\n",
+ "decoder.blocks.8.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.8.final_layer_norm.bias -> decoder.blocks.8.mlp_ln.bias\n",
+ "decoder.blocks.8.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.self_attn.k_proj.weight -> decoder.blocks.9.attn.key.weight\n",
+ "decoder.blocks.9.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.self_attn.v_proj.weight -> decoder.blocks.9.attn.value.weight\n",
+ "decoder.blocks.9.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.self_attn.v_proj.bias -> decoder.blocks.9.attn.value.bias\n",
+ "decoder.blocks.9.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.self_attn.q_proj.weight -> decoder.blocks.9.attn.query.weight\n",
+ "decoder.blocks.9.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.self_attn.q_proj.bias -> decoder.blocks.9.attn.query.bias\n",
+ "decoder.blocks.9.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.self_attn.out_proj.weight -> decoder.blocks.9.attn.out.weight\n",
+ "decoder.blocks.9.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.self_attn.out_proj.bias -> decoder.blocks.9.attn.out.bias\n",
+ "decoder.blocks.9.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.self_attn_layer_norm.weight -> decoder.blocks.9.attn_ln.weight\n",
+ "decoder.blocks.9.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.self_attn_layer_norm.bias -> decoder.blocks.9.attn_ln.bias\n",
+ "decoder.blocks.9.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.encoder_attn.k_proj.weight -> decoder.blocks.9.cross_attn.key.weight\n",
+ "decoder.blocks.9.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.encoder_attn.v_proj.weight -> decoder.blocks.9.cross_attn.value.weight\n",
+ "decoder.blocks.9.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.encoder_attn.v_proj.bias -> decoder.blocks.9.cross_attn.value.bias\n",
+ "decoder.blocks.9.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.encoder_attn.q_proj.weight -> decoder.blocks.9.cross_attn.query.weight\n",
+ "decoder.blocks.9.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.encoder_attn.q_proj.bias -> decoder.blocks.9.cross_attn.query.bias\n",
+ "decoder.blocks.9.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.encoder_attn.out_proj.weight -> decoder.blocks.9.cross_attn.out.weight\n",
+ "decoder.blocks.9.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.9.encoder_attn.out_proj.bias -> decoder.blocks.9.cross_attn.out.bias\n",
+ "decoder.blocks.9.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.encoder_attn_layer_norm.weight -> decoder.blocks.9.cross_attn_ln.weight\n",
+ "decoder.blocks.9.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.encoder_attn_layer_norm.bias -> decoder.blocks.9.cross_attn_ln.bias\n",
+ "decoder.blocks.9.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.fc1.weight -> decoder.blocks.9.mlp.0.weight\n",
+ "decoder.blocks.9.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.9.fc1.bias -> decoder.blocks.9.mlp.0.bias\n",
+ "decoder.blocks.9.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.fc2.weight -> decoder.blocks.9.mlp.2.weight\n",
+ "decoder.blocks.9.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.9.fc2.bias -> decoder.blocks.9.mlp.2.bias\n",
+ "decoder.blocks.9.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.final_layer_norm.weight -> decoder.blocks.9.mlp_ln.weight\n",
+ "decoder.blocks.9.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.9.final_layer_norm.bias -> decoder.blocks.9.mlp_ln.bias\n",
+ "decoder.blocks.9.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.self_attn.k_proj.weight -> decoder.blocks.10.attn.key.weight\n",
+ "decoder.blocks.10.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.self_attn.v_proj.weight -> decoder.blocks.10.attn.value.weight\n",
+ "decoder.blocks.10.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.self_attn.v_proj.bias -> decoder.blocks.10.attn.value.bias\n",
+ "decoder.blocks.10.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.self_attn.q_proj.weight -> decoder.blocks.10.attn.query.weight\n",
+ "decoder.blocks.10.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.self_attn.q_proj.bias -> decoder.blocks.10.attn.query.bias\n",
+ "decoder.blocks.10.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.self_attn.out_proj.weight -> decoder.blocks.10.attn.out.weight\n",
+ "decoder.blocks.10.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.self_attn.out_proj.bias -> decoder.blocks.10.attn.out.bias\n",
+ "decoder.blocks.10.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.self_attn_layer_norm.weight -> decoder.blocks.10.attn_ln.weight\n",
+ "decoder.blocks.10.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.self_attn_layer_norm.bias -> decoder.blocks.10.attn_ln.bias\n",
+ "decoder.blocks.10.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.encoder_attn.k_proj.weight -> decoder.blocks.10.cross_attn.key.weight\n",
+ "decoder.blocks.10.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.encoder_attn.v_proj.weight -> decoder.blocks.10.cross_attn.value.weight\n",
+ "decoder.blocks.10.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.encoder_attn.v_proj.bias -> decoder.blocks.10.cross_attn.value.bias\n",
+ "decoder.blocks.10.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.encoder_attn.q_proj.weight -> decoder.blocks.10.cross_attn.query.weight\n",
+ "decoder.blocks.10.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.encoder_attn.q_proj.bias -> decoder.blocks.10.cross_attn.query.bias\n",
+ "decoder.blocks.10.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.encoder_attn.out_proj.weight -> decoder.blocks.10.cross_attn.out.weight\n",
+ "decoder.blocks.10.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.10.encoder_attn.out_proj.bias -> decoder.blocks.10.cross_attn.out.bias\n",
+ "decoder.blocks.10.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.encoder_attn_layer_norm.weight -> decoder.blocks.10.cross_attn_ln.weight\n",
+ "decoder.blocks.10.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.encoder_attn_layer_norm.bias -> decoder.blocks.10.cross_attn_ln.bias\n",
+ "decoder.blocks.10.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.fc1.weight -> decoder.blocks.10.mlp.0.weight\n",
+ "decoder.blocks.10.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.10.fc1.bias -> decoder.blocks.10.mlp.0.bias\n",
+ "decoder.blocks.10.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.fc2.weight -> decoder.blocks.10.mlp.2.weight\n",
+ "decoder.blocks.10.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.10.fc2.bias -> decoder.blocks.10.mlp.2.bias\n",
+ "decoder.blocks.10.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.final_layer_norm.weight -> decoder.blocks.10.mlp_ln.weight\n",
+ "decoder.blocks.10.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.10.final_layer_norm.bias -> decoder.blocks.10.mlp_ln.bias\n",
+ "decoder.blocks.10.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.self_attn.k_proj.weight -> decoder.blocks.11.attn.key.weight\n",
+ "decoder.blocks.11.attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.self_attn.v_proj.weight -> decoder.blocks.11.attn.value.weight\n",
+ "decoder.blocks.11.attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.self_attn.v_proj.bias -> decoder.blocks.11.attn.value.bias\n",
+ "decoder.blocks.11.attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.self_attn.q_proj.weight -> decoder.blocks.11.attn.query.weight\n",
+ "decoder.blocks.11.attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.self_attn.q_proj.bias -> decoder.blocks.11.attn.query.bias\n",
+ "decoder.blocks.11.attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.self_attn.out_proj.weight -> decoder.blocks.11.attn.out.weight\n",
+ "decoder.blocks.11.attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.self_attn.out_proj.bias -> decoder.blocks.11.attn.out.bias\n",
+ "decoder.blocks.11.attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.self_attn_layer_norm.weight -> decoder.blocks.11.attn_ln.weight\n",
+ "decoder.blocks.11.attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.self_attn_layer_norm.bias -> decoder.blocks.11.attn_ln.bias\n",
+ "decoder.blocks.11.attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.encoder_attn.k_proj.weight -> decoder.blocks.11.cross_attn.key.weight\n",
+ "decoder.blocks.11.cross_attn.key.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.encoder_attn.v_proj.weight -> decoder.blocks.11.cross_attn.value.weight\n",
+ "decoder.blocks.11.cross_attn.value.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.encoder_attn.v_proj.bias -> decoder.blocks.11.cross_attn.value.bias\n",
+ "decoder.blocks.11.cross_attn.value.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.encoder_attn.q_proj.weight -> decoder.blocks.11.cross_attn.query.weight\n",
+ "decoder.blocks.11.cross_attn.query.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.encoder_attn.q_proj.bias -> decoder.blocks.11.cross_attn.query.bias\n",
+ "decoder.blocks.11.cross_attn.query.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.encoder_attn.out_proj.weight -> decoder.blocks.11.cross_attn.out.weight\n",
+ "decoder.blocks.11.cross_attn.out.weight 2 (768, 768)\n",
+ "model.decoder.layers.11.encoder_attn.out_proj.bias -> decoder.blocks.11.cross_attn.out.bias\n",
+ "decoder.blocks.11.cross_attn.out.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.encoder_attn_layer_norm.weight -> decoder.blocks.11.cross_attn_ln.weight\n",
+ "decoder.blocks.11.cross_attn_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.encoder_attn_layer_norm.bias -> decoder.blocks.11.cross_attn_ln.bias\n",
+ "decoder.blocks.11.cross_attn_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.fc1.weight -> decoder.blocks.11.mlp.0.weight\n",
+ "decoder.blocks.11.mlp.0.weight 2 (3072, 768)\n",
+ "model.decoder.layers.11.fc1.bias -> decoder.blocks.11.mlp.0.bias\n",
+ "decoder.blocks.11.mlp.0.bias 1 (3072,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.fc2.weight -> decoder.blocks.11.mlp.2.weight\n",
+ "decoder.blocks.11.mlp.2.weight 2 (768, 3072)\n",
+ "model.decoder.layers.11.fc2.bias -> decoder.blocks.11.mlp.2.bias\n",
+ "decoder.blocks.11.mlp.2.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.final_layer_norm.weight -> decoder.blocks.11.mlp_ln.weight\n",
+ "decoder.blocks.11.mlp_ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layers.11.final_layer_norm.bias -> decoder.blocks.11.mlp_ln.bias\n",
+ "decoder.blocks.11.mlp_ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layer_norm.weight -> decoder.ln.weight\n",
+ "decoder.ln.weight 1 (768,)\n",
+ " Converting to float32\n",
+ "model.decoder.layer_norm.bias -> decoder.ln.bias\n",
+ "decoder.ln.bias 1 (768,)\n",
+ " Converting to float32\n",
+ "Skipping proj_out.weight\n",
+ "Done. Output file: ./ggml-model.bin\n",
+ "\n"
+ ]
+ }
+ ]
+ }
+ ]
+ }