ekurtic committed
Commit b8496c0
1 Parent(s): 106d53e

Model release

.gitattributes CHANGED
@@ -25,3 +25,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zstandard filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,30 @@
+ # oBERT-12-upstream-pretrained-dense
+
+ This model is obtained with [The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models](https://arxiv.org/abs/2203.07259).
+
+
+ It corresponds to the pretrained dense model used as a teacher for upstream pruning runs, as described in the paper. The model can be finetuned on any downstream task, just like the standard `bert-base-uncased` model, which is used as the initialization for training this model.
+
+ Sparse versions of this model:
+ - 90% sparse: `neuralmagic/oBERT-12-upstream-pruned-unstructured-90`
+ - 97% sparse: `neuralmagic/oBERT-12-upstream-pruned-unstructured-97`
+
+ ```
+ Training objective: masked language modeling (MLM)
+ Paper: https://arxiv.org/abs/2203.07259
+ Dataset: BookCorpus and English Wikipedia
+ Sparsity: 0%
+ Number of layers: 12
+ ```
+
+ Code: _coming soon_
+
+ ## BibTeX entry and citation info
+ ```bibtex
+ @article{kurtic2022optimal,
+   title={The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models},
+   author={Kurtic, Eldar and Campos, Daniel and Nguyen, Tuan and Frantar, Elias and Kurtz, Mark and Fineran, Benjamin and Goin, Michael and Alistarh, Dan},
+   journal={arXiv preprint arXiv:2203.07259},
+   year={2022}
+ }
+ ```
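The README marks the code as _coming soon_; in the meantime, here is a minimal sketch of loading this checkpoint for masked-language-modeling inference with the `transformers` library, assuming the repo id `neuralmagic/oBERT-12-upstream-pretrained-dense` (inferred from the naming of the sparse variants listed above, not stated in the README itself):

```python
# Minimal sketch: run fill-mask inference with the dense oBERT teacher model.
# The repo id below is an assumption inferred from the sparse-variant names;
# since this is a standard BERT-base checkpoint, the usual MLM classes apply.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "neuralmagic/oBERT-12-upstream-pretrained-dense"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# MLM was the training objective, so fill-mask works without finetuning;
# for downstream tasks, swap in e.g. AutoModelForSequenceClassification,
# exactly as one would with bert-base-uncased.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"The capital of France is {tokenizer.mask_token}."))
```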
all_results.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:359d6063d9db8c882c054ac3e51a6eaf4479f4b03a4bc3a3910ae2605a819043
+ size 796
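The JSON and binary files in this commit are stored as three-line Git LFS pointers like the one above, a consequence of the `filter=lfs` rules in `.gitattributes` (including the `*.json` rule added by this commit); the real content is addressed by its SHA-256 digest. A minimal sketch of reading such a pointer, using the hypothetical helper name `parse_lfs_pointer`:

```python
# Minimal sketch: parse a Git LFS pointer file (version / oid / size), the
# three-line format shown in this commit's diffs. parse_lfs_pointer is a
# hypothetical helper, not part of any git or LFS API.
def parse_lfs_pointer(text: str) -> dict:
    # Each line is "key value"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),  # digest of the real file
        "size": int(fields["size"]),                   # real file size in bytes
    }

# The all_results.json pointer from this commit:
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:359d6063d9db8c882c054ac3e51a6eaf4479f4b03a4bc3a3910ae2605a819043
size 796
"""
print(parse_lfs_pointer(pointer))
```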
config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3d825720ed2ef61de1853c9e89e6c3460149e90904d818a670eed7a2b8afb85
+ size 605
eval_results.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec66eb48adcf4b60135b3a4cbfa547aa1123cfdd16a4345f439db5421151cd71
+ size 355
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c80317bc0ae45812bb7fdff787f97c93d3b14b91294eee3bd2848759859d6be0
+ size 438147282
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:303df45a03609e4ead04bc3dc1536d0ab19b5358db685b6f3da123d05ec200e3
+ size 112
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a863c20bb9664ba983f10e20d34c790e0eea92f165fc4716c4bad62f6bdc70b4
+ size 285
train_results.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e06bcd2baef3959c4bd83a80cbad0e037c569208ca05141aa610b0584ee2d39
+ size 462
trainer_state.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c4e0f26af343523ec179d7f457753f0790e177aebc018fb4095c4830ba599d2
+ size 94108
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64064c985489f233c9b46091a17d2b7036b4cd93643c50fcd6f27e6cef4fe3c6
+ size 2351
vocab.txt ADDED
The diff for this file is too large to render.