Update README - Add model details

#14
Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +67 -2
  3. model_logo.png +3 -0
.gitattributes CHANGED
@@ -482,3 +482,4 @@ ckpt/tensor00761_000 filter=lfs diff=lfs merge=lfs -text
  ckpt/tensor00762_000 filter=lfs diff=lfs merge=lfs -text
  ckpt/tensor00763_000 filter=lfs diff=lfs merge=lfs -text
  ckpt/tensor00764_000 filter=lfs diff=lfs merge=lfs -text
+ model_logo.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -2,8 +2,73 @@
  license: apache-2.0
  ---
  # Grok-1
- This repository contains the weights of the Grok-1 open-weights model.
+ ---
+ _This repository contains the weights of the Grok-1 open-weights model._
+ 
+ **To get started with using the model, follow the instructions at** `github.com/xai-org/grok.`
+ 
+ 
+ ![The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.](./model_logo.png)
+ 
+ <small>The cover image was generated using [Midjourney](midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>
+ 
+ ---
+ 
+ ╔═══════════════════════════════╗
+ ║                       _____   ║
+ ║              /\      |_   _|  ║
+ ║  __  __     /  \       | |    ║
+ ║  \ \/ /    / /\ \      | |    ║
+ ║   >  <    / ____ \    _| |_   ║
+ ║  /_/\_\  /_/    \_\  |_____|  ║
+ ║                               ║
+ ║    Understand the Universe    ║
+ ║        [https://x.ai]         ║
+ ╚══════════════╗╔═══════════════╝
+       ╔════════╝╚═════════╗
+       ║ xAI Grok-1 (314B) ║
+       ╚════════╗╔═════════╝
+ ╔══════════════╝╚═════════════════════════════╗
+ ║ 314B parameter Mixture of Experts model     ║
+ ║ - Base model (not finetuned)                ║
+ ║ - 8 experts (2 active)                      ║
+ ║ - 86B active parameters                     ║
+ ║ - Apache 2.0 license                        ║
+ ║ - Code: https://github.com/xai-org/grok-1   ║
+ ║ - Happy coding!                             ║
+ ╚═════════════════════════════════════════════╝
+ 
+ ## Model Configuration Details
+ 
+ **Vocabulary Size**: 131,072
+ 
+ **Special Tokens**:
+ - Pad Token: 0
+ - End of Sequence Token: 2
+ 
+ **Sequence Length**: 8192
+ 
+ ### **Model Architecture**: MoE
+ - **Embedding Size**: 6,144
+ - Rotary Embedding (RoPE)
+ - **Layers**: 64
+ - **Experts**: 8
+ - **Selected Experts**: 2
+ - **Widening Factor**: 8
+ - **Key Size**: 128
+ - **Query Heads**: 48
+ - **Key Value Heads**: 8
+ - **Activation Sharding**: Data-wise, Model-wise
+ - **Tokenizer**: SentencePiece tokenizer
+ 
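For readers skimming the diff, the architecture values above can be collected into a small sanity-check sketch. This is plain illustrative Python, not the repository's actual configuration class; every name in it is an assumption made for this example.

```python
from dataclasses import dataclass

@dataclass
class GrokArchSketch:
    """Illustrative container for the values listed in the README diff (not xAI code)."""
    vocab_size: int = 131_072
    pad_token: int = 0
    eos_token: int = 2
    sequence_len: int = 8_192
    emb_size: int = 6_144
    num_layers: int = 64
    num_experts: int = 8
    num_selected_experts: int = 2
    widening_factor: int = 8
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8

cfg = GrokArchSketch()
# 48 query heads * key size 128 == embedding size 6,144
assert cfg.num_q_heads * cfg.key_size == cfg.emb_size
print(f"{cfg.num_selected_experts}/{cfg.num_experts} experts active per token")
```

The assert simply checks that the listed query-head count and key size are consistent with the 6,144-wide embedding.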
+ ### **Inference Configuration**:
+ - Batch Size per Device: 0.125
+ - Tokenizer: `./tokenizer.model`
+ - Local Mesh: 1x8
+ - Between Hosts: 1x1
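The inference settings above map naturally onto a single-host JAX device mesh. The sketch below assumes 8 visible accelerators and invented axis names; it is not code from the repository.

```python
import numpy as np
import jax
from jax.sharding import Mesh

# "Local Mesh: 1x8" + "Between Hosts: 1x1" -> a single host whose 8 devices
# form a (data=1, model=8) mesh. Axis names are assumptions for this sketch,
# and it only runs on a machine with 8 accelerators attached.
devices = np.array(jax.devices()).reshape(1, 8)
mesh = Mesh(devices, axis_names=("data", "model"))

# "Batch Size per Device: 0.125" reads as one global sequence shared by the
# 8 devices of the mesh: 1 / 8 == 0.125.
global_batch = 1
print(global_batch / devices.size, mesh.shape)
```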
+ 
+ 
+ ## Inference Details
 
  Make sure to download the `int8` checkpoint to the `checkpoints` directory and run
 
@@ -18,4 +83,4 @@ You should be seeing output from the language model.
 
  Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
 
- p.s. we're hiring: https://x.ai/career
+ **p.s. we're hiring: https://x.ai/careers**
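As rough context for the multi-GPU requirement mentioned in the closing paragraph of the README diff, here is a back-of-the-envelope weight-memory estimate. It assumes int8 weights and an 8-device mesh and ignores activations and the KV cache; the numbers are illustrative, not measurements.

```python
# Back-of-the-envelope weight memory for the int8 checkpoint (illustrative only).
total_params = 314e9      # 314B parameters
bytes_per_param = 1       # int8 quantization
num_devices = 8           # one 1x8 local mesh, as configured above

total_gb = total_params * bytes_per_param / 1e9
per_device_gb = total_gb / num_devices
print(f"~{total_gb:.0f} GB of weights total, ~{per_device_gb:.0f} GB per device")
# Activations and the KV cache add to this, hence the multi-GPU requirement.
```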
model_logo.png ADDED

Git LFS Details

  • SHA256: 5fc985296d2a853cce201117ba2d8be3d3b2f046b64eddd4d0eb5fdcf8aea71c
  • Pointer size: 132 Bytes
  • Size of remote file: 2.34 MB