---
license: apache-2.0
---
# Grok-1
---
_This repository contains the weights of the Grok-1 open-weights model._

**To get started with the model, follow the instructions at** [github.com/xai-org/grok-1](https://github.com/xai-org/grok-1).


![Grok-1 cover image](./model_logo.png)

<small>The cover image was generated using [Midjourney](https://www.midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>

---

                         ╔══════════════════════════╗
                         β•‘                 _______  β•‘
                         β•‘            /\   |_   _|  β•‘
                         β•‘  __  __   /  \    | |    β•‘
                         β•‘  \ \/ /  / /\ \   | |    β•‘
                         β•‘   >  <  / ____ \ _| |_   β•‘
                         β•‘  /_/\_\/_/    \_\_____|  β•‘
                         β•‘                          β•‘
                         β•‘  Understand the Universe β•‘
                         β•‘      [https://x.ai]      β•‘
                         β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•β•β•
                             β•”β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•—
                             β•‘ xAI Grok-1 (314B) β•‘
                             β•šβ•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•
                β•”β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•—
                β•‘ 314B parameter Mixture of Experts model    β•‘
                β•‘ - Base model (not finetuned)               β•‘
                β•‘ - 8 experts (2 active)                     β•‘
                β•‘ - 86B active parameters                    β•‘
                β•‘ - Apache 2.0 license                       β•‘
                β•‘ - Code: https://github.com/xai-org/grok-1  β•‘
                β•‘ - Happy coding!                            β•‘
                β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

## Model Configuration Details

**Vocabulary Size**: 131,072

**Special Tokens**:
- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length**: 8192

### **Model Architecture**: MoE
- **Embedding Size**: 6,144
    - Rotary Embedding (RoPE)
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
- **Tokenizer** : SentencePiece tokenizer

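The architecture above selects 2 of 8 experts per token, which is why only ~86B of the 314B parameters are active for any given forward pass. The gating can be sketched as follows (a simplified numpy illustration of top-2 MoE routing, not the actual Grok-1 implementation, which is written in JAX with sharding):

```python
import numpy as np

def top2_moe_layer(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    Simplified sketch: the real Grok-1 layers also include attention,
    normalization, and activation sharding across the device mesh.
    """
    logits = x @ gate_w                         # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                # softmax over selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d, num_experts = 16, 8
x = rng.standard_normal((4, d))                       # 4 tokens
gate_w = rng.standard_normal((d, num_experts))        # router weights
expert_ws = rng.standard_normal((num_experts, d, d))  # one matrix per expert
y = top2_moe_layer(x, gate_w, expert_ws)
print(y.shape)  # (4, 16)
```

Each token pays the compute cost of only the two selected experts, while the full set of eight experts contributes to the total parameter count.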
### **Inference Configuration**:
- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1

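The fractional batch size of 0.125 is consistent with one global batch split evenly across the 1x8 local mesh (an interpretation based on the mesh values above, not stated explicitly in the config):

```python
# One global batch sharded across the local device mesh.
global_batch = 1
devices_per_host = 8   # local mesh: 1x8
hosts = 1              # between hosts: 1x1
per_device = global_batch / (devices_per_host * hosts)
print(per_device)  # 0.125
```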
## Inference Details

Make sure to download the `int8` checkpoint to the `checkpoints` directory and run

```shell
pip install -r requirements.txt
python transformer.py
```

to test the code.

You should see output from the language model.

Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.

**p.s. we're hiring: https://x.ai/careers**