---
license: apache-2.0
---
# Grok-1
---
_This repository contains the weights of the Grok-1 open-weights model._

**To get started with the model, follow the instructions at** [github.com/xai-org/grok-1](https://github.com/xai-org/grok-1).


![Grok-1 cover image](./model_logo.png)

<small>The cover image was generated using [Midjourney](https://www.midjourney.com) based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.</small>

---

                         ╔══════════════════════════╗
                         β•‘                 _______  β•‘
                         β•‘            /\   |_   _|  β•‘
                         β•‘  __  __   /  \    | |    β•‘
                         β•‘  \ \/ /  / /\ \   | |    β•‘
                         β•‘   >  <  / ____ \ _| |_   β•‘
                         β•‘  /_/\_\/_/    \_\_____|  β•‘
                         β•‘                          β•‘
                         β•‘  Understand the Universe β•‘
                         β•‘      [https://x.ai]      β•‘
                         β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•β•β•β•
                             β•”β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•—
                             β•‘ xAI Grok-1 (314B) β•‘
                             β•šβ•β•β•β•β•β•β•β•β•—β•”β•β•β•β•β•β•β•β•β•β•
                β•”β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•—
                β•‘ 314B parameter Mixture of Experts model    β•‘
                β•‘ - Base model (not finetuned)               β•‘
                β•‘ - 8 experts (2 active)                     β•‘
                β•‘ - 86B active parameters                    β•‘
                β•‘ - Apache 2.0 license                       β•‘
                β•‘ - Code: https://github.com/xai-org/grok-1  β•‘
                β•‘ - Happy coding!                            β•‘
                β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

## Model Configuration Details

**Vocabulary Size**: 131,072

**Special Tokens**:
- Pad Token: 0
- End of Sequence Token: 2

**Sequence Length**: 8192

### **Model Architecture**: MoE
- **Embedding Size**: 6,144
    - Rotary Embedding (RoPE)
- **Layers**: 64
- **Experts**: 8
- **Selected Experts**: 2
- **Widening Factor**: 8
- **Key Size**: 128
- **Query Heads**: 48
- **Key Value Heads**: 8
- **Activation Sharding**: Data-wise, Model-wise
- **Tokenizer** : SentencePiece tokenizer

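The architecture above selects 2 of 8 experts per token, which is why only ~86B of the 314B parameters are active for any given forward pass. The gating can be sketched as follows (a simplified numpy illustration of top-2 MoE routing, not the actual Grok-1 implementation, which is written in JAX with sharding):

```python
import numpy as np

def top2_moe_layer(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    Simplified sketch: the real Grok-1 layers also include attention,
    normalization, and activation sharding across the device mesh.
    """
    logits = x @ gate_w                         # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                # softmax over selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
d, num_experts = 16, 8
x = rng.standard_normal((4, d))                       # 4 tokens
gate_w = rng.standard_normal((d, num_experts))        # router weights
expert_ws = rng.standard_normal((num_experts, d, d))  # one matrix per expert
y = top2_moe_layer(x, gate_w, expert_ws)
print(y.shape)  # (4, 16)
```

Each token pays the compute cost of only the two selected experts, while the full set of eight experts contributes to the total parameter count.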
### **Inference Configuration**:
- Batch Size per Device: 0.125
- Tokenizer: `./tokenizer.model`
- Local Mesh: 1x8
- Between Hosts: 1x1

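The fractional batch size of 0.125 is consistent with one global batch split evenly across the 1x8 local mesh (an interpretation based on the mesh values above, not stated explicitly in the config):

```python
# One global batch sharded across the local device mesh.
global_batch = 1
devices_per_host = 8   # local mesh: 1x8
hosts = 1              # between hosts: 1x1
per_device = global_batch / (devices_per_host * hosts)
print(per_device)  # 0.125
```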
## Inference Details

Make sure to download the `int8` checkpoint to the `checkpoints` directory and run

```shell
pip install -r requirements.txt
python transformer.py
```

to test the code.

You should see output from the language model.

Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.

**p.s. we're hiring: https://x.ai/careers**