---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
model-index:
- name: ibm/PowerMoE-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 58.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 65.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 71.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 41.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 79.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 65.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 42.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 25.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 14.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 20.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 32.4
      verified: false
---
## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters per token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including multiple-choice natural language tasks, code generation, and math reasoning.
Paper: https://arxiv.org/abs/2408.13359
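Because the model is an sMoE, the headline 3B figure counts the weights of every expert, while only about 800M are used per token. Below is a minimal sketch for sanity-checking the total parameter count after loading; the per-token active count is determined by the router's top-k setting and is not computed here:
```python
import torch
from transformers import AutoModelForCausalLM

# Load weights on CPU in bf16 just to inspect sizes (assumes enough host RAM).
model = AutoModelForCausalLM.from_pretrained("ibm/PowerMoE-3b", torch_dtype=torch.bfloat16)

# Total parameters across all experts; only a subset is activated per token.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total / 1e9:.2f}B")  # expected to be roughly 3B
```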
## Usage
Note: this model requires installing Hugging Face `transformers` from source (for example, via `pip install git+https://github.com/huggingface/transformers.git`).
### Generation
This is a simple example of how to use the **PowerMoE-3b** model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```
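The example above uses greedy decoding. For more varied output, `generate` also supports sampling; here is a small variation on the call above (the parameter values are illustrative, not tuned for this model):
```python
# Sample instead of decoding greedily; reuses `model`, `tokenizer`,
# and `input_tokens` from the example above.
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,    # draw tokens from the distribution instead of argmax
    temperature=0.7,   # <1.0 sharpens the distribution, >1.0 flattens it
    top_p=0.9,         # nucleus sampling: keep the smallest set covering 90% probability mass
)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```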