# Skyro-4X8B
Skyro-4X8B is a Mixture of Experts (MoE) model built from the following models using Mergekit:
- abacusai/Llama-3-Smaug-8B
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B
- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
## 🧩 Configuration

```yaml
base_model: meta-llama/Meta-Llama-3-8B
gate_mode: hidden
experts:
  - source_model: abacusai/Llama-3-Smaug-8B
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: cognitivecomputations/dolphin-2.9-llama3-8b
    positive_prompts:
      - "math"
      - "mathematics"
      - "code"
      - "engineering"
      - "solve"
      - "logic"
      - "rationality"
      - "puzzle"
      - "solve"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
      - "science"
      - "medical"
      - "physics"
      - "engineering"
      - "math"
      - "logic"
      - "rationality"
      - "mathematics"
      - "solve"
  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
    positive_prompts:
      - "story"
      - "roleplay"
      - "role-play"
      - "storywriting"
      - "character"
      - "narrative"
      - "creative"
```
## Evaluation

| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| 66.39 | 61.26 | 82.38 | 66.67 | 50.15 | 77.66 | 60.2 |
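The benchmarks above appear to match the Open LLM Leaderboard set. A minimal sketch for re-running them locally with EleutherAI's lm-evaluation-harness, using default task configurations (the leaderboard applies specific few-shot settings, so exact numbers may differ):

```bash
pip install lm-eval

lm_eval --model hf \
  --model_args pretrained=saucam/Skyro-4X8B,dtype=float16 \
  --tasks arc_challenge,hellaswag,mmlu,truthfulqa_mc2,winogrande,gsm8k \
  --batch_size auto
```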
## 💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "saucam/Skyro-4X8B"
messages = [{"role": "user", "content": "In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?"}]

# Build the prompt with the model's chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion and print the full generated text.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
Sample output
```
config.json: 100%|██████████| 878/878 [00:00<00:00, 4.18MB/s]
model.safetensors.index.json: 100%|██████████| 53.5k/53.5k [00:00<00:00, 101MB/s]
model-00001-of-00006.safetensors: 100%|██████████| 9.89G/9.89G [03:47<00:00, 43.4MB/s]
model-00002-of-00006.safetensors: 100%|██████████| 9.98G/9.98G [03:23<00:00, 49.0MB/s]
model-00003-of-00006.safetensors: 100%|██████████| 9.98G/9.98G [03:44<00:00, 44.5MB/s]
model-00004-of-00006.safetensors: 100%|██████████| 9.90G/9.90G [03:30<00:00, 46.9MB/s]
model-00005-of-00006.safetensors: 100%|██████████| 9.08G/9.08G [03:08<00:00, 48.1MB/s]
model-00006-of-00006.safetensors: 100%|██████████| 1.05G/1.05G [00:20<00:00, 51.3MB/s]
Downloading shards: 100%|██████████| 6/6 [17:58<00:00, 179.78s/it]
Loading checkpoint shards: 100%|██████████| 6/6 [01:27<00:00, 14.59s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|im_start|>user
In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?<|im_end|>
<|im_start|>assistant
Let's denote the number of votes candidate A got as \( A \).
Candidate B got 50% more votes than candidate A, so candidate B got \( A + 0.5A = 1.5A \) votes.
Candidate C got the rest of the votes, which means \( C = 100 - (A + 1.5A) \).
We know that candidate A got 20% of the votes, so \( A = 20\% \times 100 = 20 \).
Now we can calculate candidate C's votes:
\( C = 100 - (20 + 1.5 \times 20) \)
\( C = 100 - (20 + 30) \)
\( C = 100 - 50 \)
\( C = 50 \).
Therefore, candidate C got 50 votes.<|im_end|>
```
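If the high-level pipeline is not desired, the following is a minimal sketch of equivalent generation with `AutoModelForCausalLM` and `generate`, using the same sampling parameters as above (the prompt text here is just an illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "saucam/Skyro-4X8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}]
# Apply the chat template and tokenize in one step.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a completion with the same settings as the pipeline example.
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```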