A Pythia Chat Model of 31M Parameters

Base model: EleutherAI/pythia-31m
Availability in other ML formats:
- GGUF: Felladrin/gguf-Pythia-31M-Chat-v1
- ONNX: Felladrin/onnx-Pythia-31M-Chat-v1

Recommended prompt format

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
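
The template above can be filled in programmatically. The following is a minimal sketch, not the author's code: the helper name `build_chatml_prompt` and the example messages are hypothetical, and ending the prompt with a newline after the `assistant` tag is an assumption based on the usual ChatML layout.

```python
def build_chatml_prompt(system_message: str, user_message: str) -> str:
    """Fill the recommended template and leave the assistant turn open."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        # Assumption: the model continues right after this trailing newline.
        "<|im_start|>assistant\n"
    )


# Hypothetical example messages, for illustration only.
prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "What is the capital of France?",
)
print(prompt)
```

Generation would normally be stopped at the next `<|im_end|>` token so that the model returns only the assistant turn.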

Recommended inference parameters

penalty_alpha: 0.5
top_k: 2
repetition_penalty: 1.0016
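
One way to apply these parameters is through the Hugging Face `transformers` `generate` call, where a positive `penalty_alpha` combined with `top_k` greater than 1 selects contrastive search. The sketch below is an illustration under assumptions: the repository id `Felladrin/Pythia-31M-Chat-v1` is inferred from the GGUF and ONNX repository names and is not stated in the text, the helper `generate_reply` and the `max_new_tokens` default are hypothetical, and newer `transformers` releases may need extra options to enable contrastive search.

```python
# Recommended parameters from the card, collected as keyword arguments.
GENERATION_KWARGS = {
    "penalty_alpha": 0.5,
    "top_k": 2,
    "repetition_penalty": 1.0016,
}


def generate_reply(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for an already formatted prompt (hypothetical helper)."""
    # Imported lazily so the dict above can be reused without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: repo id inferred from the GGUF/ONNX repo names.
    model_id = "Felladrin/Pythia-31M-Chat-v1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        **GENERATION_KWARGS,
    )
    # Keep only the newly generated tokens, dropping the prompt tokens.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```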

Datasets and parameters used for training

Dataset	License Type
totally-not-an-llm/EverythingLM-data-V3	mit
databricks/databricks-dolly-15k	cc-by-sa-3.0
THUDM/webglm-qa	apache-2.0
starfishmedical/webGPT_x_dolly	cc-by-sa-3.0
Amod/mental_health_counseling_conversations	openrail
sablo/oasst2_curated	apache-2.0
cognitivecomputations/wizard_vicuna_70k_unfiltered	apache-2.0
mlabonne/chatml_dpo_pairs	apache-2.0

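# Supervised fine-tuning (SFT) configuration.
# Assumes `model`, `train_dataset`, and `eval_dataset` are defined beforehand.
from transformers import EarlyStoppingCallback, TrainingArguments
from trl import SFTTrainer

SFTTrainer(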
    model,
    train_dataset=train_dataset,
    dataset_text_field="text",
    eval_dataset=eval_dataset,
    max_seq_length=2048,
    packing=True,
    args=TrainingArguments(
        learning_rate=2e-6,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=16,
        lr_scheduler_type="cosine",
        num_train_epochs=1,
        logging_strategy="steps",
        save_strategy="steps",
        evaluation_strategy="steps",
        logging_steps=10,
        eval_steps=10,
        save_steps=10,
        warmup_steps=50,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        weight_decay=0.01,
        save_total_limit=10,
        neftune_noise_alpha=5,
    ),
    callbacks=[
        EarlyStoppingCallback(
            early_stopping_patience=3,
            early_stopping_threshold=0.005
        ),
    ],
)
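
As a rough illustration of what the SFT settings above imply, the sketch below works out the effective batch size per optimizer step and the shape of a cosine schedule with linear warmup. It is a hand-written approximation, not the `transformers` scheduler itself. The device count and `total_steps` are assumptions, since the card states neither.

```python
import math

# Values taken from the SFT configuration.
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
max_seq_length = 2048
num_devices = 1  # assumption: the card does not state the hardware setup

# Sequences consumed per optimizer step, and the token upper bound when
# packing fills every sequence up to max_seq_length.
sequences_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
max_tokens_per_step = sequences_per_step * max_seq_length


def cosine_lr(step, base_lr=2e-6, warmup_steps=50, total_steps=1000):
    """Linear warmup followed by cosine decay to zero.

    total_steps=1000 is a hypothetical value; the real number depends on
    the dataset size, which the card does not report.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(sequences_per_step, max_tokens_per_step)
print(cosine_lr(25), cosine_lr(50), cosine_lr(1000))
```

With these settings each optimizer step sees 16 packed sequences, so at most 32,768 tokens on a single device.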

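# Direct Preference Optimization (DPO) configuration.
# Assumes `model`, `tokenizer`, `dataset`, and `eval_dataset` are defined beforehand.
from transformers import EarlyStoppingCallback, TrainingArguments
from trl import DPOTrainer

DPOTrainer(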
    model,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    eval_dataset=eval_dataset,
    max_length=1536,
    max_prompt_length=1024,
    args=TrainingArguments(
        learning_rate=2e-6,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=1,
        lr_scheduler_type="cosine",
        num_train_epochs=1,
        logging_strategy="steps",
        save_strategy="steps",
        evaluation_strategy="steps",
        logging_steps=1,
        eval_steps=1,
        save_steps=1,
        warmup_steps=0,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        weight_decay=0.0,
        neftune_noise_alpha=5,
        remove_unused_columns=False,
    ),
    callbacks=[
        EarlyStoppingCallback(
            early_stopping_patience=3,
            early_stopping_threshold=0.005
        ),
    ],
)
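
To show what `beta=0.1` controls in the DPO configuration above, here is a minimal sketch of the standard sigmoid DPO objective for a single preference pair. It assumes the default sigmoid loss variant was used, which the card does not state. The function name `dpo_loss` and the log-probability values are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each completion under the
    policy being trained and under the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Hypothetical log-probabilities: the policy prefers the chosen answer
# more strongly than the reference model does.
loss_no_preference = dpo_loss(-15.0, -15.0, -15.0, -15.0)
loss_with_preference = dpo_loss(-10.0, -20.0, -15.0, -15.0)
print(loss_no_preference, loss_with_preference)
```

A larger `beta` scales the margin up, so the same log-probability gap moves the loss further from its starting value of log 2.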

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	19.92
AI2 Reasoning Challenge (25-shot)	22.70
HellaSwag (10-shot)	25.60
MMLU (5-shot)	23.24
TruthfulQA (0-shot)	0.00
Winogrande (5-shot)	47.99
GSM8k (5-shot)	0.00
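
The reported average matches the unweighted mean of the six benchmark scores, which can be checked with a few lines. The dictionary below just restates the table values.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge": 22.70,
    "HellaSwag": 25.60,
    "MMLU": 23.24,
    "TruthfulQA": 0.00,
    "Winogrande": 47.99,
    "GSM8k": 0.00,
}

# Unweighted mean across all six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 19.92
```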

Model size: 30.5M params (Safetensors)
Tensor type: F32
