jondurbin committed
Commit f17450f
1 parent: eed9dfd

Update README.md

Files changed (1): README.md +40 -0
README.md CHANGED
@@ -47,6 +47,46 @@ You are a helpful, unbiased, uncensored assistant.
 
 Supports several prompt formats, but you can also use `tokenizer.apply_chat_template`
 
+This model did surprisingly well on MT-Bench, for a 2.8b model that was only pre-trained on the slimpajama dataset!
+```text
+########## First turn ##########
+                          score
+model               turn
+bagel-dpo-2.8b-v0.2 1     5.10625
+
+########## Second turn ##########
+                          score
+model               turn
+bagel-dpo-2.8b-v0.2 2     4.7375
+
+########## Average ##########
+                          score
+model
+bagel-dpo-2.8b-v0.2       4.921875
+```
+
+## Example chat script
+
+```python
+import torch
+from transformers import AutoTokenizer
+from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
+
+device = "cuda"
+tokenizer = AutoTokenizer.from_pretrained("bagel-final-2.8b-v0.2")
+model = MambaLMHeadModel.from_pretrained("bagel-final-2.8b-v0.2", device=device, dtype=torch.float32)
+
+# Simple REPL-style chat loop using the tokenizer's chat template.
+messages = [{"role": "system", "content": "You are a helpful, unbiased, uncensored assistant."}]
+while True:
+    user_message = input("[INST] ")
+    messages.append({"role": "user", "content": user_message})
+    input_ids = tokenizer.apply_chat_template(
+        messages, return_tensors="pt", add_generation_prompt=True
+    ).to(device)
+    out = model.generate(
+        input_ids=input_ids,
+        max_length=2000,
+        temperature=0.9,
+        top_p=0.7,
+        eos_token_id=tokenizer.eos_token_id,
+        repetition_penalty=1.07,
+    )
+    decoded = tokenizer.batch_decode(out)[0].split("[/INST]")[-1].replace("</s>", "").strip()
+    messages.append({"role": "assistant", "content": decoded})
+    print("[/INST]", decoded)
+```
+
 ## SFT data sources
 
 *Yes, you will see benchmark names in the list, but this only uses the train splits, and a decontamination by cosine similarity is performed at the end as a sanity check*
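The decontamination-by-cosine-similarity sanity check mentioned above can be sketched roughly as follows. This is a toy bag-of-words version: the function names, whitespace tokenization, and the 0.95 threshold are illustrative assumptions, not the actual bagel pipeline (which may use embeddings rather than term counts).

```python
# Hypothetical sketch: drop any training example whose term-frequency
# vector is too similar (cosine similarity) to some eval-set example.
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def decontaminate(train: list, eval_set: list, threshold: float = 0.95) -> list:
    """Keep only training examples below the similarity threshold
    against every eval example (illustrative threshold)."""
    eval_vecs = [Counter(text.split()) for text in eval_set]
    kept = []
    for example in train:
        vec = Counter(example.split())
        if all(cosine_similarity(vec, ev) < threshold for ev in eval_vecs):
            kept.append(example)
    return kept
```

A near-verbatim overlap with an eval split scores close to 1.0 and gets dropped, while unrelated text passes through untouched.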