Model Name: Qwen2 orca_mini_v7_7b
Qwen2 orca_mini_v7_7b is trained with various SFT Datasets
Passionate about Generative AI? I help companies privately train and deploy custom LLMs/MLLMs affordably. For startups, I can even assist with securing GPU grants to get you started. Let's chat! https://www.linkedin.com/in/pankajam Looking forward to connecting!
NOTICE
Provided you give proper credit and attribution, you are granted permission to use this model as a foundational base for further full fine-tuning, DPO, PPO, or ORPO tuning, and any kind of merges. I actively encourage users to customize and enhance the model according to their specific needs, as this version is designed to be a comprehensive general model. Dive in and innovate!
Evaluation
Coming Soon..
Example Usage
Here is the ChatML prompt format
<|im_start|>system
You are Orca Mini, a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello Orca Mini, what can you do for me?<|im_end|>
<|im_start|>assistant
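As a minimal sketch of how the format above is assembled, the helper below renders a message list into a ChatML string by hand. The function name `build_chatml_prompt` is hypothetical and not part of any library; in practice the tokenizer's own chat template is the authoritative source, and its exact whitespace may differ from this sketch.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Hypothetical helper for illustration only; prefer
    tokenizer.apply_chat_template for real inference.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The trailing open assistant turn is what signals the model to start generating its reply.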
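Below is a code example showing how to use this model:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_slug = "pankajmathur/orca_mini_v7_7b"
# AutoModelForCausalLM (not AutoModel) loads the language-modeling head that generate() needs
model = AutoModelForCausalLM.from_pretrained(model_slug)
tokenizer = AutoTokenizer.from_pretrained(model_slug)
messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
]
# add_generation_prompt opens the assistant turn; return_dict yields input_ids and attention_mask
gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
output_ids = model.generate(**gen_input, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][gen_input["input_ids"].shape[-1]:], skip_special_tokens=True))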
Quants
GGUF : Coming Soon
AWQ: Coming Soon
Processing Long Texts (Based upon Qwen2-7B-Instruct suggestions at https://huggingface.co/Qwen/Qwen2-7B-Instruct)
To handle extensive inputs exceeding 32,768 tokens, we utilize YARN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:
- Install vLLM: You can install vLLM by running the following command.
pip install "vllm>=0.4.3"
Or you can install vLLM from source.
- Configure Model Settings: After downloading the model weights, modify the config.json file by including the snippet below:
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  // ...
  "vocab_size": 152064,
  // adding the following snippets
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
This snippet enables YARN to support longer contexts.
- Model Deployment: Use vLLM to deploy your model. For instance, you can set up an OpenAI-like server using the command:
python -u -m vllm.entrypoints.openai.api_server --model pankajmathur/orca_mini_v7_7b
Then you can access the Chat API by:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pankajmathur/orca_mini_v7_7b",
    "messages": [
      {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
      {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
    ]
  }'
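The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

# Address assumed from the vLLM server command above
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "pankajmathur/orca_mini_v7_7b"


def build_chat_request(user_message,
                       system_message="You are Orca Mini, a helpful AI assistant.",
                       url=API_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message):
    """Send the request and return the assistant's reply (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(user_message)) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello Orca Mini, what can you do for me?")
print(request.data.decode("utf-8"))
```

Calling `chat("Hello Orca Mini, what can you do for me?")` performs the actual network call, so it only works once the server is up.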
Note: Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the rope_scaling configuration only when processing long contexts is required.
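Because the rope_scaling entry is best added only when needed, it can help to script the edit so it is easy to apply and revert. The following is a minimal sketch, not an official tool: `set_long_context` is a hypothetical helper, and the demo runs against a stand-in config.json in a temporary directory rather than real model weights. Note that the real config.json must be strict JSON, so the `// ...` lines in the snippet above are annotations and must not be copied into the file. With the values shown, the nominal extended window is 4.0 × 32,768 = 131,072 tokens.

```python
import json
import tempfile
from pathlib import Path

# Values taken from the config snippet above
YARN_ROPE_SCALING = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}


def set_long_context(config_path, enabled):
    """Add or remove the rope_scaling entry in a config.json file.

    Hypothetical helper; returns the updated config as a dict.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if enabled:
        config["rope_scaling"] = dict(YARN_ROPE_SCALING)
    else:
        # Restore the default short-context behavior
        config.pop("rope_scaling", None)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a stand-in config file (not the real model config)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_path.write_text(
        json.dumps({"architectures": ["Qwen2ForCausalLM"], "vocab_size": 152064}),
        encoding="utf-8",
    )
    enabled_config = set_long_context(demo_path, enabled=True)
    disabled_config = set_long_context(demo_path, enabled=False)

scaling = enabled_config["rope_scaling"]
print(int(scaling["factor"] * scaling["original_max_position_embeddings"]))
```

Keeping the toggle in one function makes it straightforward to serve short-context workloads from an unmodified config.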
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 22.41 |
IFEval (0-Shot) | 43.88 |
BBH (3-Shot) | 33.95 |
MATH Lvl 5 (4-Shot) | 2.64 |
GPQA (0-shot) | 6.15 |
MuSR (0-shot) | 12.66 |
MMLU-PRO (5-shot) | 35.19 |