Adding Evaluation Results

85cf9c4 verified 3 months ago

5.52 kB

	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	- axolotl
	base_model: Qwen/Qwen2-7B
	datasets:
	- cognitivecomputations/Dolphin-2.9
	- teknium/OpenHermes-2.5
	- m-a-p/CodeFeedback-Filtered-Instruction
	- cognitivecomputations/dolphin-coder
	- cognitivecomputations/samantha-data
	- microsoft/orca-math-word-problems-200k
	- Locutusque/function-calling-chatml
	- internlm/Agent-FLAN
	model-index:
	- name: dolphin-2.9.2-qwen2-7b
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 35.35
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-7b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 27.91
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-7b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 11.56
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-7b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 5.37
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-7b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 11.66
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-7b
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 33.9
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=cognitivecomputations/dolphin-2.9.2-qwen2-7b
	name: Open LLM Leaderboard
	---

	# Dolphin 2.9.2 Qwen2 7B 🐬

	Curated and trained by Eric Hartford, Lucas Atkins, and Fernando Fernandes, and Cognitive Computations

	[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
	Discord: https://discord.gg/cognitivecomputations

	<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

	Our appreciation for the sponsors of Dolphin 2.9.2:
	- [Crusoe Cloud](https://crusoe.ai/) - provided excellent on-demand 8xH100 node

	This model is based on Qwen2-7b, and is governed by [tongyi-qianwen license](LICENSE)

	The base model has 128k context, and the full-weight fine-tuning was with 16k sequence length.


	example:

	```
	<\|im_start\|>system
	You are Dolphin, a helpful AI assistant.<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant

	```

	Dolphin-2.9.2 has a variety of instruction, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

	Dolphin is uncensored. We have filtered the dataset to remove alignment and bias. This makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant with any requests, even unethical ones. Please read my blog post about uncensored models. https://erichartford.com/uncensored-models You are responsible for any content you create using this model. Enjoy responsibly.

	Dolphin is licensed according to Qwen's tongyi-qianwen license. We grant permission for any use, including commercial, that falls within accordance with said license. Dolphin was trained on data generated from GPT4, among other models.

	## Evals

	![image/png](https://i.ibb.co/0Qw3XtM/file-Oq9-Wr-Qx-H2-Wr8-Eb-Gs15z-Rv-TGe.png)
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_cognitivecomputations__dolphin-2.9.2-qwen2-7b)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \|20.96\|
	\|IFEval (0-Shot) \|35.35\|
	\|BBH (3-Shot) \|27.91\|
	\|MATH Lvl 5 (4-Shot)\|11.56\|
	\|GPQA (0-shot) \| 5.37\|
	\|MuSR (0-shot) \|11.66\|
	\|MMLU-PRO (5-shot) \|33.90\|