	---
	license: apache-2.0
	library_name: peft
	tags:
	- trl
	- orpo
	- unsloth
	- generated_from_trainer
	base_model: cognitivecomputations/dolphin-2.9.1-yi-1.5-9b
	model-index:
	- name: Gaston_dolphin-2.9.1-yi-1.5-9b
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/bacoco/Gaston_dolphin-2.9.1-yi-1.5-9b/runs/4d2n86g4)
	# Gaston_dolphin-2.9.1-yi-1.5-9b

	This model is a PEFT adapter fine-tuned from [cognitivecomputations/dolphin-2.9.1-yi-1.5-9b](https://huggingface.co/cognitivecomputations/dolphin-2.9.1-yi-1.5-9b) with ORPO, using TRL and Unsloth. The training dataset is not recorded.
	At the last logged evaluation (step 927, epoch 0.9042), it achieves the following results on the evaluation set:
	- Loss: 0.4290
	- Rewards/chosen: -0.0153
	- Rewards/rejected: -0.2895
	- Rewards/accuracies: 0.9985
	- Rewards/margins: 0.2742
	- Logps/rejected: -2.8952
	- Logps/chosen: -0.1528
	- Logits/rejected: -0.1534
	- Logits/chosen: 0.0002
	- Nll Loss: 0.4278
	- Log Odds Ratio: -0.0124
	- Log Odds Chosen: 4.8981
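
The figures above can be cross-checked against each other. The following is a minimal sketch, assuming the ORPO objective as implemented in TRL, where the reward values are `beta` times the log-probabilities and the total loss is the NLL loss minus `beta` times the mean log odds ratio term. The value `beta = 0.1` is an assumption: it is not recorded in this card, but it is consistent with the ratio between the reported rewards and logps.

```python
# Final evaluation metrics copied from the list above.
metrics = {
    "loss": 0.4290,
    "rewards/chosen": -0.0153,
    "rewards/rejected": -0.2895,
    "rewards/margins": 0.2742,
    "logps/chosen": -0.1528,
    "logps/rejected": -2.8952,
    "nll_loss": 0.4278,
    "log_odds_ratio": -0.0124,
}

# ASSUMPTION: beta is not stated in this card; 0.1 is inferred from
# rewards/chosen being roughly beta * logps/chosen.
BETA = 0.1


def orpo_consistency(m, beta=BETA):
    """Recompute the derived quantities from the other reported values."""
    return {
        "rewards/chosen": beta * m["logps/chosen"],
        "rewards/rejected": beta * m["logps/rejected"],
        "rewards/margins": m["rewards/chosen"] - m["rewards/rejected"],
        "loss": m["nll_loss"] - beta * m["log_odds_ratio"],
    }


recomputed = orpo_consistency(metrics)
for name, value in recomputed.items():
    print(f"{name}: reported={metrics[name]:.4f} recomputed={value:.4f}")
```

Under these assumptions the recomputed values agree with the reported ones to within rounding, and the odds-ratio term accounts for only about 0.001 of the total loss; the remainder is the NLL loss.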

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 8e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 1
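
As an illustration of how these settings combine, the sketch below computes the effective batch size and a learning-rate curve with linear warmup followed by cosine decay. The device count and the total number of optimizer steps are assumptions: neither is stated in this card, and the step count of roughly 1025 is estimated from the Epoch and Step columns of the training results table. The schedule function is a plain reimplementation for illustration, not code taken from the training run.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1

# ASSUMPTIONS (not stated in the card): a single device, and about 1025
# optimizer steps in the one training epoch.
NUM_DEVICES = 1
TOTAL_STEPS = 1025


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=LEARNING_RATE,
          warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 103, 515, 927, 1025):
    print(f"step {step}: lr = {lr_at(step):.3e}")
```

With one device, 4 examples per step and 4 accumulation steps give the listed `total_train_batch_size` of 16.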

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
	|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
	| 0.5193        | 0.1005 | 103  | 0.5159          | -0.0187        | -0.0825          | 0.9971             | 0.0638          | -0.8248        | -0.1866      | 0.1547          | 0.1467        | 0.5004   | -0.1555        | 2.0327          |
	| 0.4988        | 0.2009 | 206  | 0.4724          | -0.0170        | -0.1413          | 0.9985             | 0.1243          | -1.4130        | -0.1703      | 0.0154          | -0.0134       | 0.4661   | -0.0627        | 3.0432          |
	| 0.4375        | 0.3014 | 309  | 0.4577          | -0.0162        | -0.1628          | 0.9985             | 0.1466          | -1.6283        | -0.1622      | 0.1372          | 0.1328        | 0.4530   | -0.0467        | 3.3955          |
	| 0.4738        | 0.4019 | 412  | 0.4463          | -0.0160        | -0.2198          | 0.9985             | 0.2038          | -2.1980        | -0.1596      | -0.0220         | 0.0649        | 0.4438   | -0.0250        | 4.0928          |
	| 0.4893        | 0.5023 | 515  | 0.4406          | -0.0159        | -0.2499          | 0.9985             | 0.2341          | -2.4993        | -0.1585      | -0.0720         | 0.0474        | 0.4388   | -0.0185        | 4.4389          |
	| 0.4565        | 0.6028 | 618  | 0.4357          | -0.0157        | -0.3289          | 0.9985             | 0.3133          | -3.2895        | -0.1566      | -0.1392         | 0.0470        | 0.4347   | -0.0093        | 5.2916          |
	| 0.4069        | 0.7032 | 721  | 0.4324          | -0.0154        | -0.3096          | 0.9985             | 0.2942          | -3.0962        | -0.1544      | -0.1833         | -0.0044       | 0.4313   | -0.0107        | 5.1028          |
	| 0.4297        | 0.8037 | 824  | 0.4299          | -0.0153        | -0.2854          | 0.9985             | 0.2701          | -2.8536        | -0.1528      | -0.1911         | -0.0397       | 0.4286   | -0.0129        | 4.8536          |
	| 0.4437        | 0.9042 | 927  | 0.4290          | -0.0153        | -0.2895          | 0.9985             | 0.2742          | -2.8952        | -0.1528      | -0.1534         | 0.0002        | 0.4278   | -0.0124        | 4.8981          |
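
The Epoch and Step columns are enough to estimate the length of the run. The sketch below is a rough calculation, not a record of the actual configuration: it averages the step-to-epoch ratio over the logged rows and multiplies by the total train batch size of 16 from the hyperparameter list. The resulting example count is approximate, since the dataset size is not given in this card.

```python
# (epoch, step) pairs copied from the table above.
checkpoints = [
    (0.1005, 103),
    (0.2009, 206),
    (0.3014, 309),
    (0.4019, 412),
    (0.5023, 515),
    (0.6028, 618),
    (0.7032, 721),
    (0.8037, 824),
    (0.9042, 927),
]

# From the hyperparameter list in this card.
TOTAL_TRAIN_BATCH_SIZE = 16


def steps_per_epoch(pairs):
    """Average step/epoch ratio across all logged evaluations."""
    return sum(step / epoch for epoch, step in pairs) / len(pairs)


estimate = steps_per_epoch(checkpoints)
approx_examples = round(estimate) * TOTAL_TRAIN_BATCH_SIZE
eval_interval = checkpoints[1][1] - checkpoints[0][1]

print(f"estimated optimizer steps per epoch: {estimate:.1f}")
print(f"approximate training examples: {approx_examples}")
print(f"steps between evaluations: {eval_interval}")
```

The last logged evaluation is at epoch 0.9042, so the headline evaluation results in this card correspond to step 927 rather than to the very end of the epoch.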


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.0
	- Pytorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1
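
To reproduce the environment, it can help to compare installed packages against the versions listed above. This is a minimal sketch using only the standard library. The distribution names are assumptions (for example, "Pytorch" is published on PyPI as `torch`), and `version_mismatches` is a hypothetical helper written for this example.

```python
from importlib import metadata

# Versions from the list above. ASSUMPTION: the keys are the PyPI
# distribution names corresponding to the listed frameworks.
EXPECTED = {
    "peft": "0.11.1",
    "transformers": "4.41.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map package -> (expected, found) for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the adapter will fail to load; the check only reports where the local environment differs from the one recorded here.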