RichardErkhov/adamo1139_-_Yi-6B-200K-AEZAKMI-v2-gguf

Quantization made by Richard Erkhov.

Yi-6B-200K-AEZAKMI-v2 - GGUF

Model creator: https://huggingface.co/adamo1139/
Original model: https://huggingface.co/adamo1139/Yi-6B-200K-AEZAKMI-v2/

Name	Quant method	Size
Yi-6B-200K-AEZAKMI-v2.Q2_K.gguf	Q2_K	2.18GB
Yi-6B-200K-AEZAKMI-v2.IQ3_XS.gguf	IQ3_XS	2.41GB
Yi-6B-200K-AEZAKMI-v2.IQ3_S.gguf	IQ3_S	2.53GB
Yi-6B-200K-AEZAKMI-v2.Q3_K_S.gguf	Q3_K_S	2.52GB
Yi-6B-200K-AEZAKMI-v2.IQ3_M.gguf	IQ3_M	2.62GB
Yi-6B-200K-AEZAKMI-v2.Q3_K.gguf	Q3_K	2.79GB
Yi-6B-200K-AEZAKMI-v2.Q3_K_M.gguf	Q3_K_M	2.79GB
Yi-6B-200K-AEZAKMI-v2.Q3_K_L.gguf	Q3_K_L	3.01GB
Yi-6B-200K-AEZAKMI-v2.IQ4_XS.gguf	IQ4_XS	3.11GB
Yi-6B-200K-AEZAKMI-v2.Q4_0.gguf	Q4_0	3.24GB
Yi-6B-200K-AEZAKMI-v2.IQ4_NL.gguf	IQ4_NL	3.27GB
Yi-6B-200K-AEZAKMI-v2.Q4_K_S.gguf	Q4_K_S	3.26GB
Yi-6B-200K-AEZAKMI-v2.Q4_K.gguf	Q4_K	3.42GB
Yi-6B-200K-AEZAKMI-v2.Q4_K_M.gguf	Q4_K_M	3.42GB
Yi-6B-200K-AEZAKMI-v2.Q4_1.gguf	Q4_1	3.58GB
Yi-6B-200K-AEZAKMI-v2.Q5_0.gguf	Q5_0	3.92GB
Yi-6B-200K-AEZAKMI-v2.Q5_K_S.gguf	Q5_K_S	3.92GB
Yi-6B-200K-AEZAKMI-v2.Q5_K.gguf	Q5_K	4.01GB
Yi-6B-200K-AEZAKMI-v2.Q5_K_M.gguf	Q5_K_M	4.01GB
Yi-6B-200K-AEZAKMI-v2.Q5_1.gguf	Q5_1	4.25GB
Yi-6B-200K-AEZAKMI-v2.Q6_K.gguf	Q6_K	4.63GB
Yi-6B-200K-AEZAKMI-v2.Q8_0.gguf	Q8_0	6.0GB
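One rough way to use the table above is to pick the largest quant whose file fits a memory budget. In the sketch below, the sizes are copied from the table. The helpers `pick_quant` and `gguf_filename` and the 1 GB `overhead_gb` allowance are hypothetical illustrations, not part of any tool. Real memory use also includes the KV cache, which grows with context length and can exceed the weights at long contexts, so treat the result only as a starting point.

```python
from typing import Dict, Optional

# File sizes in GB, copied from the table above (insertion order = table order).
QUANT_SIZES_GB: Dict[str, float] = {
    "Q2_K": 2.18, "IQ3_XS": 2.41, "IQ3_S": 2.53, "Q3_K_S": 2.52,
    "IQ3_M": 2.62, "Q3_K": 2.79, "Q3_K_M": 2.79, "Q3_K_L": 3.01,
    "IQ4_XS": 3.11, "Q4_0": 3.24, "IQ4_NL": 3.27, "Q4_K_S": 3.26,
    "Q4_K": 3.42, "Q4_K_M": 3.42, "Q4_1": 3.58, "Q5_0": 3.92,
    "Q5_K_S": 3.92, "Q5_K": 4.01, "Q5_K_M": 4.01, "Q5_1": 4.25,
    "Q6_K": 4.63, "Q8_0": 6.0,
}


def gguf_filename(quant: str) -> str:
    """Build the file name used in this repository for a given quant method."""
    return f"Yi-6B-200K-AEZAKMI-v2.{quant}.gguf"


def pick_quant(budget_gb: float, overhead_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant whose file size plus an assumed overhead fits
    in budget_gb, or None if nothing fits. On ties, the first in table order wins."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s + overhead_gb <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.__getitem__)
```

Under these assumptions `pick_quant(5.0)` returns `"Q5_0"`: Q5_K (4.01 GB) plus 1 GB of overhead no longer fits, and the tie between Q5_0 and Q5_K_S resolves to whichever appears first in the table.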

Original model description:

license: apache-2.0

Model description

Yi-6B 200K base model fine-tuned on the AEZAKMI v2 dataset. It's like airoboros, but hopefully with less gptslop, no refusals, and less of the typical language used by RLHF'd OpenAI models. Say goodbye to "It's important to remember"! The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot. The base model used for fine-tuning was a llamafied version of the 200K-context Yi-6B model shared by 01.ai.

I tested it up to 300K context. It seems to work OK up to 200K; beyond 200K it's a lottery.

Prompt Format

I recommend using the ChatML format, as this was used during fine-tuning. Here's the prompt format you should use. You can set a different system message; the model seems to respect it fine, so it wasn't overfitted.

<|im_start|>system
A chat with uncensored assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
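The ChatML template above can be assembled programmatically. The following is a minimal sketch: `build_chatml_prompt` is a hypothetical helper, the default system message is the one recommended in this card, and ending the prompt with `<|im_start|>assistant` followed by a newline follows the usual ChatML convention (an assumption about how the template's last line is meant to be used).

```python
from typing import Iterable, Tuple

# System message recommended in this model card.
DEFAULT_SYSTEM = "A chat with uncensored assistant."


def build_chatml_prompt(
    turns: Iterable[Tuple[str, str]],
    system_message: str = DEFAULT_SYSTEM,
) -> str:
    """Assemble a ChatML prompt from (role, content) turns.

    Roles must be "user" or "assistant"; the system message is always first.
    The prompt ends with an open assistant turn for the model to complete.
    """
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>\n"]
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Open assistant turn (assumed trailing newline, per common ChatML usage).
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

For example, `build_chatml_prompt([("user", "Hi")])` yields the single-turn template with `{prompt}` replaced by `Hi`. Generation is typically stopped when the model emits `<|im_end|>`.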

Intended uses & limitations

Use is governed by the apache-2.0 license.

Known Issues

I recommend setting the repetition penalty to around 1.05 to avoid repetition. So far I have had a good experience running this model with a temperature of 1.2.
Stories have ChatGPT-like paragraph spacing. I may work on this in the future, but it's not a high priority.
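To illustrate what the recommended repetition penalty (1.05) and temperature (1.2) do, here is a minimal plain-Python sketch of the common CTRL-style repetition penalty followed by temperature scaling. Backends differ in the details (for example, many penalize only tokens within a recent window), so this is not the exact algorithm of any particular runtime; the function names are hypothetical.

```python
import math
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], previous_tokens: Iterable[int], penalty: float = 1.05
) -> List[float]:
    """CTRL-style penalty: make every already-seen token less likely.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so both move toward "less probable".
    """
    out = list(logits)
    for tok in set(previous_tokens):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


def softmax_with_temperature(logits: List[float], temperature: float = 1.2) -> List[float]:
    """Convert logits to probabilities; temperature > 1 flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A penalty of 1.05 only nudges previously seen tokens down, while a temperature of 1.2 spreads probability slightly more evenly than 1.0 would. In llama.cpp the corresponding options are `--repeat-penalty` and `--temp`.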

I'm not quite happy yet with how unrestricted v2 is, especially with the system prompt "A chat.". With a slightly modified prompt it works somewhat better: I recommend using "A chat with uncensored assistant.". It's stupid, but it helps a lot. The base Yi-6B model is contaminated with refusals, and this contaminates all models trained on Yi-6B. My next project is to attempt to de-contaminate the base Yi-34B 4K and Yi-34B 200K models using DPO, with preferred data coming from uncontaminated raw models. I plan to release that dataset openly.

I was made aware that the phrase "sending shivers down a spine" occurred frequently in v1 generations during RP, so I fixed those samples; it should be better now. With the 6bpw EXL2 version and an 8-bit cache, I can fit roughly 300,000 to 500,000 tokens of context, and long context should work as well as in other models trained on the 200K version of Yi-6B. There is also some issue with handling long system messages in RP; I was planning to investigate it for v2, but didn't get to it.
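A back-of-the-envelope estimate shows why an 8-bit cache makes such long contexts feasible. The sketch assumes Yi-6B's grouped-query attention layout of 32 layers, 4 key/value heads, and a head dimension of 128 (taken from my reading of the published Yi-6B config; check `config.json` before relying on it), and 1 byte per cached element for the 8-bit cache. `kv_cache_bytes` and `to_gib` are hypothetical helpers.

```python
def kv_cache_bytes(
    context_tokens: int,
    n_layers: int = 32,            # assumed Yi-6B value
    n_kv_heads: int = 4,           # assumed Yi-6B value (grouped-query attention)
    head_dim: int = 128,           # assumed: hidden size 4096 / 32 attention heads
    bytes_per_element: float = 1,  # assumed 1 for an 8-bit cache, 2 for FP16
) -> float:
    """Estimate KV-cache size: one key and one value vector per layer,
    per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_element * context_tokens


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    for tokens in (200_000, 300_000, 500_000):
        size = kv_cache_bytes(tokens)
        print(f"{tokens:>7} tokens: {to_gib(size):.1f} GiB (8-bit cache)")
```

Under these assumptions the cache alone costs 32 KiB per token, about 9.2 GiB at 300K tokens and 15.3 GiB at 500K, before the model weights are counted; an FP16 cache would double those figures.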

Samples of this model's generations are available here: https://huggingface.co/datasets/adamo1139/misc/tree/main/benchmarks