Good job
I'm very grateful to you for continuing to create models like these. They actually produce the kind of answers I want. Even though I still haven't found settings that fully suit me, the answers are good.
There is one problem, though: at some point the answers become illogical, or the model outputs a random string of characters. I should say up front that I use my own quantized imatrix models based on yours. This seems to happen after crossing the 8192-token threshold. Could it be that this model simply isn't suited to larger contexts like 12288?
Thank you, SolidSnacke. Do other 8B models fare well after going over 8K context?
Unfortunately, I haven't checked the others, only the Q4_K_M and Q5_K_M quants of this model. I can't rule out that I did something wrong when creating them, because when you released the v1 model I also made several imatrix quants and they were broken. To create the imatrix I used this calibration file:
imatrix-with-rp-ex.txt - https://huggingface.co/Lewdiculous/Model-Requests/tree/main/data/imatrix
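For context, I follow roughly the usual llama.cpp steps with that file (just a sketch; the exact binary names differ between llama.cpp builds, and the model/output file names below are placeholders):

```
# compute the importance matrix from the calibration text (placeholder file names)
./llama-imatrix -m model-f16.gguf -f imatrix-with-rp-ex.txt -o imatrix.dat
# quantize the f16 GGUF using that imatrix
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```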
I see. Yeah, 8B generally doesn't perform well past 8K context.
I'll keep that in mind. Thanks for the answer.
I don't know much about this, but would koboldcpp's NTK-Aware scaling feature work to increase the context size? https://github.com/LostRuins/koboldcpp/wiki
@lightning-missile It seems like it should adjust automatically, but in the case of this model it may be worth setting the parameters manually. Although I still suspect I messed something up somewhere, since I only got this behavior with my own imatrix models, not with the author's GGUF files.
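If manual tuning does turn out to be needed, koboldcpp has a --ropeconfig flag for this. Something like the following (only a sketch: the model name is a placeholder, the first number is rope-freq-scale, the second is rope-freq-base, and the right values depend on the model, so they would need experimentation):

```
python koboldcpp.py --model model-Q4_K_M.gguf --contextsize 12288 --ropeconfig 1.0 10000
```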