Please open mouth kiss the homies.
#1 by snombler - opened
llama.cpp's tokenization handling in the past two months is perhaps equally criminal
Not wrong! But until someone else wants to support split loading, it's all we've really got, sadly. Also, thanks for all your contributions.
tbh exl2 simply produces better outputs.
I am graciously willing to accept 3090s to run exl2s for anyone who has them to spare. I'll need enough to run at least 64k context.
I only see a 2-bit exl2 but a 4KM gguf. We've got different definitions of "before".
It's just proof that bullying works.