The 4k versions load and work in Koboldcpp, but the 128k versions don't load, for some reason.
It isn't supported yet; 128k support was literally introduced to llama.cpp an hour ago:
https://github.com/ggerganov/llama.cpp/commit/201cc11afa0a1950e1f632390b2ac6c937a0d8f0
Oh, I see. Thank you. I thought the 4k and 128k versions had been released simultaneously, hence my surprise that only the former works with Koboldcpp.
The former uses the initial 4k support from Phi-3 mini, I believe.
I got it sorted out: lower your GPU layers to 0, then raise them until failure. Something was overloading my RAM; I presume it's because of the iGPU alongside the NVIDIA GPU, since I didn't suffer from that on my second laptop. Windows is showing 8 GB of VRAM for the iGPU when I really only have about half a gig at most.
After lowering GPU offloading to zero, I got it working on my end.
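If you want to automate that raise-until-failure search, here is a minimal Python sketch. The script path, model filename, and the step size are assumptions about your setup, as is using koboldcpp's --benchmark flag to load the model, run once, and exit; it also assumes a failed load shows up as a non-zero exit code or a hang. Treat it as a starting point, not official koboldcpp tooling.

```python
import subprocess

# Hypothetical paths -- replace with your own koboldcpp checkout and model file.
KOBOLDCPP = "koboldcpp.py"
MODEL = "Phi-3-mini-128k-instruct.Q4_K_M.gguf"

def loads_ok(layers: int) -> bool:
    """Launch koboldcpp with the given number of offloaded GPU layers.

    Assumes --benchmark makes koboldcpp load the model, run a quick test,
    and exit, and that a failed load yields a non-zero exit code. A hang
    (e.g. from an overloaded shared-memory iGPU) is caught by the timeout
    and also counted as a failure.
    """
    try:
        result = subprocess.run(
            ["python", KOBOLDCPP, "--model", MODEL,
             "--gpulayers", str(layers), "--benchmark"],
            timeout=300,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

# Start at 0 layers (pure CPU) and raise until failure, as described above.
best = 0
for layers in range(0, 41, 4):
    if loads_ok(layers):
        best = layers
    else:
        break
print(f"Highest working --gpulayers value: {best}")
```

Stepping by 4 keeps the search quick; you can bisect between the last working value and the first failing one to find the exact maximum.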
This 128k version can be loaded using llamafile v0.8.6. I believe that, in terms of reasoning, it is the best LLM I have tested so far.