The 4k versions load and work in Koboldcpp, but the 128k versions don't load, for some reason.
It isn't supported yet; 128k support was literally introduced to llama.cpp an hour ago:
https://github.com/ggerganov/llama.cpp/commit/201cc11afa0a1950e1f632390b2ac6c937a0d8f0
Oh, I see. Thank you. I thought the 4k and 128k versions had been released simultaneously, hence my surprise that only the former works with Koboldcpp.
The former uses the initial 4k support from Phi-3 mini, I believe.
I got it sorted out: lower your GPU layers to 0, then raise them until failure. Something was overloading my RAM; I presume it's because of the iGPU alongside the NVIDIA GPU, since I didn't suffer from that on my second laptop. Windows is showing 8 GB of VRAM for the iGPU when I really only have about half a gig at most.
After lowering GPU offloading to zero, I got it working on my end.
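If you want to automate that raise-until-failure search, here is a minimal Python sketch. The script path, model filename, and the step size are assumptions about your setup, as is using koboldcpp's --benchmark flag to load the model, run once, and exit; it also assumes a failed load shows up as a non-zero exit code or a hang. Treat it as a starting point, not official koboldcpp tooling.

```python
import subprocess

# Hypothetical paths -- replace with your own koboldcpp checkout and model file.
KOBOLDCPP = "koboldcpp.py"
MODEL = "Phi-3-mini-128k-instruct.Q4_K_M.gguf"

def loads_ok(layers: int) -> bool:
    """Launch koboldcpp with the given number of offloaded GPU layers.

    Assumes --benchmark makes koboldcpp load the model, run a quick test,
    and exit, and that a failed load yields a non-zero exit code. A hang
    (e.g. from an overloaded shared-memory iGPU) is caught by the timeout
    and also counted as a failure.
    """
    try:
        result = subprocess.run(
            ["python", KOBOLDCPP, "--model", MODEL,
             "--gpulayers", str(layers), "--benchmark"],
            timeout=300,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

# Start at 0 layers (pure CPU) and raise until failure, as described above.
best = 0
for layers in range(0, 41, 4):
    if loads_ok(layers):
        best = layers
    else:
        break
print(f"Highest working --gpulayers value: {best}")
```

Stepping by 4 keeps the search quick; you can bisect between the last working value and the first failing one to find the exact maximum.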
This 128k version can be loaded using llamafile v0.8.6. I believe that, in terms of reasoning, it is the best LLM I have tested so far.