gguf / llama.cpp support
Is there a chance you will finish your PR on llama.cpp? (https://github.com/ggerganov/llama.cpp/pull/7599)
It looks like it is 90% done, but if a few more months pass the PR will become useless: llama.cpp keeps evolving, so any stale PR is destined to be lost.
On the positive side: once it is integrated the developer team will keep it maintained and working.
MiniCPM is such a good model; it would be the strongest visual model on llama.cpp (including ollama and other wrappers). Given the work and effort already spent, I really hope to see the PR completed so that none of it goes to waste.
Hi, thank you for your attention, and I hope my reply didn't keep you waiting too long.
Yes, we will finish it this week.
We have always aimed to create open-source models that can help more people, and the integration of gguf/llama.cpp is a key part of it. We did notice that the current version merged into the main branch has inconsistencies in the tokenization process compared to the older version in our fork. We have now asked @tc-mb to dedicate his full efforts in the coming days to truly complete this PR. We hope you understand that our manpower is limited, and we aim to deliver a PR this week that has undergone comprehensive precision testing and can be directly merged!
That's great to hear! I really hope it works out.