RWKV-6-World-3B-v2.1-GGUF

This repo contains RWKV-6-World-3B-v2.1 in GGUF format, newly re-quantized with llama.cpp build b3771 (the latest at the time of conversion).

Note:

The notebook used to convert this model is included in the repo. Feel free to use it in Colab or Kaggle to quantize future models.

How to run the model

Get the latest llama.cpp:

git clone https://github.com/ggerganov/llama.cpp
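The run command further down expects a llama-cli binary, which a fresh clone does not contain until it is built. Below is a minimal sketch of a build step, assuming CMake and a C/C++ toolchain are installed. The function names build_llama_cli and llama_cli_path are hypothetical helpers, and the build/bin output location is an assumption based on the usual CMake layout on Linux and macOS (it may differ on Windows or in other llama.cpp versions).

```shell
# Hypothetical helper: configure and build only the llama-cli target with CMake.
# $1 = path to the llama.cpp checkout (defaults to ./llama.cpp).
build_llama_cli() {
  src="${1:-llama.cpp}"
  cmake -S "$src" -B "$src/build" &&
    cmake --build "$src/build" --config Release --target llama-cli
}

# Hypothetical helper: print where the CMake build is assumed to place the binary.
llama_cli_path() {
  echo "${1:-llama.cpp}/build/bin/llama-cli"
}
```

After cloning, run `build_llama_cli llama.cpp`. When built this way the binary is expected at build/bin/llama-cli inside the checkout rather than at ./llama-cli, so adjust the path in the run command accordingly.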

Download the GGUF file into a new model folder inside llama.cpp (choose your quant):

cd llama.cpp
mkdir model
git clone https://huggingface.co/Lyte/RWKV-6-World-3B-v2.1-GGUF
mv RWKV-6-World-3B-v2.1-GGUF/RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf model/
rm -r RWKV-6-World-3B-v2.1-GGUF

On Windows, instead of cloning the repo with git, create a "model" folder inside the llama.cpp folder, then open "Files and versions" on this page and download the quant you want into that folder.
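Cloning the whole repo fetches every quant, so it can be quicker to download a single file. The sketch below builds a direct-download URL and fetches one quant with curl. The helper names quant_url and download_quant are hypothetical, and the file naming pattern for quants other than Q4_K_M is an assumption based on the file name used above; check "Files and versions" for the exact names.

```shell
# Repo name taken from the clone command above.
REPO="Lyte/RWKV-6-World-3B-v2.1-GGUF"

# Hypothetical helper: print the direct-download URL for one quant, e.g. Q4_K_M.
# Assumes every quant follows the RWKV-6-World-3B-v2.1-GGUF-<QUANT>.gguf pattern.
quant_url() {
  echo "https://huggingface.co/${REPO}/resolve/main/RWKV-6-World-3B-v2.1-GGUF-${1}.gguf"
}

# Hypothetical helper: download one quant into ./model (run from inside llama.cpp).
download_quant() {
  mkdir -p model &&
    curl -L --fail -o "model/RWKV-6-World-3B-v2.1-GGUF-${1}.gguf" "$(quant_url "$1")"
}
```

For example, `download_quant Q4_K_M` places the file where the run command expects it.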
Now to run the model, you can use the following command:

./llama-cli -m ./model/RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf --in-suffix "Assistant:" --interactive-first -c 1024 --temp 0.7 --top-k 50 --top-p 0.95 -n 128 -p "Assistant: Hello, what can I help you with today?\nUser:" -r "User:"
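For readers who want to adjust the settings, here is the same invocation written as an annotated bash array, one option per line. This is a sketch, not a different configuration. It uses --temp for the sampling temperature, since in llama-cli the short flag -t sets the thread count. The variable name LLAMA_ARGS is arbitrary.

```shell
# Annotated form of the run command (bash array; comments are allowed between elements).
LLAMA_ARGS=(
  -m ./model/RWKV-6-World-3B-v2.1-GGUF-Q4_K_M.gguf   # GGUF file to load
  --in-suffix "Assistant:"    # text appended after each user input
  --interactive-first         # interactive mode, wait for user input right away
  -c 1024                     # context size in tokens
  --temp 0.7                  # sampling temperature (-t would set the thread count)
  --top-k 50                  # top-k sampling
  --top-p 0.95                # top-p (nucleus) sampling
  -n 128                      # number of tokens to predict
  -p "Assistant: Hello, what can I help you with today?\nUser:"   # initial prompt
  -r "User:"                  # reverse prompt: return control to the user when this appears
)
```

Run it with `./llama-cli "${LLAMA_ARGS[@]}"` from inside the llama.cpp folder, replacing ./llama-cli with the actual path to your built binary if it lives elsewhere.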

GGUF

Model size: 3.1B params
Architecture: rwkv6
Quantizations: 4-bit, 8-bit, 16-bit

Base model: RWKV/rwkv-6-world-3b-v2.1 (this repo is a quantized version of it)