Details about GPTQ, GGUF, HF, GGML, AWQ, fp16, and the Uncensored variant of each
Can you tell me the difference between all of these model types:
GPTQ, GGUF, HF, GGML, AWQ, fp16
Uncensored GPTQ, Uncensored GGUF, Uncensored HF, Uncensored fp16, Uncensored AWQ, Uncensored GGML
Which one is better and faster, the Uncensored one or the one without Uncensored? And which model is fast and good on a Google Colab T4 GPU?
Uncensored just means the model will try to answer NSFW questions instead of refusing them. It's not a format.
GGUF is the format for llama.cpp, best for CPU or Mac.
GPTQ with ExLlama will be the fastest format, and
AWQ should be the highest quality.
fp16 is the original, unquantized model.
GGML is an older format, similar to GGUF.
HF is the format used by Hugging Face Transformers, which is usually fp16.
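To see why the quantized formats above matter on a 16 GB T4, here's a rough sketch of what group-wise 4-bit quantization (the idea behind GPTQ, AWQ, and 4-bit GGUF) does to storage. This is an illustration in NumPy, not any of those libraries' actual code; the group size of 128 and the symmetric int4 scheme are assumptions for the sketch:

```python
import numpy as np

def quantize_4bit(weights, group_size=128):
    """Symmetric 4-bit group-wise quantization: one fp16 scale per group."""
    w = weights.reshape(-1, group_size)
    # Scale each group so its largest weight maps to the int4 value 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=4096).astype(np.float32)  # toy weight tensor
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)

# 4-bit storage: half a byte per weight, plus one fp16 scale per 128 weights.
fp16_bytes = w.size * 2
int4_bytes = w.size // 2 + scale.size * 2
print(fp16_bytes, int4_bytes)  # 8192 vs 2112, roughly 3.9x smaller
print(np.abs(w - w_hat).max())  # reconstruction error stays small
```

At that ratio, a 7B-parameter model drops from roughly 14 GB in fp16 to under 4 GB in 4-bit, which is why a 4-bit GPTQ model fits comfortably in a T4's 16 GB while the fp16/HF version does not.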
Thank you for your response and explanation, @johnwick123forevr