response generation too slow
#9 · opened by hussainwali1
Is there any way to speed up the generation? Also, it keeps on generating without stopping.
This is an unquantised model, so it requires a lot of VRAM and a lot of computation.
If you have an NVIDIA GPU, you could use a quantised model like https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ. That should run faster and need less VRAM.
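For reference, here is a minimal sketch of loading that GPTQ model through `transformers` (this assumes a recent `transformers` with `optimum` and `auto-gptq` installed; the prompt template and generation settings are illustrative, not definitive). Capping `max_new_tokens` also helps with the "keeps on generating" issue.

```python
# Minimal sketch: load the GPTQ-quantised model on an NVIDIA GPU.
# Assumes transformers >= 4.32 with optimum and auto-gptq installed;
# adjust the prompt template and sampling settings to taste.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/stable-vicuna-13B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantised weights on the available GPU
)

prompt = "### Human: What is the capital of France?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=256,  # cap the output so generation doesn't run on forever
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```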
How are you running the model?