Can I specify the number of threads used in CPU reasoning?
#13 · opened by byzp
CPU inference seems to use only half of the available threads by default. Can I increase this to get faster speed?
Of course you can. Pass `parallel_num=your_threads_num` to `quantize()` when quantizing. Or, if your model has already been loaded, call `quantize()` again and reset the number of CPU cores used in quantization:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).cpu().float()
model = model.quantize(bits=4, parallel_num=your_threads_num)
```
However, an inappropriate `parallel_num` can hurt efficiency; it is not recommended to exceed the number of CPU cores.
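As a minimal sketch, you could derive `parallel_num` from the machine's logical core count via the standard library's `os.cpu_count()`, rather than hard-coding a number (the variable name and the capping policy here are my own choices, not part of the library):

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it
# cannot be determined; fall back to 1 in that case.
parallel_num = os.cpu_count() or 1

# Then pass it to quantize() as shown above (requires the loaded model):
# model = model.quantize(bits=4, parallel_num=parallel_num)
print(parallel_num)
```

Capping at the logical core count follows the advice above: oversubscribing cores adds scheduling overhead rather than speed.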