Can I specify the number of threads used in CPU reasoning?
#13 · opened by byzp
CPU inference seems to use only half of the available threads by default. Can I increase this to get faster speed?
Of course you can. Pass `parallel_num=your_threads_num` to `quantize()` when quantizing. Or, if your model has already been loaded, call `quantize()` again and reset the number of CPU cores used in quantization:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).cpu().float()
model = model.quantize(bits=4, parallel_num=your_threads_num)
```
However, an inappropriate `parallel_num` can hurt efficiency; it is not recommended to exceed the number of CPU cores.
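As a minimal sketch, you could derive `parallel_num` from the machine's logical core count via the standard library's `os.cpu_count()`, rather than hard-coding a number (the variable name and the capping policy here are my own choices, not part of the library):

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it
# cannot be determined; fall back to 1 in that case.
parallel_num = os.cpu_count() or 1

# Then pass it to quantize() as shown above (requires the loaded model):
# model = model.quantize(bits=4, parallel_num=parallel_num)
print(parallel_num)
```

Capping at the logical core count follows the advice above: oversubscribing cores adds scheduling overhead rather than speed.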