Fix quantization_config to work with vLLM v0.5.3.post1
#11 opened by davidthomas426
The entries in `modules_to_not_convert` need to name the linear layers themselves to work with vLLM; setting them to parent modules does not work, because vLLM ignores entries that don't match a linear layer.
Also updated the `_name_or_path` field to the correct HF model id.
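
For illustration only, a minimal sketch of the kind of change described above. The module names here are hypothetical (Llama-style layer naming); the real entries depend on the model's architecture and quantization setup:

```python
# Hypothetical example -- module names are illustrative, not taken from this repo's config.
# vLLM matches modules_to_not_convert entries against the linear layers themselves,
# so an entry that points at a parent module is silently ignored.

# Ignored by vLLM: names a parent module, not a linear layer.
quantization_config_before = {
    "modules_to_not_convert": ["model.layers.0.mlp"],
}

# Works with vLLM: names the linear layer directly.
quantization_config_after = {
    "modules_to_not_convert": ["model.layers.0.mlp.gate_proj"],
}
```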
Thanks, LGTM
ArthurZ changed pull request status to merged