Qwen2-VL-72B-Instruct-AWQ / generation_config.json
可亲
fix(pad zero): pad intermediate_size to 29696 so that the quantized model can use 8-way tensor parallelism in vLLM
688e28d
227 Bytes
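The padding mentioned in the commit message above can be checked with a little arithmetic. This is a minimal sketch, not the repository's actual padding script: the unpadded `intermediate_size` of 29568 and the AWQ group size of 128 are assumptions that do not appear on this page, and `pad_to_multiple` is a hypothetical helper. The idea is that each tensor-parallel shard should hold a whole number of quantization groups.

```python
def pad_to_multiple(size: int, multiple: int) -> int:
    """Round size up to the nearest multiple of `multiple`."""
    return ((size + multiple - 1) // multiple) * multiple


# Assumed values, not stated on this page.
ORIGINAL_INTERMEDIATE_SIZE = 29568  # assumption: unpadded intermediate_size
GROUP_SIZE = 128                    # assumption: AWQ quantization group size

# From the commit message.
TENSOR_PARALLEL = 8
PADDED_INTERMEDIATE_SIZE = 29696

# Each shard needs a whole number of groups, so the size must be
# divisible by TENSOR_PARALLEL * GROUP_SIZE.
required_multiple = TENSOR_PARALLEL * GROUP_SIZE

padded = pad_to_multiple(ORIGINAL_INTERMEDIATE_SIZE, required_multiple)
groups_per_shard = padded // required_multiple

print(ORIGINAL_INTERMEDIATE_SIZE % required_multiple)  # non-zero: cannot be split evenly
print(padded, groups_per_shard)
```

Under these assumptions, the smallest size that splits evenly across 8 shards matches the 29696 named in the commit message.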
{
  "chat_format": "chatml",
  "do_sample": true,
  "eos_token_id": 151643,
  "max_new_tokens": 512,
  "max_window_size": 6144,
  "pad_token_id": 151643,
  "top_k": 0,
  "top_p": 0.01,
  "transformers_version": "4.45.0.dev0"
}
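The combination of `"do_sample": true` with `"top_p": 0.01` is worth a closer look: sampling is enabled, but the nucleus is so small that in most cases only the single most likely token survives, so decoding behaves almost greedily. The sketch below illustrates this with a simplified top-p filter over a toy distribution. `top_p_filter` and `toy_probs` are hypothetical names for illustration, and the filter is a plain-Python approximation of nucleus sampling, not the code path used by transformers or vLLM.

```python
import json

# The configuration shown above, embedded so the example is self-contained.
GENERATION_CONFIG = """
{
  "chat_format": "chatml",
  "do_sample": true,
  "eos_token_id": 151643,
  "max_new_tokens": 512,
  "max_window_size": 6144,
  "pad_token_id": 151643,
  "top_k": 0,
  "top_p": 0.01,
  "transformers_version": "4.45.0.dev0"
}
"""

config = json.loads(GENERATION_CONFIG)


def top_p_filter(probs, top_p):
    """Return indices of the smallest set of highest-probability tokens
    whose cumulative probability reaches top_p (simplified nucleus filter)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


# Hypothetical next-token distribution over a four-token vocabulary.
toy_probs = [0.05, 0.60, 0.25, 0.10]

kept = top_p_filter(toy_probs, config["top_p"])
print(kept)  # only the most likely token (index 1) survives the filter
```

Note also that `eos_token_id` and `pad_token_id` share the value 151643 in this file, so padding positions reuse the end-of-sequence token.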