Update README.md
README.md CHANGED
@@ -73,7 +73,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 ### Processing Long Texts
 
-To handle extensive inputs exceeding
+To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
 
 For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:
 
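For context, YaRN-style length extrapolation of the kind this commit describes is commonly enabled by adding a `rope_scaling` entry to the model's `config.json`. The sketch below is an assumption based on the common Hugging Face convention, not part of this commit's diff; the `factor` value and field names may differ for the actual model.

```json
{
  "...": "existing config.json fields unchanged",
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Here `factor` scales the original 32,768-token context window, so a factor of 4.0 would target inputs of up to roughly 131,072 tokens.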