How to get up to 4096 context length?

#3
by Harm - opened

The Mistral-7B-Instruct-v0.1-GGUF card mentions "The model will work at sequence lengths of 4096, or lower."

But when I load the model it only seems to support a maximum context length of 512:
model._llm.context_length --> 512

When I run a larger prompt I get:
WARNING:ctransformers:Number of tokens (850) exceeded maximum context length (512).

How can I utilize the longer context length for the Mistral-7B-Instruct-v0.1-GGUF model?

You have to set it manually. By default it's set to 512 for all models. Also, the model should support a context length of around 8k (slightly lower).


Ok, do you have any suggestions or pointers on how to do so?


You can use

pip install llama-cpp-python
wget https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGUF/resolve/main/wizardlm-13b-v1.2.Q5_K_M.gguf

And after this for example:

from llama_cpp import Llama

# n_ctx sets the context window; n_gpu_layers=-1 offloads all layers to the GPU
llm = Llama(model_path="wizardlm-13b-v1.2.Q5_K_M.gguf", n_ctx=4096, n_gpu_layers=-1)

prompt = "Your prompt here"
print(llm(prompt, max_tokens=1024, temperature=0))

Just change the model name and path.
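
As a quick sanity check (a minimal sketch, assuming a recent version of the llama-cpp-python bindings where the Llama object exposes n_ctx()), you can confirm the context window the model was actually loaded with:

# Should print 4096, the value passed via n_ctx above
print(llm.n_ctx())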

Thanks everyone for the suggestions. I was just pointed to the context_length parameter of ctransformers. The context length is increased to 4096 by:
from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=50,        # number of layers to offload to the GPU
    hf=True,              # return a transformers-compatible model
    context_length=4096)  # raise the context window from the 512 default
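
As a quick check (using the same private _llm attribute from the original question, so this may change between ctransformers versions), the loaded context length can be inspected:

# Should now report 4096 instead of 512
print(model._llm.context_length)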

Harm changed discussion status to closed
