Responses are incomplete and greetings are not handled
I have tried every possible way and changed parameters, but responses are still incomplete: in some cases it works, and for some queries it returns half an answer. Greetings are not handled properly either; the model returns unmatched answers.
Hi dev4sidra, how are you using the model?
I believe you are using the model for raw text completion rather than chat completion. I would recommend using it as mentioned in the README:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# apply_chat_template wraps the conversation in the [INST] ... [/INST]
# format the instruct model was fine-tuned on
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
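As an aside (not part of the README snippet): if you want to see exactly what prompt string the chat template produces before it is tokenized, you can render it as text instead of token ids:

# Inspection step, my addition: render the template as a string instead of token ids
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt_text)  # shows the [INST] ... [/INST] wrapping the model expects

This makes it easy to spot formatting problems, since a prompt that is missing the instruction tags is a common cause of unmatched or rambling answers.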
This is how I got it working:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = 'mistralai/Mistral-7B-Instruct-v0.2'

def load_quantized_model(model_name: str):
    """
    :param model_name: Name or path of the model to be loaded.
    :return: Loaded quantized model.
    """
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        quantization_config=bnb_config
    )
    return model

def initialize_tokenizer(model_name: str):
    """
    Initialize the tokenizer with the specified model_name.

    :param model_name: Name or path of the model for tokenizer initialization.
    :return: Initialized tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.bos_token_id = 1  # Set beginning-of-sentence token id
    return tokenizer

model = load_quantized_model(model_name)
tokenizer = initialize_tokenizer(model_name)

# Define stop token ids (not wired into generate() here; see the sketch below)
stop_token_ids = [0]

def generate_response(prompt):
    # Wrap the prompt in the instruction tags the model was fine-tuned on
    text = f"[INST] {prompt} [/INST]"
    encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
    model_input = encoded.to(model.device)
    generated_ids = model.generate(**model_input, max_new_tokens=1000, do_sample=True)
    decoded = tokenizer.batch_decode(generated_ids)
    # Strip the echoed prompt so only the completion is returned
    return decoded[0].replace(text, '').strip()

# https://stackoverflow.com/questions/77803696/runtimeerror-cutlassf-no-kernel-found-to-launch-when-running-huggingface-tran
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

prompt = "How AI will replace Engineers"
response = generate_response(prompt)
print(response)
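One thing to note: stop_token_ids above is defined but never passed to generate(), so it has no effect. If you want generation to actually halt on those ids, a minimal sketch using transformers' StoppingCriteria API (my addition, not part of the snippet above) would be:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is in stop_ids (batch size 1)."""
    def __init__(self, stop_ids):
        self.stop_ids = stop_ids

    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1].item() in self.stop_ids

stopping = StoppingCriteriaList([StopOnTokens(stop_token_ids)])
generated_ids = model.generate(**model_input, max_new_tokens=1000, do_sample=True,
                               stopping_criteria=stopping)

That said, generate() already stops at the eos token by default, so an explicit stopping criterion is only needed for extra stop ids; id 0 is typically <unk> for this tokenizer family.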
import { textGeneration } from '@huggingface/inference';

const response = await textGeneration({
  accessToken: apiKey,
  model: 'mistralai/Mistral-7B-Instruct-v0.2',
  inputs: inputText,
  parameters: {
    max_length: 1024,
    repetition_penalty: 1.03,
    temperature: 0.2, // Adjust for balance between creativity and relevance
    top_p: 0.9, // Nucleus sampling: consider top 90% probability mass
    top_k: 50, // Limits token choices to the top 50 most probable tokens
  },
});

I am using it this way, but responses are still incomplete. I have tried changing the parameters in different ways.
Basically, I have a vector DB: the question I ask retrieves relevant data from the database, then I pass the query and the relevant search results to the model, and it should generate a full response.
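For reference, here is a minimal sketch of that retrieval-augmented flow, reusing the model and tokenizer loaded above (the retrieval call and variable names are hypothetical placeholders). Note that in the transformers API max_length counts prompt tokens plus generated tokens, so a long retrieved context can eat the whole budget and cut the answer off mid-sentence, while max_new_tokens bounds only the answer:

def answer_with_context(question, retrieved_chunks):
    # retrieved_chunks: list of strings returned by your vector DB (hypothetical input)
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    text = f"[INST] {prompt} [/INST]"
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # max_new_tokens bounds the *answer* length independently of how long the
    # retrieved context is, which avoids truncated half-answers
    generated = model.generate(**inputs, max_new_tokens=1024, do_sample=True)
    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
    return decoded[0].replace(text, "").strip()

# chunks = my_vector_db.search(question, k=4)  # hypothetical retrieval call
# print(answer_with_context(question, chunks))

If you are on the hosted Inference API instead of a local model, the analogous knob there is the max_new_tokens parameter, which may be worth trying in place of max_length in your JavaScript call above.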