Prompt Template with Langchain
I'm trying to build an LLM RAG system with LangChain and ChromaDB, following the given prompt template for this model, but the output is gibberish. Here's how I define the model, tokenizer, ChromaDB, and the prompt template:
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "/home/model/SeaLLM-7B-v2.5/"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')
# ChromaDB vector store built from the loaded pages
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

db = Chroma.from_documents(
    pages,
    HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"),
    persist_directory='/home/playground/Triton/chromadb/',
)
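Here pages is built from documents loaded beforehand; a minimal, hypothetical sketch of how that might look (the loader and the PDF path are placeholders, not my actual setup):

# Hypothetical example of producing `pages` for the vector store
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/home/data/report.pdf")   # placeholder path, for illustration only
pages = loader.load_and_split()                 # one Document per page/chunk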
prompt_template = """
<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan. Anda diberikan
konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}<eos>
<|im_start|>assistant
ANSWER:"""
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
# ['<bos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '.', '▁Anda', '▁diberikan', '\n', 'kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']
I suspect there's something wrong with how my prompt template goes through LangChain, but I can't find what. Any help is really appreciated, thanks!
@Hanifahreza
There should be no \n at the beginning, but I don't think that is the issue.
Can you assemble your full LangChain prompt into a complete prompt and run the model with model.generate(**inputs, do_sample=True, temperature=0.7) to see if it works normally?
Note that if you've set a repetition penalty, you must set it to 1.
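Something like this minimal sketch (the context/question values are placeholders, and repetition_penalty=1.0 just makes the no-penalty setting explicit):

# Fill the template by hand and call the model directly.
full_prompt = prompt_template.format(context="...", question="...")  # placeholder values
inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.0,   # i.e. no penalty
    max_new_tokens=128,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))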
OK, so I have tried crafting the LangChain prompt by eliminating the leading '\n' (the one right after the <bos> token), like this:
prompt_template = """<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia.
Anda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}
"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
#['<bos>', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '▁yang', '▁harus', '▁di', 'jawab', '▁dalam', '▁Bahasa', '▁Indonesia', '.', '▁', '\n', 'Anda', '▁diberikan', '▁kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']
Then I filled in a dummy context and a question with an obvious answer, and fed the full prompt to the model directly like this:
inputs = {
    "context": 'net sales apple adalah 3 juta rupiah',
    "question": 'berapa net sales apple?'
}
full_prompt = prompt_template.format(**inputs)
generated_output = model.generate(
    input_ids=tokenizer.encode(full_prompt, return_tensors="pt"),
    max_length=100,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(generated_output[0], skip_special_tokens=True))
The result of that print is:
'<|im_start|>system\nAnda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. \nAnda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:\nCONTEXT: net sales apple adalah 3 juta rupiah\n<|im_start|>user\nQUESTION: berapa net sales apple?\nANSWER: Net sales Apple adalah 3 juta rupiah.'
It seems like the model does indeed work: it gives the correct result after ANSWER. After some investigation, I think I found the culprit behind the gibberish here:
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain

db = Chroma.from_documents(
    pages,
    HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"),
    persist_directory='/home/playground/Triton/chromadb/',
)
retriever = db.as_retriever()

memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=4,
    return_messages=True, input_key='question', output_key='answer')

# llm: LangChain wrapper around the SeaLLM model loaded above (definition not shown here)
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_generated_question=True
)
question = "berapa net sales Apple?"
bot_result = qa({"question": question})
print(bot_result['generated_question'])
# 128011280112801128011280112801128011280112801128011280…
print(bot_result['answer'])
# 128011280112801128011280112801128011280112801128011280…
So I guess something goes wrong when the question is generated from the prompt template after the context and question are passed to it, but I don't understand what.
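To see what actually reaches the model at each step, LangChain's debug logging can be switched on before calling the chain; a minimal sketch, assuming a LangChain version that provides langchain.globals:

# Log every prompt the chain sends to the LLM, including the
# question-generation step, before the final answer is produced.
from langchain.globals import set_debug

set_debug(True)
bot_result = qa({"question": "berapa net sales Apple?"})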
@Hanifahreza
I remember this case. When you pass in llm=llm, the chain doesn't follow the chat format; it injects the prompt/instruction directly as plain text, which causes the model to fail to follow the instructions. You need to figure that part out.
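One possible direction (a sketch only, not a confirmed fix) is to wrap the raw model in a custom LangChain LLM that applies the tokenizer's chat template to whatever plain-text prompt the chain sends. This assumes the SeaLLM tokenizer ships a chat template and reuses the model/tokenizer loaded above; the class name ChatFormattedLLM is made up for illustration:

from typing import List, Optional
from langchain.llms.base import LLM

class ChatFormattedLLM(LLM):
    """Wraps the HF model so every prompt goes through the chat template."""

    @property
    def _llm_type(self) -> str:
        return "chat-formatted-hf"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        # Put the chain's plain-text prompt into the chat format the model expects.
        messages = [{"role": "user", "content": prompt}]
        input_ids = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(
            input_ids, do_sample=True, temperature=0.7, max_new_tokens=256
        )
        # Return only the newly generated part, decoded as plain text.
        return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

llm = ChatFormattedLLM()   # pass this to ConversationalRetrievalChain.from_llm

With a wrapper like this, the template passed via combine_docs_chain_kwargs should be plain text without the <|im_start|>/<eos> markers, since the wrapper adds the chat formatting itself.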
Did you find a solution to this problem? I have the same issue.
Unfortunately not.