---
library_name: peft
base_model: Qwen/Qwen-7B-Chat
datasets:
- nlpai-lab/kullm-v2
language:
- ko
- en
- zh
---

# Model Card for Ko-QWEN-7B-Chat-LoRA

Korean chatbot based on Alibaba's [QWEN](https://github.com/QwenLM/Qwen)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6232fdee38869c4ca8fd49e2/CBQ0cdD54Sd7-rbNt-Mkb.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1fmcq1YZaIYg-cuCS4aadomutLmzSyEYI#scrollTo=6c1edcdc-158d-4043-a7c7-1d145ebf2cd1)

(Keep in mind that the basic Colab runtime with a T4 GPU will lead to an OOM error. The fine-tuned version of Qwen-14B-Chat-Int4 will not have this issue.)

## Model Details

### Model Description

Introducing Ko-QWEN, an AI assistant fine-tuned from Alibaba's QWEN! This is an instruction-tuned model based on [πŸ€—Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat), trained on the instruction dataset [πŸ€—nlpai-lab/kullm-v2](https://huggingface.co/nlpai-lab/kullm-v2). The model focuses on communicating in Korean, while preserving its high proficiency in English and Chinese.

Fine-tuning of [πŸ€—Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat-Int4) via QLoRA is currently in progress; it is expected to give better results without GPU OOM errors in the basic Colab runtime, so *STAY TUNED*!

### Model Sources

- **Paper [optional]:** read the QWEN team's [technical report](https://arxiv.org/abs/2309.16609).
- **Demo [optional]:** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1fmcq1YZaIYg-cuCS4aadomutLmzSyEYI#scrollTo=6c1edcdc-158d-4043-a7c7-1d145ebf2cd1)

### Model

- **Fine-tuned by:** Jungwon Chang
- **Model type:** Chatbot (AI assistant)
- **Language(s) (NLP):** Korean, English, Chinese
- **License:** Please refer to the original [QWEN LICENSE](https://github.com/QwenLM/Qwen/blob/main/LICENSE)
- **Finetuned from model [optional]:** [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)

## Uses

The Ko-QWEN AI assistant is designed to serve as a versatile tool for individuals and organizations looking to enhance their communication and productivity. The intended uses of the model include:

- Language Learning and Practice: Assisting users in learning and practicing Korean, English, and Chinese through natural language conversation.
- Cultural Exchange: Facilitating cross-cultural communication and understanding by providing insights into language nuances and cultural context.
- Informational Assistance: Offering prompt responses to inquiries across various domains, helping users obtain information efficiently.
- Task Automation: Aiding in routine tasks such as scheduling, reminders, and managing emails, thereby improving personal and workplace productivity.
- Accessibility Services: Supporting users with disabilities by providing an alternative means of communication and interaction with digital content.
- Research and Development: Serving as a platform for researchers in computational linguistics, natural language processing, and AI to conduct experiments and develop new technologies.

Users of this model are encouraged to apply it in ways that foster positive engagement and contribute to the advancement of language technologies and AI.
### Out-of-Scope Use

This model is intended for constructive and ethical applications, such as language learning, cultural exchange, and providing assistance with information and tasks within the bounds of lawful behavior. Uses that are explicitly out of scope and strongly discouraged include:

- Illegal Activities: Any form of support for illegal activities, including hacking, phishing, or fraud.
- Harmful Content Creation: Generating content that is abusive, defamatory, harassing, hateful, obscene, or otherwise intended to harm others.
- Misinformation: Disseminating false information or contributing to the spread of rumors.
- Impersonation: Attempting to impersonate others or create misleading representations of individuals or entities.
- Bias and Discrimination: Using the model to promote bias, discrimination, or unfair treatment of individuals or groups based on race, gender, ethnicity, religion, or any other personal characteristics.
- Commercial Use: Utilizing the model for commercial purposes without proper licensing or agreement.

## Bias, Risks, and Limitations

This model has been trained on a diverse dataset intended to minimize bias; however, as with any AI system, there is a risk of unintended bias in language generation, which could reinforce stereotypes or marginalize certain groups. Furthermore, while the model is proficient in multiple languages, there may be nuances and idiomatic expressions that it does not fully capture, potentially affecting the quality of interaction in certain cultural contexts. The model's responses are also based on patterns in data and may not always provide accurate or reliable information, especially in fast-evolving or specialized knowledge domains.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
To address these issues, we recommend the following:

- Critical Engagement: Users should critically evaluate the information provided by the model and cross-reference it with authoritative sources when necessary.
- Diverse Testing: Continuously testing the model with a diverse set of inputs can help identify and mitigate biases.
- Contextual Awareness: Users should provide clear context when interacting with the model to improve the relevance and accuracy of its responses.
- Regular Updates: Periodic retraining of the model with current data can help maintain its relevance and accuracy over time.
- Transparency: Clearly communicating the model's limitations to users can help set appropriate expectations regarding its performance.
- Ethical Use Monitoring: Implement mechanisms to monitor and prevent the model's use in unethical or harmful ways.

## How to Get Started with the Model

Use the code below to get started with the model, or play around with it in the [Colab demo](https://colab.research.google.com/drive/1fmcq1YZaIYg-cuCS4aadomutLmzSyEYI#scrollTo=6c1edcdc-158d-4043-a7c7-1d145ebf2cd1).

```python
# How to load the model and tokenizer
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_path = "Jungwonchang/Ko-QWEN-7B-Chat-LoRA"
config = PeftConfig.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    trust_remote_code=True,
    resume_download=True,
)
model = PeftModel.from_pretrained(model, model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(
    config.base_model_name_or_path,
    trust_remote_code=True,
    resume_download=True,
)

# Chat by simply using the Qwen model's .chat() method.
query = "Any question you want to ask"
response, history = model.chat(tokenizer, query, history=None)

# You can also change the system prompt
response, history = model.chat(
    tokenizer,
    query,
    history=None,
    system="μ•„λž˜λŠ” μž‘μ—…μ„ μ„€λͺ…ν•˜λŠ” λͺ…λ Ήμ–΄μž…λ‹ˆλ‹€. μš”μ²­μ„ 적절히 μ™„λ£Œν•˜λŠ” 응닡을 μž‘μ„±ν•˜μ„Έμš”.",
)
```

```python
# You can write a custom chat function if you want to play with various hyperparameters.
# Alpaca-style single-turn chat
def qwen_chat_single_turn(model, tokenizer, device, query, system_message="You are a helpful assistant."):
    # Special tokens and the newline token
    im_start_id = tokenizer.im_start_id
    im_end_id = tokenizer.im_end_id
    nl_token = tokenizer.encode("\n")[0]

    # Tokenize one turn of the conversation: <|im_start|>role\ncontent<|im_end|>
    def tokenize_conversation(role, content):
        return [im_start_id] + tokenizer.encode(role) + [nl_token] + tokenizer.encode(content) + [im_end_id]

    # Construct the full conversation context, starting with the system message
    context_tokens = tokenize_conversation("system", system_message) + [nl_token, nl_token]

    # Add the current user query
    context_tokens += tokenize_conversation("user", query) + [nl_token, nl_token]

    # Add tokens indicating the start of the assistant's response
    context_tokens += [im_start_id] + tokenizer.encode("assistant") + [nl_token]

    # Convert context tokens to a tensor and generate a response
    input_ids = torch.tensor(context_tokens).unsqueeze(0).to(device)
    generated_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
        top_k=20,
        top_p=0.92,
        no_repeat_ngram_size=3,
        eos_token_id=im_end_id,
        repetition_penalty=1.2,
        num_beams=3,
    )

    # Decode the generated response
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    # Keep only the assistant's response
    response = generated_text.split("assistant\n")[-1].strip()
    return response


query = "Question you want to ask"
response = qwen_chat_single_turn(
    model,
    tokenizer,
    device,
    query=query,
    system_message="μ•„λž˜λŠ” μž‘μ—…μ„ μ„€λͺ…ν•˜λŠ” λͺ…λ Ήμ–΄μž…λ‹ˆλ‹€. μš”μ²­μ„ 적절히 μ™„λ£Œν•˜λŠ” 응닡을 μž‘μ„±ν•˜μ„Έμš”.",
)
```

## Training Details

### Training Data

[πŸ€—nlpai-lab/kullm-v2](https://huggingface.co/nlpai-lab/kullm-v2): a Korean instruction dataset of about 150k examples

### Training Procedure

The model was fine-tuned using LoRA (Low-Rank Adaptation), which allows for efficient training of large language models by updating only a small set of parameters. Fine-tuning was conducted on a single node with 2 GPUs, utilizing distributed training to improve efficiency and speed. The LoRA rank was set to 32, since my access to the GPUs was limited in time.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

An instruction fine-tuned (SFT) version of Alibaba's Qwen-7B-Chat

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 2x NVIDIA RTX A6000
- **Hours used:** approximately 72 hours
- **Cloud Provider:** Paperspace
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

Jungwon Chang

## Model Card Contact

cjw1994cool@gmail.com
cjw1994cool@korea.ac.kr

## Training procedure

### Framework versions

- PEFT 0.6.1