# Llama-2-ko-DPO-13B
Under the revised criteria of the Open Ko-LLM Leaderboard, this model's average score exceeded 50 for the first time. I am pretty proud of that, even though the score will soon fade into the background: I am simply testing a hypothesis rather than competing, and a lot of great models are coming out in the 7B class. Since my day job is technical support, not R&D, I could not spend much time on this, so I processed only about 1,000 samples and tuned the model with DPO (Direct Preference Optimization) to reduce hallucination. The infrastructure was the same as before, an AWS g5.12xlarge instance, and no additional prompts were given.
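For reference, a minimal sketch of such a DPO run using Hugging Face TRL's `DPOTrainer`; the library choice, hyperparameters, and file name below are illustrative assumptions, not the exact configuration used:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "daekeun-ml/Llama-2-ko-instruct-13B"  # SFT model used as the starting point
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)      # policy to be optimized
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy

# DPO consumes (prompt, chosen, rejected) triples; see the Datasets section below.
train_dataset = Dataset.from_json("dpo_pairs.jsonl")  # hypothetical file name

training_args = TrainingArguments(
    output_dir="llama-2-ko-dpo-13b",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,  # strength of the KL penalty keeping the policy near the reference
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

With roughly 1,000 pairs a run like this finishes quickly, which is the point: the preference data, not the compute, does the work.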
I think the potential of base LLMs is enormous, seeing how much hallucination is reduced with very little data and without much effort. When I meet with customers, many of them struggle to implement GenAI features, but it does not actually take much effort: plenty of well-made template code and APIs are available. We are now in a world where anyone willing to process data can easily and quickly create their own quality model.
## Model Details
- Base Model: Llama-2-ko-instruct-13B
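A standard `transformers` loading snippet for this model; since no special prompt template was used during tuning (see above), a plain Korean question is passed directly, and the generation parameters are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "daekeun-ml/Llama-2-ko-DPO-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B in fp16 fits across the g5.12xlarge's 4x A10G GPUs
    device_map="auto",
)

prompt = "대한민국의 수도는 어디인가요?"  # "What is the capital of South Korea?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```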
## Datasets
- 1,000 samples I generated myself
- Responses generated by Amazon Bedrock Claude-2 were used as the chosen samples, and responses generated by the SFT-tuned Llama-2-13B model were used as the rejected samples (an example record is shown below).
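An illustrative record in the (prompt, chosen, rejected) format that DPO consumes; the content below is hypothetical, as the actual 1,000 samples are not published:

```python
# Hypothetical preference pair illustrating the dataset schema.
sample = {
    # Question posed to both models
    "prompt": "대한민국에서 가장 높은 산은 무엇인가요?",
    # Chosen: factually correct answer (role played by Claude-2 output)
    "chosen": "대한민국에서 가장 높은 산은 한라산으로, 높이는 약 1,947m입니다.",
    # Rejected: plausible-sounding hallucination (role played by the SFT model);
    # Baekdusan is on the Korean peninsula but not in South Korea.
    "rejected": "대한민국에서 가장 높은 산은 백두산입니다.",
}
```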
## Benchmark
- This is the first Korean LLM on the Open Ko-LLM Leaderboard to exceed an average score of 50.
- SOTA model as of October 31, 2023 (https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).
| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|---|
| daekeun-ml/Llama-2-ko-DPO-13B (Ours) | 51.03 | 47.53 | 58.28 | 43.59 | 51.91 | 53.84 |
| daekeun-ml/Llama-2-ko-instruct-13B | 49.52 | 46.50 | 56.90 | 43.76 | 42.00 | 58.44 |
| kyujinpy/Korean-OpenOrca-13B | 48.79 | 43.09 | 54.13 | 40.24 | 45.22 | 61.28 |
## License
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License, subject to the LLAMA 2 COMMUNITY LICENSE AGREEMENT
This model was created as a personal experiment, unrelated to the organization I work for.