DPR-KO
1. Intro
This is a Korean DPR model (Question Encoder).
It was trained with new code that is entirely different from Facebook's DPR code.
It can be used for dense-vector-based semantic search.
Questions are encoded with the Question Encoder, and text is encoded with the Context Encoder.
- Github: https://github.com/snumin44/DPR-KO
- Original Code: https://github.com/facebookresearch/DPR/tree/main
- Context Encoder: https://huggingface.co/snumin44/biencoder-ko-bert-context
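The division of labor between the two encoders can be illustrated with a minimal sketch of the retrieval step. It assumes the question and every passage have already been encoded into fixed-size vectors, and that relevance is the inner product of the two vectors, as in the original DPR paper. The toy 3-dimensional vectors are made up to stand in for real encoder outputs, and the names `dot` and `rank_passages` are hypothetical; they are not part of the DPR-KO repository.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted by descending inner product."""
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for Question Encoder / Context Encoder outputs.
question = [0.9, 0.1, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # passage 0
    [1.0, 0.0, 0.0],  # passage 1
    [0.5, 0.5, 0.0],  # passage 2
]
top = rank_passages(question, passages, top_k=2)
print(top)  # passage 1 ranks first, passage 2 second
```

In a real setup the passage vectors would be computed once with the Context Encoder and stored in an index, so only the question needs to be encoded at query time.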
2. Experiment settings
- Base model: klue/bert-base
- Dataset: KorQuAD v1
- Wiki dump: kowiki-latest-pages-articles.xml.bz2 (2024/07/23)
- Sentences per chunk: 5
- Total chunks: about 1.6 million
- BM25 weight: 0.3
- 1 A100 GPU
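To make the chunking setting concrete, here is a rough sketch of grouping sentences into chunks of five. The function names `split_sentences` and `chunk_sentences` and the naive regex sentence splitter are assumptions for illustration only. The actual preprocessing, including how the wiki dump is parsed and how Korean sentences are split, is defined in the GitHub repository and may differ.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: List[str], per_chunk: int = 5) -> List[str]:
    """Join consecutive sentences into chunks of at most `per_chunk` sentences."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    return [
        " ".join(sentences[i : i + per_chunk])
        for i in range(0, len(sentences), per_chunk)
    ]


article = " ".join(f"Sentence {n}." for n in range(1, 13))  # 12 toy sentences
chunks = chunk_sentences(split_sentences(article), per_chunk=5)
print(len(chunks))  # 3 chunks: 5 + 5 + 2 sentences
```

With this scheme the last chunk of an article may hold fewer than five sentences.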
3. Performance
(%) | BM25 (w/o DPR-KO) | DPR-KO (w/o BM25) | DPR-KO (with BM25) |
---|---|---|---|
Top1 Acc | 36.25 | 48.98 | 71.16 |
Top5 Acc | 51.61 | 71.16 | 86.75 |
Top10 Acc | 57.34 | 77.05 | 90.28 |
Top20 Acc | 62.40 | 82.09 | 92.66 |
Top50 Acc | 68.46 | 87.03 | 94.86 |
Top100 Acc | 72.48 | 90.23 | 96.02 |
※ The BM25 model was trained on the full text of Korean Wikipedia.
※ For details on the training and evaluation procedure, please refer to the GitHub repository.
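The "DPR-KO (with BM25)" column combines dense and sparse scores. One common way to do this, sketched below, is to min-max normalize each score list and add the BM25 scores with a weight. The fusion rule (`dense + weight * bm25`) and all function names here are assumptions; how DPR-KO actually applies its BM25 weight of 0.3 is described in the GitHub repository. The `top_k_hit` helper shows one way to count a Top-k hit, assuming a question counts as correct when its gold chunk appears among the k highest-scoring candidates.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(
    dense: Sequence[float], bm25: Sequence[float], bm25_weight: float = 0.3
) -> List[float]:
    """Assumed fusion rule: normalized dense score + weight * normalized BM25."""
    if len(dense) != len(bm25):
        raise ValueError("score lists must cover the same candidates")
    return [d + bm25_weight * b for d, b in zip(min_max(dense), min_max(bm25))]


def top_k_hit(scores: Sequence[float], gold_index: int, k: int) -> bool:
    """True if the gold candidate is among the k highest-scoring candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return gold_index in ranked[:k]


# Made-up scores for four candidate chunks; chunk 2 is the gold chunk.
dense = [12.0, 10.0, 11.5, 4.0]
bm25 = [1.0, 3.0, 9.0, 2.0]
fused = hybrid_scores(dense, bm25)
print(top_k_hit(dense, gold_index=2, k=1))  # dense alone ranks chunk 0 first
print(top_k_hit(fused, gold_index=2, k=1))  # fused scores rank chunk 2 first
```

Averaging `top_k_hit` over all evaluation questions gives a Top-k accuracy in the sense assumed here.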
Citing
@article{lim2019korquad1,
title={KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension},
author={Lim, Seungyoung and Kim, Myungji and Lee, Jooyoul},
journal={arXiv preprint arXiv:1909.07005},
year={2019}
}
@article{karpukhin2020dense,
title={Dense Passage Retrieval for Open-Domain Question Answering},
author={Vladimir Karpukhin and Barlas Oğuz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen-tau Yih},
journal={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2020}
}
@misc{park2021klue,
title={KLUE: Korean Language Understanding Evaluation},
author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jungwoo Ha and Kyunghyun Cho},
year={2021},
eprint={2105.09680},
archivePrefix={arXiv},
primaryClass={cs.CL}
}