Collections
Discover the best community collections!
Collections including paper arxiv:2310.08491
-
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 29 -
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1 -
Humans or LLMs as the Judge? A Study on Judgement Biases
Paper • 2402.10669 • Published -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32 -
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 29 -
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Paper • 2303.16634 • Published • 3 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 24 -
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99 -
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 17
-
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99 -
How to Train Data-Efficient LLMs
Paper • 2402.09668 • Published • 38 -
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper • 2402.10193 • Published • 17 -
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper • 2402.09727 • Published • 35
-
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper • 2306.05685 • Published • 29 -
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper • 2312.10003 • Published • 35 -
Leveraging Large Language Models for NLG Evaluation: A Survey
Paper • 2401.07103 • Published • 4 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 143 -
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 27 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 20 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 64
-
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53 -
prometheus-eval/Feedback-Collection
Viewer • Updated • 100k • 662 • 107 -
prometheus-eval/prometheus-7b-v1.0
Text2Text Generation • Updated • 184 • 30 -
prometheus-eval/prometheus-13b-v1.0
Text2Text Generation • Updated • 3.89k • 126
-
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 32 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 53 -
Generative Judge for Evaluating Alignment
Paper • 2310.05470 • Published • 1 -
Calibrating LLM-Based Evaluator
Paper • 2309.13308 • Published • 11