- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 99
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 38
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 17
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 35

Collections including paper arxiv:2310.20624

- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
  Paper • 2311.05556 • Published • 80
- MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
  Paper • 2311.11501 • Published • 33
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 28
- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
  Paper • 2310.20624 • Published • 12

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 27
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 64

- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
  Paper • 2310.20624 • Published • 12
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 16
- BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
  Paper • 2311.00117 • Published
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
  Paper • 2303.08320 • Published • 3

- Moral Foundations of Large Language Models
  Paper • 2310.15337 • Published • 1
- Specific versus General Principles for Constitutional AI
  Paper • 2310.13798 • Published • 2
- Contrastive Prefence Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47

- Safe RLHF: Safe Reinforcement Learning from Human Feedback
  Paper • 2310.12773 • Published • 28
- The Generative AI Paradox: "What It Can Create, It May Not Understand"
  Paper • 2311.00059 • Published • 18
- LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
  Paper • 2310.20624 • Published • 12
- Moral Foundations of Large Language Models
  Paper • 2310.15337 • Published • 1

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 22
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 16
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 9
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 8