- Qualitatively characterizing neural network optimization problems
  Paper • 1412.6544 • Published • 4
- Averaging Weights Leads to Wider Optima and Better Generalization
  Paper • 1803.05407 • Published • 2
- Merging Models with Fisher-Weighted Averaging
  Paper • 2111.09832 • Published • 1
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 6
Collections
Collections including paper arxiv:2311.03099
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Paper • 2203.05482 • Published • 6
- Editing Models with Task Arithmetic
  Paper • 2212.04089 • Published • 6
- Resolving Interference When Merging Models
  Paper • 2306.01708 • Published • 13
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  Paper • 2311.03099 • Published • 28

- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
  Paper • 2310.17653 • Published • 2
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
  Paper • 2311.03099 • Published • 28
- "I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming
  Paper • 2312.06908 • Published • 5
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 75

- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 28
- Tailoring Self-Rationalizers with Multi-Reward Distillation
  Paper • 2311.02805 • Published • 3
- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 2
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
  Paper • 2309.11235 • Published • 16

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 38
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 82
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 82