Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
arXiv: 2404.02258
Note: This paper shows that LLMs should not be used as libraries! They are not knowledge engines, but rather (feeble) reasoning engines.
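The core idea of Mixture-of-Depths is that each transformer block has a lightweight router that scores every token, and only the top-k tokens (a fixed capacity fraction) are processed by that block's attention and MLP; the remaining tokens skip the block through the residual connection. The sketch below is a minimal, non-authoritative illustration of that routing pattern in NumPy, not the paper's implementation: the router weights, the `capacity` value, and the stand-in `block_fn` are all hypothetical, and the real method trains the router jointly with the model.

```python
import numpy as np

def mod_block(x, router_w, block_fn, capacity=0.5):
    """Sketch of a Mixture-of-Depths-style block.

    A linear router scores each token; only the top-k tokens
    (k = capacity * seq_len) go through the expensive block_fn,
    everyone else passes through unchanged via the residual path.
    Scaling the block output by the router score (as in the paper)
    keeps the router on the gradient path.
    """
    seq_len, d_model = x.shape
    k = max(1, int(capacity * seq_len))

    scores = x @ router_w                 # (seq_len,) router logits
    top_idx = np.argsort(scores)[-k:]     # indices of the k highest-scoring tokens

    out = x.copy()                        # skipped tokens: identity (residual only)
    # selected tokens: residual + router-score-weighted block output
    out[top_idx] = x[top_idx] + scores[top_idx, None] * block_fn(x[top_idx])
    return out

# Toy usage: a linear map stands in for the block's attention + MLP.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((16, d))
router_w = rng.standard_normal(d)
w = rng.standard_normal((d, d)) * 0.1
y = mod_block(x, router_w, lambda h: h @ w, capacity=0.25)
```

With `capacity=0.25`, only 4 of the 16 tokens are processed by the block in this layer; the compute saving per layer scales directly with the capacity fraction.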