- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18

Collections
Collections including paper arxiv:2402.19173
- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 16
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 10
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 65

- cognitivecomputations/dolphin-2.6-mistral-7b-dpo-laser
  Text Generation • Updated • 1.18k • 117
- Evaluating Large Language Models Trained on Code
  Paper • 2107.03374 • Published • 6
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
  Paper • 2002.08155 • Published • 2
- code2seq: Generating Sequences from Structured Representations of Code
  Paper • 1808.01400 • Published • 2

- ReGAL: Refactoring Programs to Discover Generalizable Abstractions
  Paper • 2401.16467 • Published • 9
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 82
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
  Paper • 2402.14261 • Published • 10

- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 11
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 47
- WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
  Paper • 2312.14187 • Published • 49
- On the Effectiveness of Large Language Models in Domain-Specific Code Generation
  Paper • 2312.01639 • Published • 1

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
  Paper • 2401.04658 • Published • 25
- Weaver: Foundation Models for Creative Writing
  Paper • 2401.17268 • Published • 43
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 16

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 7
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 14

- LLM-Assisted Code Cleaning For Training Accurate Code Generators
  Paper • 2311.14904 • Published • 4
- The Program Testing Ability of Large Language Models for Code
  Paper • 2310.05727 • Published • 1
- Neural Rankers for Code Generation via Inter-Cluster Modeling
  Paper • 2311.03366 • Published • 1
- Magicoder: Source Code Is All You Need
  Paper • 2312.02120 • Published • 79

- Magicoder: Source Code Is All You Need
  Paper • 2312.02120 • Published • 79
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
  Paper • 2305.01210 • Published • 4
- NeuRI: Diversifying DNN Generation via Inductive Rule Inference
  Paper • 2302.02261 • Published • 3