Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2308.12950

Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 22

Masked Audio Generation using a Single Non-Autoregressive Transformer

Paper • 2401.04577 • Published Jan 9 • 41
Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 22
Simple and Controllable Music Generation

Paper • 2306.05284 • Published Jun 8, 2023 • 142
High Fidelity Neural Audio Compression

Paper • 2210.13438 • Published Oct 24, 2022 • 3

synthetic code generation

Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5 • 93
Wukong: Towards a Scaling Law for Large-Scale Recommendation

Paper • 2403.02545 • Published Mar 4 • 15
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 29
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

Paper • 2308.10462 • Published Aug 21, 2023 • 1

There's usually interesting papers in the model cards on the leaderboard: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 29
WizardCoder: Empowering Code Large Language Models with Evol-Instruct

Paper • 2306.08568 • Published Jun 14, 2023 • 28
SantaCoder: don't reach for the stars!

Paper • 2301.03988 • Published Jan 9, 2023 • 7
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25 • 46

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019 • 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 14

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Paper • 2311.12198 • Published Nov 20, 2023 • 22
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Paper • 2311.18775 • Published Nov 30, 2023 • 6
Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 22

Code Llama: Open Foundation Models for Code

Paper • 2308.12950 • Published Aug 24, 2023 • 22

Training & Architectures

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 8
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 157
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 50
Mistral 7B

Paper • 2310.06825 • Published Oct 10, 2023 • 47
Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 34
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Paper • 2309.11568 • Published Sep 20, 2023 • 10

Creative Robot Tool Use with Large Language Models

Paper • 2310.13065 • Published Oct 19, 2023 • 8
CodeCoT and Beyond: Learning to Program and Test like a Developer

Paper • 2308.08784 • Published Aug 17, 2023 • 5
Lemur: Harmonizing Natural Language and Code for Language Agents

Paper • 2310.06830 • Published Oct 10, 2023 • 30
CodePlan: Repository-level Coding using LLMs and Planning

Paper • 2309.12499 • Published Sep 21, 2023 • 73

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs