Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:1706.03762

Most influential papers in AI

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
Universal Language Model Fine-tuning for Text Classification

Paper • 1801.06146 • Published Jan 18, 2018 • 6
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11

about 21 hours ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41

Finished Reading

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1 • 22
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 9
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 7

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15 • 86

LLM Fundamental papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 239

Ilya's papers for Carmack

Ilya Sutskever: "If you really learn all of these, you’ll know 90% of what matters today." Full list: https://punkx.org/jackdoe/30.html

Recurrent Neural Network Regularization

Paper • 1409.2329 • Published Sep 8, 2014
Pointer Networks

Paper • 1506.03134 • Published Jun 9, 2015
Order Matters: Sequence to sequence for sets

Paper • 1511.06391 • Published Nov 19, 2015
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Paper • 1811.06965 • Published Nov 16, 2018

Language model papers

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 9
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 29
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 9

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41

Must-read Papers

Some of the most important and insightful (in my opinion) AI papers of today, with a focus on NLP and LLMs.

ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 14
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 103

Papers-Fundamentals

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 9
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 59
Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published May 13 • 4

Previous
1
2
3
...
5
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs