Collections
Collections including paper arxiv:2408.04093
- INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
  Paper • 2307.03712 • Published • 1
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
  Paper • 2408.04093 • Published • 4
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51

- Recurrent Drafter for Fast Speculative Decoding in Large Language Models
  Paper • 2403.09919 • Published • 20
- SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
  Paper • 2305.09781 • Published • 4
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
  Paper • 2408.04093 • Published • 4

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 78
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5

- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 39
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
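
A listing like this can also be retrieved programmatically. Below is a minimal sketch using the `huggingface_hub` client; the `papers/<arxiv-id>` item filter follows the Hub's collection-query convention, and the `limit` value and printed fields are illustrative choices, not part of this page.

```python
from huggingface_hub import get_collection, list_collections

# Sketch: list community collections that include the Tree Attention
# paper (arxiv:2408.04093). list_collections returns truncated item
# previews, so each collection is re-fetched for its full item list.
for summary in list_collections(item="papers/2408.04093", limit=10):
    collection = get_collection(summary.slug)
    print(collection.title)
    for item in collection.items:
        if item.item_type == "paper":
            print(f"  Paper • {item.item_id}")
```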