paisleypark's Collections
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 2
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 6
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37
Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 7
Learning Universal Predictors
Paper • 2401.14953 • Published • 19
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper • 2401.15077 • Published • 19
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 69
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper • 2401.14404 • Published • 17
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 59
Time is Encoded in the Weights of Finetuned Language Models
Paper • 2312.13401 • Published • 19
Unsupervised Universal Image Segmentation
Paper • 2312.17243 • Published • 19
Reasons to Reject? Aligning Language Models with Judgments
Paper • 2312.14591 • Published • 17
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Paper • 2312.13314 • Published • 7
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 12
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 41
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14
CapsFusion: Rethinking Image-Text Data at Scale
Paper • 2310.20550 • Published • 25
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Paper • 2311.02262 • Published • 10
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 16
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Paper • 2310.15308 • Published • 22
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning
Paper • 2310.12274 • Published • 11
Language Modeling Is Compression
Paper • 2309.10668 • Published • 82
Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 21
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 77
Paper • 2309.03179 • Published • 29
Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 7
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 31
Semantic-SAM: Segment and Recognize Anything at Any Granularity
Paper • 2307.04767 • Published • 21
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 14
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Paper • 2307.02321 • Published • 7
CRAG -- Comprehensive RAG Benchmark
Paper • 2406.04744 • Published • 41