daily-papers - a tyzhu Collection


tyzhu's Collections

daily-papers

updated 3 days ago

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published Sep 16 • 37
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Paper • 2409.11242 • Published Sep 17 • 5
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Paper • 2409.11136 • Published Sep 17 • 21
On the Diagram of Thought

Paper • 2409.10038 • Published Sep 16 • 11
Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3 • 36
Large Language Models as Markov Chains

Paper • 2410.02724 • Published Oct 3 • 31
Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published Oct 3 • 31
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Paper • 2410.02749 • Published Oct 3 • 12
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?

Paper • 2410.02115 • Published Oct 3 • 10
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Paper • 2410.02762 • Published Oct 3 • 9
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

Paper • 2410.01335 • Published Oct 2 • 5
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

Paper • 2410.01044 • Published Oct 1 • 34
Not All LLM Reasoners Are Created Equal

Paper • 2410.01748 • Published Oct 2 • 27
Quantifying Generalization Complexity for Large Language Models

Paper • 2410.01769 • Published Oct 2 • 13
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Paper • 2410.01518 • Published Oct 2 • 2
Law of the Weakest Link: Cross Capabilities of Large Language Models

Paper • 2409.19951 • Published Sep 30 • 53
Hyper-Connections

Paper • 2409.19606 • Published Sep 29 • 19
Instruction Following without Instruction Tuning

Paper • 2409.14254 • Published Sep 21 • 27
LongGenBench: Long-context Generation Benchmark

Paper • 2410.04199 • Published Oct 5 • 17
Erasing Conceptual Knowledge from Language Models

Paper • 2410.02760 • Published Oct 3 • 12
Differential Transformer

Paper • 2410.05258 • Published Oct 7 • 165
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3 • 48
Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1 • 143
Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3 • 23
Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Paper • 2410.09037 • Published 30 days ago • 4
Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Paper • 2410.09335 • Published 29 days ago • 14
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published 30 days ago • 41
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

Paper • 2410.09008 • Published 30 days ago • 16
Mechanistic Permutability: Match Features Across Layers

Paper • 2410.07656 • Published Oct 10 • 16
SimpleStrat: Diversifying Language Model Generation with Stratification

Paper • 2410.09038 • Published 30 days ago • 4
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Paper • 2410.07035 • Published Oct 9 • 16
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Paper • 2410.12405 • Published 25 days ago • 13
Exploring Model Kinship for Merging Large Language Models

Paper • 2410.12613 • Published 25 days ago • 19
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free

Paper • 2410.10814 • Published 27 days ago • 48
What Matters in Transformers? Not All Attention is Needed

Paper • 2406.15786 • Published Jun 22 • 27
Vector-ICL: In-context Learning with Continuous Vector Representations

Paper • 2410.05629 • Published Oct 8 • 3
Intriguing Properties of Large Language and Vision Models

Paper • 2410.04751 • Published Oct 7 • 16
AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published 20 days ago • 55
Pre-training Distillation for Large Language Models: A Design Space Exploration

Paper • 2410.16215 • Published 20 days ago • 15
In-context learning and Occam's razor

Paper • 2410.14086 • Published 23 days ago • 2
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Paper • 2410.13276 • Published 24 days ago • 24
How Do Training Methods Influence the Utilization of Vision Models?

Paper • 2410.14470 • Published 23 days ago • 4
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media

Paper • 2410.12791 • Published 25 days ago • 4
Counting Ability of Large Language Models and Impact of Tokenization

Paper • 2410.19730 • Published 16 days ago • 10
Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Paper • 2410.16090 • Published 20 days ago • 6
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

Paper • 2410.19290 • Published 16 days ago • 10
On Memorization of Large Language Models in Logical Reasoning

Paper • 2410.23123 • Published 11 days ago • 15
Toxicity of the Commons: Curating Open-Source Pre-Training Data

Paper • 2410.22587 • Published 12 days ago • 8
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback

Paper • 2410.21242 • Published 13 days ago • 6
Task Vectors are Cross-Modal

Paper • 2410.22330 • Published 12 days ago • 10
RARe: Retrieval Augmented Retrieval with In-Context Examples

Paper • 2410.20088 • Published 15 days ago • 5
LongReward: Improving Long-context Large Language Models with AI Feedback

Paper • 2410.21252 • Published 13 days ago • 16
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Paper • 2410.23168 • Published 11 days ago • 20
Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Paper • 2410.24175 • Published 10 days ago • 15
Language Models can Self-Lengthen to Generate Long Texts

Paper • 2410.23933 • Published 10 days ago • 15
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published 10 days ago • 57
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

Paper • 2411.01106 • Published 8 days ago • 4
Physics in Next-token Prediction

Paper • 2411.00660 • Published 9 days ago • 14
GPT or BERT: why not both?

Paper • 2410.24159 • Published 10 days ago • 11
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models

Paper • 2411.00743 • Published 9 days ago • 6
