WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published 22 days ago • 44
Can Large Language Models Understand Symbolic Graphics Programs? Paper • 2408.08313 • Published Aug 15 • 6
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning Paper • 2408.08441 • Published Aug 15 • 6
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Paper • 2408.07547 • Published Aug 14 • 7
Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space Paper • 2408.07416 • Published Aug 14 • 5
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning Paper • 2408.07089 • Published Aug 9 • 12
HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors Paper • 2408.06019 • Published Aug 12 • 12
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework Paper • 2408.06190 • Published Aug 12 • 17
TacSL: A Library for Visuotactile Sensor Simulation and Learning Paper • 2408.06506 • Published Aug 12 • 7
MovieSum: An Abstractive Summarization Dataset for Movie Screenplays Paper • 2408.06281 • Published Aug 12 • 9
SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields Paper • 2408.06697 • Published Aug 13 • 13
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
OpenResearcher: Unleashing AI for Accelerated Scientific Research Paper • 2408.06941 • Published Aug 13 • 28
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 55
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 114
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9 • 36
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models Paper • 2408.04840 • Published Aug 9 • 31
Kalman-Inspired Feature Propagation for Video Face Super-Resolution Paper • 2408.05205 • Published Aug 9 • 8
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond Paper • 2408.03900 • Published Aug 7 • 9
Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation Paper • 2408.03588 • Published Aug 7 • 6
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5 • 32
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling Paper • 2408.01291 • Published Aug 2 • 11
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1 • 40
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models Paper • 2408.01337 • Published Aug 2 • 10
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Paper • 2408.00762 • Published Aug 1 • 9
Improving 2D Feature Representations by 3D-Aware Fine-Tuning Paper • 2407.20229 • Published Jul 29 • 7
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28 • 22
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29 • 37
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Paper • 2407.19918 • Published Jul 29 • 47
JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources Paper • 2407.20750 • Published Jul 30 • 21
Harvesting Textual and Structured Data from the HAL Publication Repository Paper • 2407.20595 • Published Jul 30 • 21
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24 • 31
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings Paper • 2407.20581 • Published Jul 30 • 23
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework Paper • 2407.20729 • Published Jul 30 • 25
Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation Paper • 2407.20445 • Published Jul 29 • 20