- AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
  Paper • 2309.16414 • Published • 19
- Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
  Paper • 2309.13018 • Published • 9
- Robust Speech Recognition via Large-Scale Weak Supervision
  Paper • 2212.04356 • Published • 23
- Language models in molecular discovery
  Paper • 2309.16235 • Published • 10
Collections
Collections including paper arxiv:2309.10537
- DreamLLM: Synergistic Multimodal Comprehension and Creation
  Paper • 2309.11499 • Published • 58
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
  Paper • 2310.11441 • Published • 26
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  Paper • 2311.10093 • Published • 57

- Augmenting text for spoken language understanding with Large Language Models
  Paper • 2309.09390 • Published • 2
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 82
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 82
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8

- Retrieval-Augmented Text-to-Audio Generation
  Paper • 2309.08051 • Published • 6
- A Large-scale Dataset for Audio-Language Representation Learning
  Paper • 2309.11500 • Published • 9
- End-to-End Speech Recognition Contextualization with Large Language Models
  Paper • 2309.10917 • Published • 9
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8

- Self-Alignment with Instruction Backtranslation
  Paper • 2308.06259 • Published • 40
- ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation
  Paper • 2308.03793 • Published • 10
- From Sparse to Soft Mixtures of Experts
  Paper • 2308.00951 • Published • 20
- Revisiting DETR Pre-training for Object Detection
  Paper • 2308.01300 • Published • 9

- OmnimatteRF: Robust Omnimatte with 3D Background Modeling
  Paper • 2309.07749 • Published • 7
- AudioSR: Versatile Audio Super-resolution at Scale
  Paper • 2309.07314 • Published • 25
- Generative Image Dynamics
  Paper • 2309.07906 • Published • 52
- MagiCapture: High-Resolution Multi-Concept Portrait Customization
  Paper • 2309.06895 • Published • 27

- Natural Language Supervision for General-Purpose Audio Representations
  Paper • 2309.05767 • Published • 9
- AudioSR: Versatile Audio Super-resolution at Scale
  Paper • 2309.07314 • Published • 25
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8
- Toward Joint Language Modeling for Speech Units and Text
  Paper • 2310.08715 • Published • 7

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 22
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 16
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 9
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 8

- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 53
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 8
- MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
  Paper • 2310.11954 • Published • 24
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation
  Paper • 2310.00704 • Published • 19