gary109 (蓋瑞王)

upvoted 2 papers 18 days ago

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published 22 days ago • 44

Scaling Up Diffusion and Flow-based XGBoost Models

Paper • 2408.16046 • Published 23 days ago • 8

upvoted 20 papers about 1 month ago

InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

Paper • 2408.07089 • Published Aug 9 • 12

Generative Photomontage

Paper • 2408.07116 • Published Aug 13 • 19

HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

Paper • 2408.06019 • Published Aug 12 • 12

FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework

Paper • 2408.06190 • Published Aug 12 • 17

TacSL: A Library for Visuotactile Sensor Simulation and Learning

Paper • 2408.06506 • Published Aug 12 • 7

MovieSum: An Abstractive Summarization Dataset for Movie Screenplays

Paper • 2408.06281 • Published Aug 12 • 9

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

Paper • 2408.06697 • Published Aug 13 • 13

DC3DO: Diffusion Classifier for 3D Objects

Paper • 2408.06693 • Published Aug 13 • 10

UniT: Unified Tactile Representation for Robot Learning

Paper • 2408.06481 • Published Aug 12 • 9

Layerwise Recurrent Router for Mixture-of-Experts

Paper • 2408.06793 • Published Aug 13 • 30

upvoted an article about 1 month ago

Article

Welcome FalconMamba: The first strong attention-free 7B model

Aug 12

• 96

upvoted 14 papers about 1 month ago

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Paper • 2408.07055 • Published Aug 13 • 65

OpenResearcher: Unleashing AI for Accelerated Scientific Research

Paper • 2408.06941 • Published Aug 13 • 28

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 60

Med42-v2: A Suite of Clinical LLMs

Paper • 2408.06142 • Published Aug 12 • 50

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12 • 55

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 114

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 52

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Paper • 2408.05147 • Published Aug 9 • 36

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Paper • 2408.04840 • Published Aug 9 • 31

Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Paper • 2408.05205 • Published Aug 9 • 8

Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Paper • 2408.03900 • Published Aug 7 • 9

Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation

Paper • 2408.03588 • Published Aug 7 • 6

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5 • 32

upvoted 11 papers about 2 months ago

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Paper • 2408.01291 • Published Aug 2 • 11

Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Paper • 2408.00874 • Published Aug 1 • 40

POA: Pre-training Once for Models of All Sizes

Paper • 2408.01031 • Published Aug 2 • 26

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Paper • 2408.01337 • Published Aug 2 • 10

UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model

Paper • 2408.00762 • Published Aug 1 • 9

Finch: Prompt-guided Key-Value Cache Compression

Paper • 2408.00167 • Published Jul 31 • 13

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

Open-Vocabulary Audio-Visual Semantic Segmentation

Paper • 2407.21721 • Published Jul 31 • 8

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Paper • 2407.20229 • Published Jul 29 • 7

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31 • 25

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 102

upvoted an article about 2 months ago

Article

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Jul 31

• 58

upvoted 11 papers about 2 months ago

3D Question Answering for City Scene Understanding

Paper • 2407.17398 • Published Jul 24 • 21

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

Paper • 2407.19474 • Published Jul 28 • 22

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Paper • 2407.20183 • Published Jul 29 • 37

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Paper • 2407.19918 • Published Jul 29 • 47

JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources

Paper • 2407.20750 • Published Jul 30 • 21

Harvesting Textual and Structured Data from the HAL Publication Repository

Paper • 2407.20595 • Published Jul 30 • 21

A Large Encoder-Decoder Family of Foundation Models For Chemical Language

Paper • 2407.20267 • Published Jul 24 • 31

Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings

Paper • 2407.20581 • Published Jul 30 • 23

Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

Paper • 2407.20729 • Published Jul 30 • 25

Meltemi: The first open Large Language Model for Greek

Paper • 2407.20743 • Published Jul 30 • 67

Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

Paper • 2407.20445 • Published Jul 29 • 20

蓋瑞王

AI & ML interests

Organizations

gary109's activity

Welcome FalconMamba: The first strong attention-free 7B model

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope