ojasvisingh786 (Ojasvi Singh Yadav)

upvoted 4 papers 2 days ago

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 3 days ago • 24

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published 3 days ago • 55

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.10819 • Published 3 days ago • 11

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published 4 days ago • 23

upvoted a collection 2 days ago

MagpieLM

Collection

Aligning LMs with Fully Open Pipeline • 7 items • Updated 2 days ago • 8

upvoted 3 papers 4 days ago

A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis

Paper • 2409.08947 • Published 7 days ago • 11

DrawingSpinUp: 3D Animation from Single Character Drawings

Paper • 2409.08615 • Published 7 days ago • 10

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published 7 days ago • 24

upvoted a collection 7 days ago

Core ML Segment Anything 2

Collection

4 items • Updated 7 days ago • 13

upvoted a paper 7 days ago

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

Paper • 2409.08270 • Published 8 days ago • 8

upvoted a collection 7 days ago

Flux LoRAs

Collection

The flux_Xscape collection • 4 items • Updated 8 days ago • 1

upvoted a paper 7 days ago

DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors

Paper • 2409.08278 • Published 8 days ago • 10

upvoted a paper 15 days ago

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Paper • 2408.15239 • Published 24 days ago • 27

upvoted 4 papers 16 days ago

upvoted a collection 20 days ago

Sapiens

Collection

Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens • 72 items • Updated 1 day ago • 22

upvoted a paper 23 days ago

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published 24 days ago • 119

upvoted 2 papers 24 days ago

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Paper • 2408.14211 • Published 25 days ago • 8

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Paper • 2408.13359 • Published 28 days ago • 21

upvoted a paper 25 days ago

LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

Paper • 2408.13252 • Published 28 days ago • 23

upvoted 2 papers 28 days ago

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published 29 days ago • 84

Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published 29 days ago • 13

upvoted 2 collections 29 days ago

Enhance Your Images

Collection

Papers I want to read

Collection

Papers in my to-read list • 201 items • Updated 3 days ago • 18

upvoted a paper about 1 month ago

Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 20

upvoted a collection about 1 month ago

Gradio Spaces for Background Removal

Collection

Enhance your images by removing the background. Will ensure these Spaces are up and maintained for the community. • 5 items • Updated about 1 month ago • 23

upvoted 13 papers about 1 month ago

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

TraDiffusion: Trajectory-Based Training-Free Image Generation

Paper • 2408.09739 • Published Aug 19 • 7

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Paper • 2408.10198 • Published Aug 19 • 32

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

Paper • 2408.09702 • Published Aug 19 • 9

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15 • 37

TurboEdit: Instant text-based image editing

Paper • 2408.08332 • Published Aug 14 • 17

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Paper • 2408.07055 • Published Aug 13 • 65

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 52

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 152

Fast Sprite Decomposition from Animated Graphics

Paper • 2408.03923 • Published Aug 7 • 7

upvoted a collection about 2 months ago

AuraFlow

Collection

AuraFlow v0.x series, to date the largest (6.8B) and highest fidelity (0.7+ on GenEval) open sourced text to image model. • 3 items • Updated 14 days ago • 4

upvoted 8 papers about 2 months ago

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Paper • 2407.20183 • Published Jul 29 • 37

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29 • 33

Very Large-Scale Multi-Agent Simulation in AgentScope

Paper • 2407.17789 • Published Jul 25 • 30

Video-to-Audio Generation with Hidden Alignment

Paper • 2407.07464 • Published Jul 10 • 16

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 38

A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data

Paper • 2407.16680 • Published Jul 23 • 11

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

Paper • 2407.15187 • Published Jul 21 • 10

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22 • 38

upvoted 2 papers 2 months ago

SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

Paper • 2407.14257 • Published Jul 19 • 5

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Paper • 2407.13976 • Published Jul 19 • 5

upvoted an article 2 months ago

Docmatix - a huge dataset for Document Visual Question Answering

Article • Jul 18 • 63

upvoted 4 papers 2 months ago

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

Paper • 2407.12854 • Published Jul 9 • 29

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15 • 23

Video Occupancy Models

Paper • 2407.09533 • Published Jun 25 • 6

Still-Moving: Customized Video Generation without Customized Video Data

Paper • 2407.08674 • Published Jul 11 • 11

upvoted a collection 2 months ago

LLaVA-Next-Interleave

Collection

7 items • Updated Aug 6 • 15

upvoted 2 papers 2 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 64

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10 • 40
