Applied Machine Learning Papers

VikramSingh178's Collections

updated 5 days ago

Reading List (Mainly Focused on VLMs and Diffusion Models)

Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 16
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Paper • 2311.15127 • Published Nov 25, 2023 • 12
Learning Transferable Visual Models From Natural Language Supervision

Paper • 2103.00020 • Published Feb 26, 2021 • 11
U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper • 1505.04597 • Published May 18, 2015 • 7
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 3
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Paper • 2112.10741 • Published Dec 20, 2021 • 3
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22 • 21
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Paper • 2307.01952 • Published Jul 4, 2023 • 82
Photorealistic Video Generation with Diffusion Models

Paper • 2312.06662 • Published Dec 11, 2023 • 23
PonderNet: Learning to Ponder

Paper • 2107.05407 • Published Jul 12, 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

Paper • 2106.10270 • Published Jun 18, 2021 • 2
Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Paper • 2403.07500 • Published Mar 12
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

Paper • 2305.14720 • Published May 24, 2023 • 2
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published May 27 • 10
Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published May 30 • 32
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Paper • 2301.08243 • Published Jan 19, 2023 • 6
Revisiting Feature Prediction for Learning Visual Representations from Video

Paper • 2404.08471 • Published Feb 15 • 1
Guiding Instruction-based Image Editing via Multimodal Large Language Models

Paper • 2309.17102 • Published Sep 29, 2023 • 3
SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21 • 27
Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4 • 15
I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4 • 15
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30
Graph Neural Networks Gone Hogwild

Paper • 2407.00494 • Published Jun 29
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Paper • 2211.14758 • Published Nov 27, 2022 • 1
DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 26
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Paper • 2403.12008 • Published Mar 18 • 19
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

Paper • 2310.11513 • Published Oct 17, 2023 • 1
InstructVideo: Instructing Video Diffusion Models with Human Feedback

Paper • 2312.12490 • Published Dec 19, 2023 • 17
Semi-Parametric Neural Image Synthesis

Paper • 2204.11824 • Published Apr 25, 2022 • 1
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2 • 52
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

Paper • 2310.19512 • Published Oct 30, 2023 • 15
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Paper • 2310.12190 • Published Oct 18, 2023 • 10
PALP: Prompt Aligned Personalization of Text-to-Image Models

Paper • 2401.06105 • Published Jan 11 • 46
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23 • 21
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Paper • 2408.03209 • Published Aug 6 • 21
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Paper • 2312.10300 • Published Dec 16, 2023 • 1
Colorful Diffuse Intrinsic Image Decomposition in the Wild

Paper • 2409.13690 • Published Sep 20 • 12
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Paper • 2410.10629 • Published 27 days ago • 3
Large Language Models Reflect the Ideology of their Creators

Paper • 2410.18417 • Published 17 days ago
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Paper • 2410.19355 • Published 16 days ago • 20
How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published 6 days ago • 27
