akhaliq (AK) – Community Activity

Paper • 2409.12576 • Published about 22 hours ago • 5 •

commented 4 papers about 4 hours ago

commented 4 papers about 5 hours ago

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Paper • 2409.12568 • Published about 22 hours ago • 13 •

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12431 • Published 1 day ago •

FlexiTex: Enhancing Texture Generation with Visual Guidance

Paper • 2409.12532 • Published about 23 hours ago •

Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation

New activity in cerebras/chain-of-thought about 6 hours ago

streaming outputs

#3 opened about 8 hours ago by

New activity in akhaliq/dailypapershackernews about 9 hours ago

Create app.py

#2 opened about 11 hours ago by

guy1eyal

New activity in yanze/PuLID-FLUX about 15 hours ago

add developers local gradio demo section

#6 opened about 15 hours ago by

Paper • 2409.12183 • Published 1 day ago • 21 •

commented 4 papers 1 day ago

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.11901 • Published 2 days ago • 22 •

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.12191 • Published 1 day ago • 47 •

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12139 • Published 1 day ago • 9 •

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

Paper • 2409.11355 • Published 3 days ago • 24 •

commented 8 papers 2 days ago

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.10819 • Published 3 days ago • 11 •

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.11406 • Published 3 days ago • 19 •

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11402 • Published 3 days ago • 47 •

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11340 • Published 3 days ago • 55 •

OmniGen: Unified Image Generation

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Paper • 2409.11211 • Published 3 days ago • 6 •

Paper • 2409.10923 • Published 3 days ago • 10 •

Agile Continuous Jumping in Discontinuous Terrains

Paper • 2409.10568 • Published 6 days ago • 11 •

On the limits of agency in agent-based models

Paper • 2409.11367 • Published 3 days ago • 11 •

New activity in akhaliq/dailypapershackernews 2 days ago

dark mode

#1 opened 2 days ago by

hysts

commented 2 papers 2 days ago

OSV: One Step is Enough for High-Quality Image to Video Generation

Paper • 2409.10594 • Published 4 days ago • 23 •

Kolmogorov-Arnold Transformer

Paper • 2409.10173 • Published 4 days ago • 16 •

commented 3 papers 3 days ago

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Paper • 2409.08831 • Published 7 days ago • 1 •

Breaking reCAPTCHAv2

Paper • 2409.09214 • Published 6 days ago • 38 •

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

New activity in sambanovasystems/Llama3.1-Instruct-O1 4 days ago

use gr chatbot

#5 opened 4 days ago by

downgrade openai version

#4 opened 4 days ago by

fix gradio demo issue and not use chatbot component

#3 opened 4 days ago by

update for gradio

#2 opened 4 days ago by

use gradio

#1 opened 4 days ago by

Paper • 2409.08947 • Published 7 days ago • 11 •

commented 6 papers 4 days ago

A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis

Paper • 2409.08857 • Published 7 days ago • 24 •

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08615 • Published 7 days ago • 10 •

DrawingSpinUp: 3D Animation from Single Character Drawings

Paper • 2409.08514 • Published 7 days ago • 5 •

Apollo: Band-sequence Modeling for High-Quality Audio Restoration

Paper • 2409.08513 • Published 7 days ago • 8 •

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08353 • Published 8 days ago • 9 •

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

New activity in cerebras/chain-of-thought 4 days ago

use gradio blocks and chatbot

#2 opened 4 days ago by

fix to get app running

#1 opened 4 days ago by

Paper • 2409.08278 • Published 8 days ago • 10 •

commented 4 papers 7 days ago

DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors

Paper • 2409.08264 • Published 8 days ago • 39 •

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08239 • Published 8 days ago • 15 •

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08248 • Published 8 days ago • 12 •

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

Paper • 2409.07450 • Published 9 days ago • 10 •

commented 7 papers 8 days ago

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Paper • 2409.06762 • Published 10 days ago • 6 •

Generative Hierarchical Materials Search

Paper • 2409.07441 • Published 9 days ago • 8 •

Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering

Paper • 2409.07129 • Published 9 days ago • 7 •

MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis

Paper • 2409.06765 • Published 10 days ago • 11 •

gsplat: An Open-Source Library for Gaussian Splatting

Paper • 2409.07429 • Published 9 days ago • 25 •

Agent Workflow Memory

Paper • 2409.07452 • Published 9 days ago • 18 •

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.06135 • Published 10 days ago • 14 •

commented 3 papers 9 days ago

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06029 • Published 10 days ago • 19 •

SongCreator: Lyrics-based Universal Song Generation

Paper • 2409.06633 • Published 10 days ago • 14 •

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Paper • 2409.04410 • Published 14 days ago • 23 •

commented 2 papers 11 days ago

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04005 • Published 14 days ago • 16 •

Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

Paper • 2409.02245 • Published 16 days ago • 9 •

commented a paper 15 days ago

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation