FlexiTex: Enhancing Texture Generation with Visual Guidance
Abstract
Recent texture generation methods achieve impressive results due to the powerful generative prior they leverage from large-scale text-to-image diffusion models. However, abstract textual prompts are limited in providing global textural or shape information, which leads these methods to produce blurry or inconsistent patterns. To tackle this, we present FlexiTex, which embeds rich information via visual guidance to generate high-quality textures. The core of FlexiTex is the Visual Guidance Enhancement module, which incorporates more specific information from visual guidance to reduce ambiguity in the text prompt and preserve high-frequency details. To further enhance the visual guidance, we introduce a Direction-Aware Adaptation module that automatically designs direction prompts based on different camera poses, avoiding the Janus problem and maintaining global semantic consistency. Benefiting from the visual guidance, FlexiTex produces quantitatively and qualitatively sound results, demonstrating its potential to advance texture generation for real-world applications.
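The abstract does not spell out how the Direction-Aware Adaptation module builds its prompts, but as a rough illustration of the idea, the sketch below maps a camera pose to a view phrase appended to the base text prompt. The function name, angle thresholds, and phrases are assumptions chosen for illustration, not details taken from FlexiTex.

```python
def direction_prompt(base_prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append a view-direction phrase to the text prompt based on camera pose.

    Illustrative only: the angle ranges and phrases below are assumptions,
    not taken from the FlexiTex paper.
    """
    if elevation_deg > 60:
        view = "top view"
    elif elevation_deg < -60:
        view = "bottom view"
    else:
        az = azimuth_deg % 360
        if az < 45 or az >= 315:
            view = "front view"
        elif az < 135:
            view = "side view"
        elif az < 225:
            view = "back view"
        else:
            view = "side view"
    return f"{base_prompt}, {view}"

# Example: a camera behind the object gets a "back view" prompt,
# which is the kind of cue that helps avoid the Janus (multi-face) problem.
print(direction_prompt("a wooden chair with carved details", azimuth_deg=180, elevation_deg=10))
```

Conditioning each rendered view on a pose-specific phrase like this keeps the diffusion prior from painting "front-like" content on every side of the mesh, which is consistent with the multi-view consistency claim in the abstract.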
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling (2024)
- MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement (2024)
- DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing (2024)
- Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE (2024)
- Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation (2024)
I strongly suspect that this method can't perform robustly across a diverse range of mesh categories!
Generating high-fidelity textures with both rich details and multi-view consistency, within 30 inference steps and without fine-tuning, sounds too good to be true based on my own experience...
Can we have source code or at least a demo space?