Anton Obukhov (toshas)

toshas's activity

posted an update 5 months ago
Join us at our remaining CVPR presentations this week! Members of PRS-ETH will be around to connect with you and discuss our presented and ongoing works:

πŸ’ Marigold: Discover our work on sharp diffusion-based computer vision techniques, presented in Orals 3A track on "3D from Single View", Thu, June 20, 9:00-9:15 AM. Also, drop by Poster Session 3 later that day for more tangible matters! 🌚
Project page: https://marigoldmonodepth.github.io/
Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Collection: https://huggingface.co/collections/prs-eth/marigold-6669e9e3d3ee30f48214b9ba
Space: prs-eth/marigold-lcm
Diffusers 🧨 tutorial: https://huggingface.co/docs/diffusers/using-diffusers/marigold_usage

βš™οΈ Point2CAD: Learn about our mechanical CAD model reconstruction from point clouds, presented in Poster Session 1, Wed, June 19, 10:30 AM - 12:00 PM.
Project page: https://www.obukhov.ai/point2cad.html
Paper: Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds (2312.04962)

🎭 DGInStyle: Explore our generative data synthesis approach as a cost-efficient alternative to real and synthetic data, presented in the Workshop on Synthetic Data for Computer Vision, Tue, June 18, at Summit 423-425.
Details and schedule: https://syndata4cv.github.io/
Project page: https://dginstyle.github.io/
Paper: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control (2312.03048)
Model: yurujaja/DGInStyle
reacted to sayakpaul's post with ❤️🔥🚀 6 months ago
🧨 Diffusers 0.28.0 is out 🔥

It features the first non-generative pipeline of the library -- Marigold 🥁

Marigold shines at performing Depth Estimation and Surface Normal Estimation. It was contributed by @toshas, one of the authors of Marigold.

This release also features a massive refactor (led by @DN6) of the from_single_file() method, highlighting our efforts to make the library more amenable to community features 🤗

Check out the release notes here:
https://github.com/huggingface/diffusers/releases/tag/v0.28.0
posted an update 7 months ago
Another gem from our lab - DGInStyle! We use Stable Diffusion to generate semantic segmentation data for autonomous driving and train domain-generalizable networks.

📟 Website: https://dginstyle.github.io
🧾 Paper: https://arxiv.org/abs/2312.03048
🤗 Hugging Face Paper: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control (2312.03048)
🤗 Hugging Face Model: yurujaja/DGInStyle
🐙 Code: https://github.com/yurujaja/DGInStyle

In a nutshell, our pipeline overcomes the resolution loss of Stable Diffusion latent space and the style bias of ControlNet, as shown in the attached figures. This allows us to generate sufficiently high-quality pairs of images and semantic masks to train domain-generalizable semantic segmentation networks.

Team: Yuru Jia (@yurujaja), Lukas Hoyer, Shengyu Huang, Tianfu Wang (@Tianfwang), Luc Van Gool, Konrad Schindler, and Anton Obukhov (@toshas).
reacted to osanseviero's post with 🔥🤗❤️ 8 months ago
Diaries of Open Source. Part 10 🚀

🌼Marigold-LCM: A super fast SOTA Depth Estimator
Demo: prs-eth/marigold-lcm
Original paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Model: https://hf.co/prs-eth/marigold-lcm-v1-0

🌟Quiet-STaR: A self-teaching technique via internal monologue
Paper: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (2403.09629)
GitHub: https://github.com/ezelikman/quiet-star
Tweetutorial: https://twitter.com/ericzelikman/status/1768663835106513041

πŸ–ΌοΈ WebSight v0.2: A image-to-code dataset containing tailwind CSS, images in screenshots, and more!
Dataset: HuggingFaceM4/WebSight
Paper: Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset (2403.09029)
Blog: https://hf.co/blog/websight

πŸ•΅οΈAgent-FLAN - effective agent tuning for LLMs
Paper: Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (2403.12881)
Model: internlm/Agent-FLAN-7b
Dataset: internlm/Agent-FLAN
Website: https://internlm.github.io/Agent-FLAN/

🔥HPT, a family of multimodal LLMs from HyperGAI
Blog post: https://hypergai.com/blog/introducing-hpt-a-family-of-leading-multimodal-llms
Model: HyperGAI/HPT
GitHub: https://github.com/hyperGAI/HPT

🌏Models and datasets around the world
- Tess-70B, a MiQu-70B fine-tune with high-quality data migtissera/Tess-70B-v1.6
- UNI, a model trained on 100 million pathology images from 100k+ slides MahmoodLab/UNI
- CONCH, a VLM trained on 1.17 million pathology image-text pairs MahmoodLab/CONCH
posted an update 8 months ago
Introducing Marigold-LCM 🌼 - a FAST version of the now popular state-of-the-art depth estimator! Thanks to latent consistency distillation, it retains the precision of the original Marigold but reaches the solution in just a few steps!

Check out the teaser video attached below and play with the new demo - it accepts videos now! Also, meet the new team member: Tianfu Wang (@Tianfwang).

🤗 Demo: prs-eth/marigold-lcm
🤗 Model: https://huggingface.co/prs-eth/marigold-lcm-v1-0
🤗 Original Marigold post: https://huggingface.co/posts/toshas/656973498012745
🤗 Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
🌐 Website: https://marigoldmonodepth.github.io
👾 Code: https://github.com/prs-eth/marigold
👾 Code: pip install diffusers
reacted to osanseviero's post with ❤️ 10 months ago
I finished my model merging experiment day. 🤗 I would love your thoughts on this.

What did I do? I merged Mistral Instruct 0.1 and 0.2 models using different merging techniques:
- SLERP: spherical linear interpolation (the most popular method)
- MoE: replace some forward layers with MoE layers; using a random gate for now
- Frankenmerge: also known as passthrough, but that isn't very cool. It concatenates specified layers from both models, ending up with a different number of params. In my case, I went from 7B to 9B.

Note: merging is not building an ensemble of models. You can read more about merging techniques at https://huggingface.co/blog/mlabonne/merge-models
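To make the two parameter-level techniques concrete, here is a toy NumPy sketch (illustrative only, not the actual mergekit implementation; the function names and the per-tensor framing are my own):

```python
import numpy as np

def slerp(t, w_a, w_b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors of equal shape."""
    v_a, v_b = w_a.ravel(), w_b.ravel()
    # Angle between the two weight vectors, measured on their unit directions
    u_a = v_a / (np.linalg.norm(v_a) + eps)
    u_b = v_b / (np.linalg.norm(v_b) + eps)
    theta = np.arccos(np.clip(np.dot(u_a, u_b), -1.0, 1.0))
    if theta < eps:  # nearly colinear weights: plain lerp avoids dividing by ~0
        return (1 - t) * w_a + t * w_b
    out = (np.sin((1 - t) * theta) * v_a + np.sin(t * theta) * v_b) / np.sin(theta)
    return out.reshape(w_a.shape)

def frankenmerge(layers_a, layers_b, idx_a, idx_b):
    """Passthrough merge: stack chosen layers from both models, so the
    parameter count grows (e.g. two 7B parents can yield a 9B child)."""
    return [layers_a[i] for i in idx_a] + [layers_b[i] for i in idx_b]
```

A random-gate MoE is different in kind: it keeps both sets of expert weights intact and routes each input to one of them at random, instead of blending parameters into a single tensor.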

Results
I built the 3 models using mergekit (running in an HF Space; it took less than an hour to do all three): osanseviero/mistral-instruct-merges-659ebf35ca0781acdb86bb0a

I'm doing a quick check with the OpenLLM Leaderboard.
🚨The OpenLLM Leaderboard is more suitable for pre-trained models than instruct models, but I still thought it would be interesting to look at the insights🚨

You can look at the attached image. Some interesting things
- All three models performed somewhere between 0.1 and 0.2 - congrats to the 140 people who got it right in https://twitter.com/osanseviero/status/1745071548866736171
- Frankenmerge terribly sucked with GSM8K. It seems that adding some Mistral 0.1 layers actually degraded the performance a lot - this is worse than even 0.1!
- Otherwise, frankenmerge was decent across HellaSwag, MMLU, and especially TruthfulQA
- MoE is using random gating, so I expected something right in between 0.1 and 0.2, which was the case

What do I do with this?
Not sure tbh! I think doing proper MT-Bench evals would be nice. I also think all of us should give a nice GH star to mergekit because it's awesome. I would love to have the time to do end-to-end ablation studies, but cool new things are coming up. Let me know if you have any thoughts on the results!
reacted to their post with 🤯🤗❤️ 11 months ago
Introducing Marigold 🌼 - a universal monocular depth estimator, delivering incredibly sharp predictions in the wild! Based on Stable Diffusion, it is trained with synthetic depth data only and excels in zero-shot adaptation to real-world imagery. Check it out:

🤗 Hugging Face Space: https://huggingface.co/spaces/toshas/marigold
🤗 Hugging Face Model: https://huggingface.co/Bingxin/Marigold
🤗 Hugging Face Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
🌐 Website: https://marigoldmonodepth.github.io
👾 Code: https://github.com/prs-eth/marigold
👾 Code: pip install diffusers (check comments to this post for details!)
📄 Paper: https://arxiv.org/abs/2312.02145

Brought to you by the fantastic team from the Photogrammetry and Remote Sensing group of ETH Zurich: Bingxin Ke (@Bingxin), Anton Obukhov (@toshas), Shengyu Huang, Nando Metzger (@nandometzger), Rodrigo Caye Daudt, and Konrad Schindler.
replied to their post 11 months ago
posted an update 11 months ago