Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
merve 
posted an update 19 days ago
Post
3414
Microsoft released a groundbreaking model that can be used for web automation, with MIT license 🔥 microsoft/OmniParser

Interesting highlight for me was Mind2Web (a benchmark for web navigation) capabilities of the model, which unlocks agentic behavior for RPA agents.

no need for hefty web automation pipelines that get broken when the website/app design changes! Amazing work.

Lastly, the authors also fine-tune this model on open-set detection for interactable regions and see if they can use it as a plug-in for VLMs and it actually outperforms off-the-shelf open-set detectors like GroundingDINO. 👏


OmniParser is a state-of-the-art UI parsing/understanding model that outperforms GPT4V in parsing.
In this post