Close Menu
SkytikSkytik

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

    November 17, 2025

    Here’s how I turned a Raspberry Pi into an in-car media server

    November 17, 2025

    Beloved SF cat’s death fuels Waymo criticism

    November 17, 2025
    Facebook X (Twitter) Instagram
    • About Us
    • Contact Us
    SkytikSkytik
    • Home
    • AI Tools
    • Online Tools
    • Tech News
    • Guides
    • Reviews
    • SEO & Marketing
    • Social Media Tools
    SkytikSkytik
    Home»AI Tools»Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation
    AI Tools

    Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation

    AwaisBy AwaisApril 1, 2026No Comments2 Mins Read0 Views
    Facebook Twitter Pinterest LinkedIn Telegram Tumblr Email
    Measuring Intelligence Efficiency of Local AI
    Share
    Facebook Twitter LinkedIn Pinterest Email

    [Submitted on 27 Mar 2026 (v1), last revised 31 Mar 2026 (this version, v2)]

    View a PDF of the paper titled GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation, by Rui Xie and 5 other authors

    View PDF
    HTML (experimental)

    Abstract:Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias – they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, limiting their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing via Instructional-Video Driven Expertise), a training-free, plug-and-play framework that resolves GUI agent domain bias by autonomously acquiring domain-specific expertise from web tutorial videos through a retrieval-augmented automated annotation pipeline. GUIDE introduces two key innovations. First, a subtitle-driven Video-RAG pipeline unlocks video semantics through subtitle analysis, performing progressive three-stage retrieval – domain classification, topic extraction, and relevance matching – to identify task-relevant tutorial videos. Second, a fully automated annotation pipeline built on an inverse dynamics paradigm feeds consecutive keyframes enhanced with UI element detection into VLMs, inferring the required planning and grounding knowledge that are injected into the agent’s corresponding modules to address both manifestations of domain bias. Extensive experiments on OSWorld demonstrate GUIDE’s generality as a plug-and-play component for both multi-agent systems and single-model agents. It consistently yields over 5% improvements and reduces execution steps – without modifying any model parameters or architecture – validating GUIDE as an architecture-agnostic enhancement to bridge GUI agent domain bias.

    Submission history

    From: Rui Xie [view email]
    [v1]
    Fri, 27 Mar 2026 10:33:08 UTC (4,985 KB)
    [v2]
    Tue, 31 Mar 2026 08:17:51 UTC (4,985 KB)

    agents Annotation Bias domain GUI PlugandPlay realtime Resolving Retrieval Video Web
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Awais
    • Website

    Related Posts

    What Happens Now That AI is the First Analyst On Your Team?

    April 1, 2026

    [2510.09858] AI and Consciousness

    April 1, 2026

    Top-p Distributions for Moving Through Time

    April 1, 2026

    A Dual-Path Agentic Framework for Robust Misleading Chart Question Answering

    April 1, 2026

    How to Make Claude Code Better at One-Shotting Implementations

    March 31, 2026

    [2510.05145] Efficient Tree-Structured Deep Research with Adaptive Resource Allocation

    March 31, 2026
    Leave A Reply Cancel Reply

    Top Posts

    At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

    November 17, 20250 Views

    Here’s how I turned a Raspberry Pi into an in-car media server

    November 17, 20250 Views

    Beloved SF cat’s death fuels Waymo criticism

    November 17, 20250 Views
    Don't Miss

    Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation

    April 1, 2026

    [Submitted on 27 Mar 2026 (v1), last revised 31 Mar 2026 (this version, v2)] View…

    What is Microsoft Power Automate? [2026]

    April 1, 2026

    Strawberry Shortcake Roll Recipe | Epicurious

    April 1, 2026

    Your next visitor isn’t human

    April 1, 2026
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    9 Ways to Boost Yours + Why it Matters

    April 1, 2026

    [2510.09858] AI and Consciousness

    April 1, 2026
    Most Popular

    13 Trending Songs on TikTok in Nov 2025 (+ How to Use Them)

    November 18, 20257 Views

    How to watch the 2026 GRAMMY Awards online from anywhere

    February 1, 20263 Views

    Corporate Reputation Management Strategies | Sprout Social

    November 19, 20252 Views
    Our Picks

    At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

    November 17, 2025

    Here’s how I turned a Raspberry Pi into an in-car media server

    November 17, 2025

    Beloved SF cat’s death fuels Waymo criticism

    November 17, 2025

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest YouTube Dribbble
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions
    • Disclaimer

    © 2025 skytik.cc. All rights reserved.

    Type above and press Enter to search. Press Esc to cancel.