Close Menu
SkytikSkytik

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

    November 17, 2025

    Here’s how I turned a Raspberry Pi into an in-car media server

    November 17, 2025

    Beloved SF cat’s death fuels Waymo criticism

    November 17, 2025
    Facebook X (Twitter) Instagram
    • About Us
    • Contact Us
    SkytikSkytik
    • Home
    • AI Tools
    • Online Tools
    • Tech News
    • Guides
    • Reviews
    • SEO & Marketing
    • Social Media Tools
    SkytikSkytik
    Home»SEO & Marketing»Google’s New User Intent Extraction Method
    SEO & Marketing

    Google’s New User Intent Extraction Method

    AwaisBy AwaisJanuary 27, 2026No Comments7 Mins Read0 Views
    Facebook Twitter Pinterest LinkedIn Telegram Tumblr Email
    Google’s New User Intent Extraction Method
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Google published a research paper on how to extract user intent from user interactions that can then be used for autonomous agents. The method they discovered uses on-device small models that do not need to send data back to Google, which means that a user’s privacy is protected.

    The researchers discovered they were able to solve the problem by splitting it into two tasks. Their solution worked so well it was able to beat the base performance of multi-modal large language models (MLLMs) in massive data centers.

    Smaller Models On Browsers And Devices

    The focus of the research is on identifying the user intent through the series of actions that a user takes on their mobile device or browser while also keeping that information on the device so that no information is sent back to Google. That means the processing must happen on the device.

    They accomplished this in two stages.

    1. The first stage the model on the device summarizes what the user was doing.
    2. The sequence of summaries are then sent to a second model that identifies the user intent.

    The researchers explained:

    “…our two-stage approach demonstrates superior performance compared to both smaller models and a state-of-the-art large MLLM, independent of dataset and model type.
    Our approach also naturally handles scenarios with noisy data that traditional supervised fine-tuning methods struggle with.”

    Intent Extraction From UI Interactions

    Intent extraction from screenshots and text descriptions of user interactions was a technique that was proposed in 2025 using Multimodal Large Language Models (MLLMs). The researchers say they followed this approach to their problem but using an improved prompt.

    The researchers explained that extracting intent is not a trivial problem to solve and that there are multiple errors that can happen along the steps. The researchers use the word trajectory to describe a user journey within a mobile or web application, represented as a sequence of interactions.

    The user journey (trajectory) is turned into a formula where each interaction step consists of two parts:

    1. An Observation
      This is the visual state of the screen (screenshot) of where the user is at that step.
    2. An Action
      The specific action that the user performed on that screen (like clicking a button, typing text, or clicking a link).

    They described three qualities of a good extracted intent:

    • “faithful: only describes things that actually occur in the trajectory;
    • comprehensive: provides all of the information about the user intent required to re-enact the trajectory;
    • and relevant: does not contain extraneous information beyond what is needed for comprehensiveness.”

    Challenging To Evaluate Extracted Intents

    The researchers explain that grading extracted intent is difficult because user intents contain complex details (like dates or transaction data) and the user intents are inherently subjective, containing ambiguities, which is a hard problem to solve. The reason trajectories are subjective is because the underlying motivations are ambiguous.

    For example, did a user choose a product because of the price or the features? The actions are visible but the motivations are not. Previous research shows that intents between humans matched 80% on web trajectories and 76% on mobile trajectories, so it’s not like a given trajectory can always indicate a specific intent.

    Two-Stage Approach

    After ruling out other methods like Chain of Thought (CoT) reasoning (because small language models struggled with the reasoning), they chose a two-stage approach that emulated Chain of Thought reasoning.

    The researchers explained their two-stage approach:

    “First, we use prompting to generate a summary for each interaction (consisting of a visual screenshot and textual action representation) in a trajectory. This stage is
    prompt-based as there is currently no training data available with summary labels for individual interactions.

    Second, we feed all of the interaction-level summaries into a second stage model to generate an overall intent description. We apply fine-tuning in the second stage…”

    The First Stage: Screenshot Summary

    The first summary, for the screenshot of the interaction, they divide the summary into two parts, but there is also a third part.

    1. A description of what’s on the screen.
    2. A description of the user’s action.

    The third component (speculative intent) is a way to get rid of speculation about the user’s intent, where the model is basically guessing at what’s going on. This third part is labeled “speculative intent” and they actually just get rid of it. Surprisingly, allowing the model to speculate and then getting rid of that speculation leads to a higher quality result.

    The researchers cycled through multiple prompting strategies and this was the one that worked the best.

    The Second Stage: Generating Overall Intent Description

    For the second stage, the researchers fine tuned a model for generating an overall intent description. They fine tuned the model with training data that is made up of two parts:

    1. Summaries that represent all interactions in the trajectory
    2. The matching ground truth that describes the overall intent for each of the trajectories.

    The model initially tended to hallucinate because the first part (input summaries) are potentially incomplete, while the “target intents” are complete. That caused the model to learn to fill in the missing parts in order to make the input summaries match the target intents.

    They solved this problem by “refining” the target intents by removing details that aren’t reflected in the input summaries. This trained the model to infer the intents based only on the inputs.

    The researchers compared four different approaches and settled on this approach because it performed so well.

    Ethical Considerations And Limitations

    The research paper ends by summarizing potential ethical issues where an autonomous agent might take actions that are not in the user’s interest and stressed the necessity to build the proper guardrails.

    The authors also acknowledged limitations in the research that might limit generalizability of the results. For example, the testing was done only on Android and web environments, which means that the results might not generalize to Apple devices. Another limitation is that the research was limited to users in the United States in the English language.

    There is nothing in the research paper or the accompanying blog post that suggests that these processes for extracting user intent are currently in use. The blog post ends by communicating that the described approach is helpful:

    “Ultimately, as models improve in performance and mobile devices acquire more processing power, we hope that on-device intent understanding can become a building block for many assistive features on mobile devices going forward.”

    Takeaways

    Neither the blog post about this research or the research paper itself describe the results of these processes as something that might be used in AI search or classic search. It does mention the context of autonomous agents.

    The research paper explicitly mentions the context of an autonomous agent on the device that is observing how the user is interacting with a user interface and then be able to infer what the goal (the intent) of those actions are.

    The paper lists two specific applications for this technology:

    1. Proactive Assistance:
      An agent that watches what a user is doing for “enhanced personalization” and “improved work efficiency”.
    2. Personalized Memory
      The process enables a device to “remember” past activities as an intent for later.

    Shows The Direction Google Is Heading In

    While this might not be used right away, it shows the direction that Google is heading, where small models on a device will be watching user interactions and sometimes stepping in to assist users based on their intent. Intent here is used in the sense of understanding what a user is trying to do.

    Read Google’s blog post here:

    Small models, big results: Achieving superior intent extraction through decomposition

    Read the PDF research paper:

    Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition (PDF)

    Featured Image by Shutterstock/ViDI Studio

    Extraction Googles intent Method user
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Awais
    • Website

    Related Posts

    LinkedIn updates feed algorithm with LLM-powered ranking and retrieval

    March 17, 2026

    Trust Is The New Ranking Factor

    March 17, 2026

    What incrementality really means in affiliate marketing

    March 17, 2026

    3 CMS Platforms Control 73% Of The Market & Shape Technical SEO Defaults

    March 17, 2026

    Google tests “Sponsored Shops” blocks in Shopping results

    March 16, 2026

    AI Search Barely Cites Syndicated News Or Press Releases

    March 16, 2026
    Leave A Reply Cancel Reply

    Top Posts

    At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

    November 17, 20250 Views

    Here’s how I turned a Raspberry Pi into an in-car media server

    November 17, 20250 Views

    Beloved SF cat’s death fuels Waymo criticism

    November 17, 20250 Views
    Don't Miss

    LinkedIn updates feed algorithm with LLM-powered ranking and retrieval

    March 17, 2026

    LinkedIn is launching a new AI-powered feed ranking system that uses large language models and…

    Trust Is The New Ranking Factor

    March 17, 2026

    CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

    March 17, 2026

    What They Mean and How to Use Them in Social Media Campaigns

    March 17, 2026
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    3 CMS Platforms Control 73% Of The Market & Shape Technical SEO Defaults

    March 17, 2026

    Top 7 Traackr Alternatives 2026

    March 17, 2026
    Most Popular

    13 Trending Songs on TikTok in Nov 2025 (+ How to Use Them)

    November 18, 20257 Views

    How to watch the 2026 GRAMMY Awards online from anywhere

    February 1, 20263 Views

    Corporate Reputation Management Strategies | Sprout Social

    November 19, 20252 Views
    Our Picks

    At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

    November 17, 2025

    Here’s how I turned a Raspberry Pi into an in-car media server

    November 17, 2025

    Beloved SF cat’s death fuels Waymo criticism

    November 17, 2025

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest YouTube Dribbble
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions
    • Disclaimer

    © 2025 skytik.cc. All rights reserved.

    Type above and press Enter to search. Press Esc to cancel.