[2512.10322] User-Feedback-Driven Adaptation for Vision-and-Language Navigation

[Submitted on 11 Dec 2025 (v1), last revised 4 Feb 2026 (this version, v2)]

Authors: Yongqiang Yu and 8 other authors


Abstract: Real-world deployment of Vision-and-Language Navigation (VLN) agents is constrained by the scarcity of reliable supervision after offline training. Recent adaptation methods attempt to mitigate distribution shifts via environment-driven self-supervision (e.g., entropy minimization), but these signals are often noisy and can cause the agent to amplify its own mistakes during long-horizon sequential decision-making. In this paper, we propose a paradigm shift that positions user feedback, specifically episode-level success confirmations and goal-level corrections, as a primary and general-purpose supervision signal for VLN. Unlike internal confidence scores, user feedback is intent-aligned and in-situ consistent, so it directly corrects cases where the agent's behavior has become decoupled from the user's instructions. To effectively leverage this supervision, we introduce a user-feedback-driven learning framework featuring a topology-aware trajectory construction pipeline. This mechanism lifts sparse, goal-level corrections into dense path-level supervision by generating feasible paths on the agent's incrementally built topological graph, enabling sample-efficient imitation learning without requiring step-by-step human demonstrations. Furthermore, we develop a persistent memory bank mechanism for warm-start initialization, supporting the reuse of previously acquired topology and cached representations across navigation sessions. Extensive experiments on the GSA-R2R benchmark demonstrate that our approach transforms sparse interaction into robust supervision, consistently outperforming environment-driven baselines while exhibiting strong adaptability across diverse instruction styles.
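To make the idea of lifting a goal-level correction into path-level supervision concrete, here is a minimal sketch. The abstract does not say how feasible paths are generated, so this assumes a breadth-first shortest path over the nodes the agent has already explored; every consecutive node pair on that path becomes one imitation target, followed by a STOP label. The names `TopoGraph`, `lift_goal_correction`, and `MemoryBank` are hypothetical, and the memory bank here only keeps graphs in memory per scene id (the paper's cached representations and any on-disk persistence are omitted).

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

Node = Hashable


class TopoGraph:
    """Hypothetical incrementally built, undirected topological graph."""

    def __init__(self) -> None:
        self.adj: Dict[Node, Set[Node]] = {}

    def add_edge(self, u: Node, v: Node) -> None:
        # Record that the agent observed v as reachable from u (and vice versa).
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def shortest_path(self, start: Node, goal: Node) -> Optional[List[Node]]:
        # BFS over explored nodes only, so the returned path is feasible
        # with respect to what the agent has actually built so far.
        if start not in self.adj or goal not in self.adj:
            return None
        parent: Dict[Node, Optional[Node]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path: List[Node] = []
                cur: Optional[Node] = node
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                return path[::-1]
            # Sort neighbours so ties are broken deterministically.
            for nxt in sorted(self.adj[node], key=str):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None


def lift_goal_correction(
    graph: TopoGraph, start: Node, corrected_goal: Node
) -> List[Tuple[Node, Node]]:
    """Turn one sparse goal correction into dense (current, next) labels."""
    path = graph.shortest_path(start, corrected_goal)
    if path is None:
        return []  # goal not connected to the explored graph: no supervision
    labels: List[Tuple[Node, Node]] = list(zip(path, path[1:]))
    labels.append((path[-1], "STOP"))  # final step: stop at the corrected goal
    return labels


class MemoryBank:
    """Hypothetical warm-start store: scene id -> previously built graph."""

    def __init__(self) -> None:
        self._graphs: Dict[str, TopoGraph] = {}

    def get(self, scene_id: str) -> TopoGraph:
        # Reuse the stored graph if this scene was visited before.
        return self._graphs.setdefault(scene_id, TopoGraph())


if __name__ == "__main__":
    bank = MemoryBank()
    g = bank.get("scene_A")
    for u, v in [("n0", "n1"), ("n1", "n2"), ("n2", "n3"), ("n1", "n4")]:
        g.add_edge(u, v)
    print(lift_goal_correction(g, "n0", "n3"))
    # prints [('n0', 'n1'), ('n1', 'n2'), ('n2', 'n3'), ('n3', 'STOP')]
```

Choosing BFS means the lifted labels prefer the fewest-hop route through explored space; a weighted planner (e.g., Dijkstra over metric edge lengths) would be a natural alternative if edge distances are stored.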

Submission history

From: Yongqiang Yu
[v1] Thu, 11 Dec 2025 06:11:45 UTC (1,682 KB)
[v2] Wed, 4 Feb 2026 11:58:22 UTC (1,699 KB)
