A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System

[Submitted on 15 Aug 2025 (v1), last revised 15 Dec 2025 (this version, v2)]

From Clicks to Preference: A Multi-stage Alignment Framework for Generative Query Suggestion in Conversational System, by Junhao Yin and 4 other authors

Abstract: Generative query suggestion using large language models offers a powerful way to enhance conversational systems, but aligning outputs with nuanced user preferences remains a critical challenge. To address this, we introduce a multi-stage framework designed for progressive alignment between the generation policy and user intent. Our pipeline begins with prompt engineering as a cold-start strategy, followed by the Supervised Fine-Tuning stage, in which we introduce a distillation method on click logs to create a robust foundational model. To better model user preferences while capturing their inherent uncertainty, we develop a Gaussian Reward Model (GaRM) that represents user preferences as probability distributions rather than point estimates. Finally, we employ reinforcement learning to align the generation policy with these preferences, guided by a composite reward function that integrates GaRM with auxiliary heuristics to mitigate reward hacking. To maintain training stability, this process is enhanced by a novel out-of-distribution regularization method and a two-stage reward fusion technique. Extensive experiments demonstrate that our framework significantly outperforms baselines on both automatic and human evaluations and yields a 34% relative increase in user engagement as measured by click-through rate in live A/B tests.
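As an illustration of what representing preferences as distributions rather than point estimates can mean in practice, the following minimal sketch assumes each suggestion's reward is an independent Gaussian with a mean and a standard deviation, and computes the probability that one suggestion is preferred over another. The abstract does not specify GaRM's architecture or training loss, so the function name `preference_probability`, the independence assumption, and the closed-form comparison are illustrative assumptions, not the authors' method.

```python
import math


def preference_probability(mu_a: float, sigma_a: float,
                           mu_b: float, sigma_b: float) -> float:
    """Return P(r_a > r_b) for two independent Gaussian rewards.

    Assumption (not from the paper): r_a ~ N(mu_a, sigma_a^2) and
    r_b ~ N(mu_b, sigma_b^2) are independent, so r_a - r_b is Gaussian
    with mean mu_a - mu_b and variance sigma_a^2 + sigma_b^2.
    """
    diff_std = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    if diff_std == 0.0:
        # Degenerate case: both rewards are point estimates.
        if mu_a > mu_b:
            return 1.0
        return 0.5 if mu_a == mu_b else 0.0
    z = (mu_a - mu_b) / diff_std
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Same gap in means, different uncertainty: the more uncertain pair
    # yields a preference probability closer to 0.5.
    print(preference_probability(1.0, 0.1, 0.0, 0.1))
    print(preference_probability(1.0, 2.0, 0.0, 2.0))
```

Under these assumptions, a point-estimate reward model would treat both pairs in the usage example identically, since the gap in means is the same, whereas the distributional view yields a weaker preference when the predicted variance is large.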

Submission history

From: Haolin Wang
[v1] Fri, 15 Aug 2025 10:17:01 UTC (1,797 KB)
[v2] Mon, 15 Dec 2025 12:51:55 UTC (1,793 KB)
