Weak-to-Strong Generalization under Distribution Shifts
by Myeongho Jeon and 3 other authors
Abstract: As future superhuman models become increasingly complex, accurately supervising their behavior may exceed human capabilities. Recent works have demonstrated that in such scenarios, weak models can effectively supervise strong models, a phenomenon known as weak-to-strong generalization. However, we find that naive weak-to-strong generalization fails under distribution shifts, often leaving the strong model performing worse than its weak supervisors. To address this, we propose RAVEN, a robust weak-to-strong generalization framework that dynamically learns the optimal combination of weak models alongside the parameters of the strong model. We demonstrate the effectiveness of RAVEN on image classification, text classification, and preference alignment tasks. RAVEN outperforms alternative baselines by over 30% on out-of-distribution tasks while matching or surpassing existing methods on in-distribution tasks. Moreover, our results show that RAVEN assigns higher weights to more accurate weak models, demonstrating its ability to automatically identify trustworthy supervision.
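The core idea in the abstract, learning a weighted combination of weak supervisors jointly with the strong model, can be illustrated with a minimal sketch. This is not the authors' implementation: the objective, function names, and the use of a single cross-entropy loss between the combined pseudo-labels and the strong model's predictions are all simplifying assumptions for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def combine(weak_probs, alpha):
    """Weighted pseudo-labels: weak_probs has shape (K, N, C) for K weak
    models, N examples, C classes; returns the (N, C) mixture."""
    w = softmax(alpha)                       # simplex weights over weak models
    return np.tensordot(w, weak_probs, 1)    # sum_k w_k * p_k

def update_weights(weak_probs, strong_probs, alpha, lr=0.5):
    """One gradient step on the cross-entropy between the combined
    pseudo-labels and the strong model's current predictions (a hypothetical
    stand-in for RAVEN's joint objective)."""
    w = softmax(alpha)
    combined = np.clip(combine(weak_probs, alpha), 1e-9, None)
    # dLoss/dCombined for CE loss -mean_n sum_c strong * log(combined)
    grad_combined = -strong_probs / combined / weak_probs.shape[1]
    # Chain rule through the mixture, then through the softmax over alpha
    grad_w = np.einsum('knc,nc->k', weak_probs, grad_combined)
    grad_alpha = w * (grad_w - np.dot(w, grad_w))
    return alpha - lr * grad_alpha
```

Under this toy objective, a weak model whose soft labels agree with the strong model's confident predictions receives a negative weight gradient and therefore more mass over time, mirroring the abstract's observation that more accurate weak supervisors end up with higher weights.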
Submission history
From: Jan Sobotka
[v1] Fri, 24 Oct 2025 10:46:50 UTC (2,041 KB)
[v2] Tue, 25 Nov 2025 21:37:10 UTC (2,033 KB)
