General Speech Restoration with One-step Latent Bridge Models

[Submitted on 28 Sep 2025 (v1), last revised 27 Feb 2026 (this version, v4)]

VoiceBridge: General Speech Restoration with One-step Latent Bridge Models, by Chi Zhang and 3 other authors

Abstract: Bridge models have been investigated in speech enhancement, but they are mostly single-task models with limited general speech restoration (GSR) capability. In this work, we propose VoiceBridge, a one-step latent bridge model (LBM) for GSR that efficiently reconstructs 48 kHz fullband speech from diverse distortions. To inherit the advantages of data-domain bridge models, we design an energy-preserving variational autoencoder that improves the alignment between the waveform and latent spaces across varying energy levels. By compressing waveforms into continuous latent representations, VoiceBridge models various GSR tasks with a single latent-to-latent generative process backed by a scalable transformer. To alleviate the challenge of reconstructing a high-quality target from distinctly different low-quality priors, we propose a joint neural prior for GSR that uniformly reduces the burden on the LBM across diverse tasks. Building upon these designs, we further investigate the bridge training objective by jointly tuning the LBM, decoder, and discriminator, transforming the model from a denoiser into a generator and enabling one-step GSR without distillation. Extensive validation across in-domain tasks (e.g., denoising and super-resolution), out-of-domain tasks (e.g., refining synthesized speech), and datasets demonstrates the superior performance of VoiceBridge. Demos: this https URL.

Submission history

From: Chi Zhang
[v1] Sun, 28 Sep 2025 17:12:13 UTC (6,366 KB)
[v2] Wed, 11 Feb 2026 09:21:01 UTC (6,362 KB)
[v3] Sun, 15 Feb 2026 03:32:01 UTC (6,362 KB)
[v4] Fri, 27 Feb 2026 03:19:49 UTC (6,362 KB)
