[2511.03601] Step-Audio-EditX Technical Report

[Submitted on 5 Nov 2025 (v1), last revised 19 Nov 2025 (this version, v2)]

Authors: Chao Yan, Boyong Wu, Peng Yang, Pengfei Tan, Guoqiang Hu, Li Xie, Yuxin Zhang, Xiangyu (Tony) Zhang, Fei Tian, Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Shuchang Zhou, Gang Yu



Abstract: We present Step-Audio-EditX, the first open-source LLM-based audio model that excels at expressive, iterative audio editing, covering emotion, speaking style, and paralinguistics, while also providing robust zero-shot text-to-speech (TTS). Our core innovation is training on large-margin synthetic data alone, which removes the need for embedding-based priors or auxiliary modules. This large-margin learning approach enables both iterative control and high expressivity across voices, and it represents a fundamental departure from the conventional focus on representation-level disentanglement. Evaluation results show that Step-Audio-EditX outperforms both MiniMax-2.6-hd and Doubao-Seed-TTS-2.0 in emotion editing and other fine-grained control tasks.

Submission history

From: Boyong Wu
[v1]
Wed, 5 Nov 2025 16:22:19 UTC (1,174 KB)
[v2]
Wed, 19 Nov 2025 04:56:09 UTC (1,172 KB)
