DuoTok: Source-Aware Dual-Track Tokenization for Multi-Track Music Language Modeling

[Submitted on 25 Nov 2025 (v1), last revised 1 Apr 2026 (this version, v2)]

Authors: Rui Lin and 6 other authors

Abstract: Audio tokenization bridges continuous waveforms and multi-track music language models. In dual-track modeling, tokens should preserve three properties at once: high-fidelity reconstruction, strong predictability under a language model, and cross-track correspondence. We introduce DuoTok, a source-aware dual-track tokenizer that addresses this trade-off through staged disentanglement. DuoTok first pretrains a semantic encoder, then regularizes it with multi-task supervision, freezes the encoder, and applies hard dual-codebook routing while keeping auxiliary objectives on quantized codes. A diffusion decoder reconstructs high-frequency details, allowing tokens to focus on structured information for sequence modeling. On standard benchmarks, DuoTok achieves a favorable predictability-fidelity trade-off, reaching the lowest cnBPT while maintaining competitive reconstruction at 0.75 kbps. Under a held-constant dual-track language modeling protocol, enBPT also improves, indicating gains beyond codebook size effects. Controlled diagnostics show larger predictability costs under cross-track corruption and larger gains from longer context, suggesting that models trained on DuoTok tokens use cross-track structure and non-local history.
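
As a rough illustration of what "hard dual-codebook routing" could look like, the sketch below quantizes each frame against only the codebook assigned to its track, using nearest-neighbor lookup. This is a toy under stated assumptions, not the DuoTok implementation: the abstract does not specify the quantizer, the codebook sizes, the feature dimension, or how tracks are defined, so the names `nearest_code`, `route_and_quantize`, `track_a`, and `track_b`, and all numeric values, are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def route_and_quantize(
    frames: Sequence[Vector],
    track_ids: Sequence[str],
    codebooks: Dict[str, Sequence[Vector]],
) -> List[int]:
    """Hard routing: a frame is only ever matched against its own track's codebook."""
    return [nearest_code(f, codebooks[t]) for f, t in zip(frames, track_ids)]


# Hypothetical toy data: two tracks, each with its own 2-entry codebook.
codebooks = {
    "track_a": [(0.0, 0.0), (1.0, 1.0)],
    "track_b": [(0.0, 1.0), (1.0, 0.0)],
}
frames = [(0.9, 0.8), (0.1, 0.9), (0.2, 0.1)]
track_ids = ["track_a", "track_b", "track_a"]

tokens = route_and_quantize(frames, track_ids, codebooks)
print(tokens)
```

Because the routing is hard, token indices from the two tracks refer to disjoint codebooks; a downstream language model would need the track identity alongside each index to interpret it.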

Submission history

From: Rui Lin
[v1] Tue, 25 Nov 2025 11:53:57 UTC (3,426 KB)
[v2] Wed, 1 Apr 2026 11:23:39 UTC (909 KB)
