SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars
Xiaosheng Zhao and 8 other authors
Abstract: In recent years, large language models (LLMs) have transformed natural language understanding through vast datasets and large-scale parameterization. Inspired by this success, we present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. Stellar spectra, akin to structured language, encode rich physical and chemical information about stars. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. As a proof of concept, SpecCLIP involves pre-training on two spectral types (LAMOST low-resolution and Gaia XP), followed by contrastive alignment using the CLIP (Contrastive Language-Image Pre-training) framework, adapted to associate spectra from different instruments. This alignment is complemented by auxiliary decoders that preserve spectrum-specific information and enable translation (prediction) between spectral types; the former is achieved by maximizing mutual information between embeddings and input spectra. The result is a cross-spectrum framework enabling intrinsic calibration and flexible applications across instruments. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination. SpecCLIP also enhances the accuracy and precision of parameter estimates benchmarked against external survey data. Additionally, its similarity-search and cross-spectrum prediction capabilities offer potential for anomaly detection. Our results suggest that contrastively trained foundation models enriched with spectrum-aware decoders can advance precision stellar spectroscopy. Our SpecCLIP code is publicly available at this https URL.
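The contrastive alignment the abstract describes follows the CLIP recipe: embeddings of the same star from two instruments form positive pairs, and all other pairings in a batch serve as negatives. A minimal sketch of the symmetric InfoNCE objective is shown below; the function names, batch shapes, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit sphere (cosine similarity)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def clip_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    emb_a, emb_b: (batch, dim) embeddings of the same stars from two
    encoders (e.g. a LAMOST and a Gaia XP branch -- hypothetical names).
    Matching rows are positives; all other cross pairs are negatives.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature       # (batch, batch) similarity matrix
    n = len(a)                           # positives sit on the diagonal

    def xent(lg):
        # numerically stable log-softmax per row, picked on the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average the a->b and b->a directions, as in CLIP
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls the two instruments' embeddings of the same star together while pushing apart embeddings of different stars, which is what enables the cross-instrument similarity search mentioned above.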
Submission history
From: Xiaosheng Zhao
[v1] Wed, 2 Jul 2025 17:49:52 UTC (19,342 KB)
[v2] Wed, 23 Jul 2025 17:47:04 UTC (22,725 KB)
[v3] Wed, 29 Oct 2025 17:57:03 UTC (22,676 KB)
[v4] Fri, 19 Dec 2025 18:39:57 UTC (22,649 KB)