From Segment Anything to Any Segmentation

[Submitted on 6 Aug 2025 (v1), last revised 28 Jan 2026 (this version, v2)]

X-SAM: From Segment Anything to Any Segmentation, by Hao Wang and 8 other authors


Abstract: Large Language Models (LLMs) demonstrate strong capabilities in broad knowledge representation, yet they are inherently deficient in pixel-level perceptual understanding. Although the Segment Anything Model (SAM) represents a significant advancement in visual-prompt-driven image segmentation, it exhibits notable limitations in multi-mask prediction and category-specific segmentation tasks, and it cannot integrate all segmentation tasks within a unified model architecture. To address these limitations, we present X-SAM, a streamlined Multimodal Large Language Model (MLLM) framework that extends the segmentation paradigm from "segment anything" to "any segmentation". Specifically, we introduce a novel unified framework that enables more advanced pixel-level perceptual comprehension for MLLMs. Furthermore, we propose a new segmentation task, termed Visual GrounDed (VGD) segmentation, which segments all instance objects with interactive visual prompts and empowers MLLMs with visually grounded, pixel-wise interpretative capabilities. To enable effective training on diverse data sources, we present a unified training strategy that supports co-training across multiple datasets. Experimental results demonstrate that X-SAM achieves state-of-the-art performance on a wide range of image segmentation benchmarks, highlighting its efficiency for multimodal, pixel-level visual understanding. Code is available at this https URL.
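The abstract does not say how the unified training strategy mixes its datasets. Purely as an illustration, and not as X-SAM's actual method, one common way to co-train on several segmentation datasets of very different sizes is temperature-scaled sampling: each batch is drawn from dataset i with probability proportional to n_i^alpha, where n_i is the dataset size. The function names, the `alpha` parameter, the one-dataset-per-batch policy, and the dataset sizes in the usage note are all assumptions made for this sketch.

```python
# Illustrative sketch only: NOT the X-SAM training code. The sampling rule
# (p_i proportional to n_i ** alpha) and all names here are assumptions.
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def mixing_weights(sizes: Dict[str, int], alpha: float = 0.5) -> Dict[str, float]:
    """Temperature-scaled sampling weights: p_i is proportional to n_i ** alpha.

    alpha = 1 samples in proportion to dataset size; alpha = 0 samples uniformly.
    """
    if any(n <= 0 for n in sizes.values()):
        raise ValueError("every dataset must be non-empty")
    scaled = {name: n ** alpha for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}


def co_training_batches(
    datasets: Dict[str, Sequence],
    batch_size: int,
    num_batches: int,
    alpha: float = 0.5,
    seed: int = 0,
) -> Iterator[Tuple[str, List]]:
    """Yield (dataset_name, batch) pairs; each batch comes from a single dataset.

    Keeping one dataset (task) per batch is an assumption of this sketch.
    """
    rng = random.Random(seed)
    weights = mixing_weights({k: len(v) for k, v in datasets.items()}, alpha)
    names = list(weights)
    probs = [weights[n] for n in names]
    for _ in range(num_batches):
        # Pick which dataset this batch comes from, using the mixing weights.
        name = rng.choices(names, weights=probs, k=1)[0]
        data = datasets[name]
        # Sample items with replacement from the chosen dataset.
        batch = [data[rng.randrange(len(data))] for _ in range(batch_size)]
        yield name, batch
```

For example, with three hypothetical datasets of 10,000, 2,500, and 400 samples and `alpha = 0.5`, `mixing_weights` gives 100/170, 50/170, and 20/170 (roughly 0.588, 0.294, and 0.118), so the smallest dataset is sampled far more often than its raw share of about 3% would suggest. Lowering `alpha` flattens the mixture further; `alpha = 1` reverts to size-proportional sampling.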

Submission history

From: Hao Wang
[v1] Wed, 6 Aug 2025 17:19:10 UTC (10,942 KB)
[v2] Wed, 28 Jan 2026 15:50:17 UTC (10,438 KB)
