Binary Verification for Zero-Shot Vision, by Rongbin Hu and Jeffrey Liu
Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. The same workflow improves results significantly on every task, indicating generality. We further integrate the proposed REC workflow into a real-world video processing and editing system, and present the system architecture and end-to-end pipeline in the paper. Together, these components yield a simple, unified workflow that emphasizes inference-time design over task-specific training and offers a practical, drop-in path to stronger zero-shot vision with today’s VLMs.
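The resolution rule in step (ii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `binary_verify`, `ask_true_false`, and `ask_mcq` are hypothetical names, the two callables stand in for calls to a VLM, and the abstract does not say what "remaining plausible candidates" means when no candidate is judged True, so the sketch assumes the fallback MCQ then covers the full candidate list.

```python
from typing import Callable, List, Sequence


def binary_verify(
    candidates: Sequence[str],
    ask_true_false: Callable[[str], bool],
    ask_mcq: Callable[[List[str]], str],
) -> str:
    """Pick one answer from an explicit candidate list.

    ask_true_false and ask_mcq are hypothetical stand-ins for VLM queries.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    # Binarization: one True/False question per candidate.
    accepted = [c for c in candidates if ask_true_false(c)]

    # Deterministic resolution: exactly one True means we select it.
    if len(accepted) == 1:
        return accepted[0]

    # Otherwise fall back to an MCQ. Assumption: the "plausible" set is the
    # candidates judged True, or every candidate if none was judged True.
    remaining = accepted if accepted else list(candidates)
    return ask_mcq(remaining)


if __name__ == "__main__":
    # Stub "model" for demonstration only: it accepts two candidates,
    # so the workflow falls back to an MCQ over those two.
    options = ["left of the door", "right of the door", "behind the door"]
    answer = binary_verify(
        options,
        ask_true_false=lambda c: c != "behind the door",
        ask_mcq=lambda remaining: remaining[0],
    )
    print(answer)
```

Passing the model calls in as plain callables keeps the resolution logic independent of any particular VLM client, which matches the training-free, drop-in framing of the abstract.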
Submission history
From: Jeffrey Liu
[v1] Fri, 14 Nov 2025 06:05:43 UTC (1,872 KB)
[v2] Fri, 27 Mar 2026 04:08:31 UTC (3,934 KB)


arXiv:2511.10983