DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments, by Yohan Park and 3 other authors
Abstract: Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments, a necessity that has been largely overlooked. To address this underexplored challenge, we present DarkEQA, an open-source benchmark for evaluating perceptual primitives relevant to Embodied Question Answering (EQA) under multi-level low-light conditions. DarkEQA isolates the perception bottleneck by evaluating question answering from egocentric observations under controlled degradations, enabling attributable robustness analysis. A key design feature of DarkEQA is its physical fidelity: visual degradations are modeled in linear RAW space by simulating a physics-based illumination drop and sensor noise, followed by an ISP-inspired rendering pipeline. We demonstrate the utility of DarkEQA by evaluating a wide range of state-of-the-art VLMs and Low-Light Image Enhancement (LLIE) models. Our analysis systematically reveals the limitations of VLMs operating under these challenging visual conditions. Project website: this https URL
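
To make the degradation model concrete, the following is a minimal illustrative sketch of a physics-motivated low-light simulation, not the DarkEQA pipeline itself. It assumes a simple gamma-based sRGB-to-linear conversion, a single illumination-drop factor, Poisson shot noise plus Gaussian read noise, and a gamma re-encode standing in for the ISP stage; all function and parameter names are hypothetical.

import numpy as np

def simulate_low_light(img_srgb: np.ndarray,
                       light_factor: float = 0.1,
                       full_well: float = 1000.0,
                       read_noise_std: float = 2.0,
                       rng: np.random.Generator | None = None) -> np.ndarray:
    """Degrade an sRGB image in [0, 1] as if captured under reduced illumination.

    Illustrative sketch only; the actual DarkEQA rendering pipeline may differ.
    """
    rng = rng or np.random.default_rng()
    # Approximate inverse gamma to reach a linear (RAW-like) space.
    linear = np.power(np.clip(img_srgb, 0.0, 1.0), 2.2)
    # Physics-based illumination drop: scale expected photon counts in linear space.
    photons = linear * light_factor * full_well
    # Photon (shot) noise and sensor read noise.
    shot = rng.poisson(photons).astype(np.float64)
    read = rng.normal(0.0, read_noise_std, size=shot.shape)
    noisy_linear = np.clip((shot + read) / full_well, 0.0, 1.0)
    # Crude ISP-style re-encode (gamma only) back to display space.
    return np.power(noisy_linear, 1.0 / 2.2)

# Example: darken a synthetic mid-gray frame to 5% illumination.
frame = np.full((64, 64, 3), 0.5)
dark = simulate_low_light(frame, light_factor=0.05)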
Submission history
From: Yohan Park
[v1] Wed, 31 Dec 2025 17:31:29 UTC (4,600 KB)
[v2] Tue, 6 Jan 2026 05:24:09 UTC (4,599 KB)
[v3] Fri, 6 Feb 2026 15:25:39 UTC (4,599 KB)