Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

[Submitted on 3 Feb 2026 (v1), last revised 12 Feb 2026 (this version, v2)]

View a PDF of the paper titled When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making, by Shutong Fan and 2 other authors

View PDF
HTML (experimental)

Abstract:Most adversarial threats in artificial intelligence target the computational behavior of models rather than the humans who rely on them. Yet modern AI systems increasingly operate within human decision loops, where users interpret and act on model recommendations. Large Language Models generate fluent natural-language explanations that shape how users perceive and trust AI outputs, revealing a new attack surface at the cognitive layer: the communication channel between AI and its users. We introduce adversarial explanation attacks (AEAs), where an attacker manipulates the framing of LLM-generated explanations to modulate human trust in incorrect outputs. We formalize this behavioral threat through the trust miscalibration gap, a metric that captures the difference in human trust between correct and incorrect outputs under adversarial explanations. By incorporating this gap, AEAs explore the daunting threats in which persuasive explanations reinforce users’ trust in incorrect predictions. To characterize this threat, we conducted a controlled experiment (n = 205), systematically varying four dimensions of explanation framing: reasoning mode, evidence type, communication style, and presentation format. Our findings show that users report nearly identical trust for adversarial and benign explanations, with adversarial explanations preserving the vast majority of benign trust despite being incorrect. The most vulnerable cases arise when AEAs closely resemble expert communication, combining authoritative evidence, neutral tone, and domain-appropriate reasoning. Vulnerability is highest on hard tasks, in fact-driven domains, and among participants who are less formally educated, younger, or highly trusting of AI. This is the first systematic security study that treats explanations as an adversarial cognitive channel and quantifies their impact on human trust in AI-assisted decision making.

Submission history

From: Shutong Fan [view email]
[v1]
Tue, 3 Feb 2026 20:42:44 UTC (475 KB)
[v2]
Thu, 12 Feb 2026 14:52:34 UTC (475 KB)

What's Hot

At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

Here’s how I turned a Raspberry Pi into an in-car media server

Beloved SF cat’s death fuels Waymo criticism

Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

Trust Is The New Ranking Factor

CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

Follow the AI Footpaths | Towards Data Science

Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration

Hallucinations in LLMs Are Not a Bug in the Data

Visual Generalization in Reinforcement Learning via Dynamic Object Tokens

At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

Here’s how I turned a Raspberry Pi into an in-car media server

Beloved SF cat’s death fuels Waymo criticism

LinkedIn updates feed algorithm with LLM-powered ranking and retrieval

Trust Is The New Ranking Factor

CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

What They Mean and How to Use Them in Social Media Campaigns

3 CMS Platforms Control 73% Of The Market & Shape Technical SEO Defaults

Top 7 Traackr Alternatives 2026

Most Popular

13 Trending Songs on TikTok in Nov 2025 (+ How to Use Them)

How to watch the 2026 GRAMMY Awards online from anywhere

Corporate Reputation Management Strategies | Sprout Social

Our Picks

At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

Here’s how I turned a Raspberry Pi into an in-car media server

Beloved SF cat’s death fuels Waymo criticism

Subscribe to Updates

What's Hot

Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making

Submission history

Related Posts

Subscribe to Updates