A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text

[Submitted on 6 Jan 2026 (v1), last revised 17 Mar 2026 (this version, v3)]

From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text, by Shinwoo Park and Yo-Sub Han


Abstract: Distinguishing human-written Korean text from fluent LLM outputs remains difficult even for trained readers, who can over-trust surface well-formedness. We present LREAD, a Korean-specific instantiation of a rubric-based expert-calibration framework for human attribution of LLM-generated text. In a three-phase blind longitudinal study with three linguistically trained annotators, Phase 1 measures intuition-only attribution, Phase 2 introduces criterion-anchored scoring with explicit justifications, and Phase 3 evaluates a limited held-out elementary-persona subset. Majority-vote accuracy improves from 0.60 in Phase 1 to 0.90 in Phase 2, and reaches 10/10 on the limited Phase 3 subset (95% CI [0.692, 1.000]); agreement also increases from Fleiss’ $\kappa$ = -0.09 to 0.82. Error analysis suggests that calibration primarily reduces false negatives on AI essays rather than inducing generalized over-detection. We position LREAD as pilot evidence for within-panel calibration in a Korean argumentative-essay setting. These findings suggest that rubric-scaffolded human judgment can complement automated detectors by making attribution reasoning explicit, auditable, and adaptable. The rubric developed in this study, along with the dataset employed for the analysis, is available at this https URL.
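The three statistics quoted in the abstract (majority-vote labels, Fleiss' $\kappa$, and a 95% interval on 10/10) can be illustrated with a minimal sketch. The abstract does not say how the interval was computed; the reported lower bound of 0.692 is consistent with an exact Clopper-Pearson interval, so that method is assumed here. The function names and the toy inputs are hypothetical and are not taken from the paper or its released dataset.

```python
import math
from collections import Counter


def majority_vote(labels):
    """Return the most common label among one item's annotator labels."""
    return Counter(labels).most_common(1)[0][0]


def fleiss_kappa(counts):
    """Fleiss' kappa for an items x categories matrix of rating counts.

    Every row must sum to the same number of raters. Undefined (division
    by zero) if all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the pooled category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)


def _tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1)
    )


def _solve_increasing(f, target):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial interval (assumed method, see lead-in)."""
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: _tail_ge(x, n, p), alpha / 2
    )
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: _tail_ge(x + 1, n, p), 1 - alpha / 2
    )
    return lower, upper


# 10 correct out of 10, as on the Phase 3 subset.
lower, upper = clopper_pearson(10, 10)
print(round(lower, 3), upper)  # 0.692 1.0
```

With three raters and two categories, a toy matrix such as `[[2, 1], [1, 2]]` gives a negative $\kappa$, which shows how a panel can agree less often than chance would predict, as the Phase 1 value of -0.09 indicates.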

Submission history

From: Shinwoo Park
[v1] Tue, 6 Jan 2026 10:51:39 UTC (161 KB)
[v2] Mon, 16 Mar 2026 16:26:06 UTC (164 KB)
[v3] Tue, 17 Mar 2026 06:02:55 UTC (164 KB)
