From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text
Authors: Shinwoo Park and Yo-Sub Han
Abstract: Distinguishing human-written Korean text from fluent LLM outputs remains difficult even for trained readers, who can over-trust surface well-formedness. We present LREAD, a Korean-specific instantiation of a rubric-based expert-calibration framework for human attribution of LLM-generated text. In a three-phase blind longitudinal study with three linguistically trained annotators, Phase 1 measures intuition-only attribution, Phase 2 introduces criterion-anchored scoring with explicit justifications, and Phase 3 evaluates a limited held-out elementary-persona subset. Majority-vote accuracy improves from 0.60 in Phase 1 to 0.90 in Phase 2, and reaches 10/10 on the limited Phase 3 subset (95% CI [0.692, 1.000]); agreement also increases from Fleiss' $\kappa = -0.09$ to $\kappa = 0.82$. Error analysis suggests that calibration primarily reduces false negatives on AI essays rather than inducing generalized over-detection. We position LREAD as pilot evidence for within-panel calibration in a Korean argumentative-essay setting. These findings suggest that rubric-scaffolded human judgment can complement automated detectors by making attribution reasoning explicit, auditable, and adaptable. The rubric developed in this study, along with the dataset employed for the analysis, is available at this https URL.
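The two statistics quoted in the abstract can be illustrated with a short sketch. The reported interval for 10/10 is numerically consistent with an exact (Clopper-Pearson) binomial interval, but the abstract does not name the method, so treating it as Clopper-Pearson is an assumption here. The function names and the small rating tables below are hypothetical and are not taken from the paper's dataset.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial interval (assumed method, see lead-in)."""
    # Lower bound solves P(X >= x | p) = alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound solves P(X <= x | p) = alpha / 2 (CDF decreases in p).
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = number of raters who
    assigned item i to category j; every row must sum to the same n."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Per-item observed agreement, then its mean.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the marginal category proportions.
    n_cats = len(counts[0])
    marginals = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(m * m for m in marginals)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    lo, hi = clopper_pearson(10, 10)
    print(f"10/10 correct: 95% CI [{lo:.3f}, {hi:.3f}]")

    # Hypothetical tables: 3 raters, categories (human, AI).
    split_votes = [[2, 1], [1, 2]]   # raters disagree on every item
    unanimous = [[3, 0], [0, 3]]     # raters agree on every item
    print(fleiss_kappa(split_votes), fleiss_kappa(unanimous))
```

With only ten items, a perfect score still leaves a wide interval, which is why the abstract describes the Phase 3 subset as limited. The split-vote table also shows how kappa can fall below zero when observed agreement is lower than chance agreement, as in the Phase 1 value.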
Submission history
From: Shinwoo Park
[v1] Tue, 6 Jan 2026 10:51:39 UTC (161 KB)
[v2] Mon, 16 Mar 2026 16:26:06 UTC (164 KB)
[v3] Tue, 17 Mar 2026 06:02:55 UTC (164 KB)


