Do LLMs Struggle with Math Across Cultural Contexts?

[Submitted on 23 Mar 2025 (v1), last revised 8 Apr 2026 (this version, v2)]

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? by Aabid Karim and 5 other authors

Abstract: We demonstrate that large language models’ (LLMs) mathematical reasoning is culturally sensitive: testing 14 models from Anthropic, OpenAI, Google, Meta, DeepSeek, Mistral, and Microsoft across six culturally adapted variants of the GSM8K benchmark, we find accuracy drops ranging from 0.3% (Claude 3.5 Sonnet) to 5.9% (LLaMA 3.1-8B) when math problems are embedded in unfamiliar cultural contexts, even when the underlying mathematical logic remains unchanged. These statistically significant performance reductions (p < 0.01, confirmed through McNemar tests) reveal that mathematical reasoning in LLMs is not culturally neutral.
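To make the significance claim concrete, here is a minimal sketch of a McNemar test on paired per-question outcomes (the same question answered in its original and its adapted form). The abstract does not say which variant of the test was used; this sketch assumes the exact binomial form, and the function name `mcnemar_exact` and the toy data are illustrative, not taken from the paper.

```python
from math import comb

def mcnemar_exact(original_correct, adapted_correct):
    """Two-sided exact McNemar test on paired boolean outcomes (hypothetical helper)."""
    if len(original_correct) != len(adapted_correct):
        raise ValueError("paired outcome lists must have equal length")
    pairs = list(zip(original_correct, adapted_correct))
    # Discordant pairs: b = correct on the original only, c = correct on the adapted only.
    b = sum(1 for o, a in pairs if o and not a)
    c = sum(1 for o, a in pairs if a and not o)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 each discordant pair falls either way with probability 1/2.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Toy data (made up): 8 questions flip from correct to wrong, 2 stay correct.
original = [True] * 10
adapted = [False] * 8 + [True] * 2
print(mcnemar_exact(original, adapted))  # (8, 0, 0.0078125)
```

Only the discordant pairs enter the statistic, which is why the test suits this setup: questions a model gets right (or wrong) in both versions carry no information about the effect of the cultural adaptation.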

To create these variants for Haiti, Moldova, Pakistan, Solomon Islands, Somalia, and Suriname, we systematically replaced cultural entities (names, foods, places, etc.) in 1,198 GSM8K questions while preserving all mathematical operations and numerical values. Our quantitative error analysis of 18,887 instances reveals that cultural adaptation affects broader reasoning patterns, with mathematical reasoning errors comprising 54.7% and calculation errors 34.5% of failures.
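The replacement step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper `adapt_question`, the entity mapping, and the sample question are all hypothetical. It swaps whole-word entities and then checks that every number in the question survived unchanged.

```python
import re

def adapt_question(question, entity_map):
    """Swap whole-word cultural entities, then verify the numbers are untouched."""
    adapted = question
    for source, target in entity_map.items():
        # \b keeps the match to whole words so substrings of other words are left alone.
        adapted = re.sub(rf"\b{re.escape(source)}\b", target, adapted)

    def numbers(text):
        return re.findall(r"\d+(?:\.\d+)?", text)

    if numbers(adapted) != numbers(question):
        raise ValueError("adaptation changed a numerical value")
    return adapted

# Hypothetical mapping and question, for illustration only.
question = "Emily buys 3 bagels for 2 dollars each. How much does Emily spend?"
entity_map = {"Emily": "Ayesha", "bagels": "samosas"}
print(adapt_question(question, entity_map))
# Ayesha buys 3 samosas for 2 dollars each. How much does Ayesha spend?
```

The number check is the part that matters for the study design: it guards the invariant that only the cultural surface changes while the arithmetic stays identical.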

Interestingly, cultural familiarity can enhance performance: Mistral Saba outperforms some larger models on Pakistan-adapted problems due to Middle Eastern and South Asian training data exposure. This study underscores the need for more diverse training data to ensure robust LLM performance across global contexts.

Submission history

From: Aabid Karim
[v1]
Sun, 23 Mar 2025 10:35:39 UTC (3,432 KB)
[v2]
Wed, 8 Apr 2026 07:40:49 UTC (5,078 KB)
