Learning by Stateful Reflective Memory

[Submitted on 27 Dec 2025 (v1), last revised 31 Dec 2025 (this version, v2)]

View a PDF of the paper titled Memento 2: Learning by Stateful Reflective Memory, by Jun Wang

View PDF
HTML (experimental)

Abstract:We study continual learning in large language model (LLM) based agents that integrate episodic memory with reinforcement learning. We focus on reflection, the ability of an agent to revisit past experience and adjust how it selects future actions, as the central mechanism for continual adaptation without fine tuning model weights. To formalise this, we introduce the Stateful Reflective Decision Process (SRDP), in which an agent maintains and updates episodic memory and alternates between writing new experiences to memory and reading relevant cases to guide decisions. This framework casts reflective memory dynamics as part of the decision process itself and makes them amenable to control and learning analysis. Building on this formulation, we develop a Read-Write Reflective Learning algorithm that incorporates memory retrieval into a soft policy iteration procedure and prove that it converges. We further show that as memory grows and more densely covers the task environment, the resulting policy approaches optimality. Our framework unifies memory based reasoning with reinforcement learning and provides a formal foundation for LLM agents capable of continual, experience driven learning.

Submission history

From: Jun Wang [view email]
[v1]
Sat, 27 Dec 2025 22:15:03 UTC (301 KB)
[v2]
Wed, 31 Dec 2025 23:24:05 UTC (150 KB)

What's Hot

At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

Here’s how I turned a Raspberry Pi into an in-car media server

Beloved SF cat’s death fuels Waymo criticism

Learning by Stateful Reflective Memory

Agentic RAG Failure Modes: Retrieval Thrash, Tool Storms, and Context Bloat (and How to Spot Them Early)

Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

How to Measure AI Value

What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?

The Math That’s Killing Your AI Agent

Agent Control Protocol: Admission Control for Agent Actions

At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

Here’s how I turned a Raspberry Pi into an in-car media server

Beloved SF cat’s death fuels Waymo criticism

How to create a Zoom meeting link and share it

Hilary Duff Is a Diet Coke Truther

Google confirms AI headline rewrites test in Search results

How to add Google Calendar to Outlook

Agentic RAG Failure Modes: Retrieval Thrash, Tool Storms, and Context Bloat (and How to Spot Them Early)

9 types of Google Ads (pros, cons, and when to use each)

Most Popular

13 Trending Songs on TikTok in Nov 2025 (+ How to Use Them)

How to watch the 2026 GRAMMY Awards online from anywhere

Corporate Reputation Management Strategies | Sprout Social

Our Picks

At Least 32 People Dead After a Mine Bridge Collapsed Due to Overcrowding

Here’s how I turned a Raspberry Pi into an in-car media server

Beloved SF cat’s death fuels Waymo criticism

Subscribe to Updates

What's Hot

Learning by Stateful Reflective Memory

Submission history

Related Posts

Subscribe to Updates