[2511.07904] Test-driven Reinforcement Learning in Continuous Control

[Submitted on 11 Nov 2025 (v1), last revised 9 Dec 2025 (this version, v3)]

Authors: Zhao Yu and 2 other authors

Abstract: Reinforcement learning (RL) has been recognized as a powerful tool for robot control tasks. RL typically employs reward functions to define task objectives and guide agent learning. However, because the reward function serves the dual purpose of defining the optimal goal and guiding learning, it is challenging to design manually, which often results in a suboptimal task representation. To tackle the reward design challenge in RL, inspired by satisficing theory, we propose a Test-driven Reinforcement Learning (TdRL) framework. In the TdRL framework, multiple test functions are used to represent the task objective rather than a single reward function. Test functions fall into two categories: pass-fail tests, which define the optimal objective, and indicative tests, which guide the learning process. This separation makes tasks easier to define. Building upon such a task definition, we first prove that if a trajectory return function assigns higher returns to trajectories closer to the optimal trajectory set, maximum entropy policy optimization based on this return function will yield a policy that is closer to the optimal policy set. Then, we introduce a lexicographic heuristic approach that compares how far trajectories are from the optimal trajectory set, and we use these comparisons to learn the trajectory return function. Furthermore, we develop an algorithmic implementation of TdRL. Experimental results on the DeepMind Control Suite benchmark demonstrate that TdRL matches or outperforms handcrafted reward methods in policy training, with greater design simplicity and inherent support for multi-objective optimization. We argue that TdRL offers a novel perspective for representing task objectives, which could be helpful in addressing the reward design challenges in RL applications.
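The abstract does not spell out the lexicographic heuristic, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a trajectory is a plain list of scalar states, and every name in it (`pass_fail_reach`, `indicative_closeness`, `lexicographic_key`, `closer_to_optimal`, the `target` and `tol` parameters) is hypothetical. A trajectory is ranked first by how many pass-fail tests it passes, and ties are broken by the indicative test values in order.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a trajectory is a sequence of scalar states (purely illustrative).
Trajectory = Sequence[float]
PassFailTest = Callable[[Trajectory], bool]
IndicativeTest = Callable[[Trajectory], float]


def pass_fail_reach(traj: Trajectory, target: float = 1.0, tol: float = 0.1) -> bool:
    """Hypothetical pass-fail test: the final state lies within tol of target."""
    return abs(traj[-1] - target) <= tol


def indicative_closeness(traj: Trajectory, target: float = 1.0) -> float:
    """Hypothetical indicative test: larger is better (negative final distance)."""
    return -abs(traj[-1] - target)


def lexicographic_key(
    traj: Trajectory,
    pass_fail_tests: Sequence[PassFailTest],
    indicative_tests: Sequence[IndicativeTest],
) -> Tuple[float, ...]:
    """Build a tuple that Python compares lexicographically.

    First entry: number of pass-fail tests passed.
    Remaining entries: indicative test values, in the order given.
    """
    passed = sum(1 for test in pass_fail_tests if test(traj))
    indicators = tuple(test(traj) for test in indicative_tests)
    return (float(passed), *indicators)


def closer_to_optimal(
    a: Trajectory,
    b: Trajectory,
    pass_fail_tests: Sequence[PassFailTest],
    indicative_tests: Sequence[IndicativeTest],
) -> bool:
    """Return True if trajectory a ranks strictly above trajectory b."""
    key_a = lexicographic_key(a, pass_fail_tests, indicative_tests)
    key_b = lexicographic_key(b, pass_fail_tests, indicative_tests)
    return key_a > key_b


PASS_FAIL = [pass_fail_reach]
INDICATIVE = [indicative_closeness]

traj_a = [0.0, 0.5, 0.95]  # passes the pass-fail test
traj_b = [0.0, 0.2, 0.4]   # fails, and ends far from the target
traj_c = [0.0, 0.7]        # fails, but ends closer than traj_b

print(closer_to_optimal(traj_a, traj_b, PASS_FAIL, INDICATIVE))
print(closer_to_optimal(traj_c, traj_b, PASS_FAIL, INDICATIVE))
```

Pairwise preferences of this kind could, in principle, serve as labels for fitting a trajectory return function, which is the role the abstract assigns to the comparison step. How the paper actually orders tests and trains the return function is not stated here.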

Submission history

From: Zhao Yu
[v1] Tue, 11 Nov 2025 06:58:52 UTC (9,388 KB)
[v2] Sat, 15 Nov 2025 04:28:51 UTC (9,230 KB)
[v3] Tue, 9 Dec 2025 07:05:57 UTC (9,430 KB)
