[2511.07904] Test-driven Reinforcement Learning in Continuous Control

[Submitted on 11 Nov 2025 (v1), last revised 9 Dec 2025 (this version, v3)]

Authors: Zhao Yu and 2 other authors

Abstract: Reinforcement learning (RL) has been recognized as a powerful tool for robot control tasks. RL typically employs reward functions to define task objectives and guide agent learning. However, because the reward function serves the dual purpose of defining the optimal goal and guiding learning, it is difficult to design manually, and the result is often a suboptimal task representation. To tackle the reward design challenge in RL, and inspired by satisficing theory, we propose a Test-driven Reinforcement Learning (TdRL) framework. In the TdRL framework, multiple test functions, rather than a single reward function, represent the task objective. Test functions are categorized as pass-fail tests, which define the optimal objective, and indicative tests, which guide the learning process; this separation makes tasks easier to define. Building upon this task definition, we first prove that if a trajectory return function assigns higher returns to trajectories closer to the optimal trajectory set, maximum entropy policy optimization based on this return function will yield a policy that is closer to the optimal policy set. Then, we introduce a lexicographic heuristic approach that compares trajectories by their relative distance to the optimal trajectory set, and we use these comparisons to learn the trajectory return function. Furthermore, we develop an algorithm implementation of TdRL. Experimental results on the DeepMind Control Suite benchmark demonstrate that TdRL matches or outperforms handcrafted reward methods in policy training, with greater design simplicity and inherent support for multi-objective optimization. We argue that TdRL offers a novel perspective for representing task objectives, which could help address the reward design challenges in RL applications.
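The abstract does not spell out how the lexicographic comparison works, so the following is only a minimal Python sketch of one plausible reading, not the authors' algorithm. It assumes that a trajectory passing more pass-fail tests counts as closer to the optimal set, and that indicative test scores break ties in a fixed priority order. Every name here (`PassFailTest`, `IndicativeTest`, `lexicographic_key`, `closer_to_optimal`, and the toy speed-based tests) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical simplification: a trajectory is a sequence of scalar speeds.
Trajectory = Sequence[float]


@dataclass(frozen=True)
class PassFailTest:
    """A test that defines the objective: the trajectory passes or fails."""
    name: str
    check: Callable[[Trajectory], bool]


@dataclass(frozen=True)
class IndicativeTest:
    """A test that guides learning: a scalar score, higher is better."""
    name: str
    score: Callable[[Trajectory], float]


def lexicographic_key(
    traj: Trajectory,
    pass_fail: Sequence[PassFailTest],
    indicative: Sequence[IndicativeTest],
) -> Tuple[float, ...]:
    """Build a sort key: passed-test count first, then indicative scores.

    Assumption (not from the paper): the number of passed pass-fail tests
    dominates, and indicative scores only break ties, in list order.
    """
    passed = sum(1 for test in pass_fail if test.check(traj))
    return (float(passed), *(test.score(traj) for test in indicative))


def closer_to_optimal(
    a: Trajectory,
    b: Trajectory,
    pass_fail: Sequence[PassFailTest],
    indicative: Sequence[IndicativeTest],
) -> bool:
    """Return True if trajectory a ranks strictly above trajectory b."""
    # Python compares tuples element by element, i.e. lexicographically.
    return lexicographic_key(a, pass_fail, indicative) > lexicographic_key(
        b, pass_fail, indicative
    )


# Toy tests for a hypothetical "move at about 1.0 m/s" task.
PASS_FAIL = [
    PassFailTest("reaches_target_speed", lambda t: max(t) >= 1.0),
    PassFailTest("stays_under_speed_limit", lambda t: max(t) <= 2.0),
]
INDICATIVE = [
    IndicativeTest("final_speed_near_target", lambda t: -abs(t[-1] - 1.0)),
]

traj_good = [0.5, 1.0, 1.2]  # passes both pass-fail tests
traj_slow = [0.2, 0.4, 0.6]  # never reaches the target speed
traj_fast = [1.0, 2.5, 3.0]  # exceeds the speed limit
```

Under these assumptions, `closer_to_optimal(traj_good, traj_slow, PASS_FAIL, INDICATIVE)` holds because `traj_good` passes more pass-fail tests, while `traj_slow` outranks `traj_fast` only through the indicative tie-break. Pairwise preferences of this kind are the sort of signal from which a trajectory return function could be learned, although the abstract gives no detail on that step.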

Submission history

From: Zhao Yu
[v1] Tue, 11 Nov 2025 06:58:52 UTC (9,388 KB)
[v2] Sat, 15 Nov 2025 04:28:51 UTC (9,230 KB)
[v3] Tue, 9 Dec 2025 07:05:57 UTC (9,430 KB)
