Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs

[Submitted on 10 Feb 2026 (v1), last revised 11 Feb 2026 (this version, v2)]

CODE-SHARP: Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs, by Richard Bornemann and 2 other authors

Abstract: Developing agents capable of open-endedly discovering and learning novel skills is a grand challenge in Artificial Intelligence. Reinforcement learning offers a powerful framework for training agents to master complex skills, but it typically relies on hand-designed reward functions. This is infeasible for open-ended skill discovery, where the set of meaningful skills is not known a priori. Recent methods have shown promising results towards automating reward function design, yet they remain limited to refining rewards for pre-defined tasks. To address this limitation, we introduce Continuous Open-ended Discovery and Evolution of Skills as Hierarchical Reward Programs (CODE-SHARP), a novel framework that leverages Foundation Models (FMs) to open-endedly expand and refine a hierarchical skill archive, structured as a directed graph of executable reward functions in code. We show that a goal-conditioned agent trained exclusively on the rewards generated by the discovered SHARP skills learns to solve increasingly long-horizon goals in the Craftax environment. When composed by a high-level FM-based planner, the discovered skills enable a single goal-conditioned agent to solve complex, long-horizon tasks, outperforming both pretrained agents and task-specific expert policies by over 134% on average. We will open-source our code and provide additional videos at this https URL.
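To make the phrase "a directed graph of executable reward functions in code" concrete, the following is a minimal illustrative sketch and not the authors' implementation. Every name in it (`Skill`, `SkillArchive`, `gained`, the item names, and the inventory-dictionary state) is a hypothetical choice made for this example. The abstract does not specify how skills, states, or rewards are actually represented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a state is just a dictionary of inventory counts.
State = Dict[str, int]
RewardFn = Callable[[State, State], float]  # (previous state, current state) -> reward


@dataclass
class Skill:
    """A node in the archive: a named reward program plus its prerequisite skills."""
    name: str
    reward_fn: RewardFn
    prerequisites: List[str] = field(default_factory=list)


class SkillArchive:
    """Hypothetical skill archive stored as a directed acyclic graph."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        # Requiring prerequisites to exist already keeps the graph acyclic.
        missing = [p for p in skill.prerequisites if p not in self.skills]
        if missing:
            raise KeyError(f"unknown prerequisites: {missing}")
        self.skills[skill.name] = skill

    def plan(self, goal: str) -> List[str]:
        """Return the goal and all its prerequisites in dependency order."""
        order: List[str] = []
        seen = set()

        def visit(name: str) -> None:
            if name in seen:
                return
            seen.add(name)
            for prerequisite in self.skills[name].prerequisites:
                visit(prerequisite)
            order.append(name)

        visit(goal)
        return order


def gained(item: str) -> RewardFn:
    """Build a reward function that pays 1.0 when the count of `item` increases."""
    def reward(prev: State, cur: State) -> float:
        return 1.0 if cur.get(item, 0) > prev.get(item, 0) else 0.0
    return reward


# Illustrative item names only; they are not taken from the paper.
archive = SkillArchive()
archive.add(Skill("collect_wood", gained("wood")))
archive.add(Skill("make_pickaxe", gained("wood_pickaxe"), ["collect_wood"]))
archive.add(Skill("collect_stone", gained("stone"), ["make_pickaxe"]))

print(archive.plan("collect_stone"))
```

In this sketch a longer-horizon skill is reached by first satisfying the skills it depends on, which is one simple way to read "hierarchical". How CODE-SHARP itself composes, expands, and refines skills is described in the paper, not here.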

Submission history

From: Richard Bornemann
[v1] Tue, 10 Feb 2026 18:51:39 UTC (4,271 KB)
[v2] Wed, 11 Feb 2026 09:46:16 UTC (4,271 KB)
