

Abstract
The Real-DRL framework is designed for safety-critical autonomous systems: it enables a deep reinforcement learning (DRL) agent to learn safe, high-performance action policies at runtime in real plants (i.e., the real physical systems to be controlled), while prioritizing safety. Real-DRL consists of three interactive components: a DRL-Student, a PHY-Teacher, and a Trigger. The DRL-Student is a DRL agent whose novelty lies in a dual self-learning and teaching-to-learn paradigm and in real-time safety-informed batch sampling. The PHY-Teacher, in contrast, is a physics-model-based design of action policies that focuses solely on safety-critical functions; its novelty lies in real-time patches serving two key missions: (i) fostering the teaching-to-learn paradigm for the DRL-Student and (ii) backing up the safety of the real plant. The Trigger manages the interaction between the DRL-Student and the PHY-Teacher. Powered by these three interactive components, Real-DRL effectively addresses the safety challenges that arise from unknown unknowns and the Sim2Real gap. In addition, Real-DRL notably features (i) assured safety, (ii) automatic hierarchy learning (i.e., safety-first learning, then high-performance learning), and (iii) safety-informed batch sampling to address the imbalance of learning experience caused by corner cases. Experiments on a real quadruped robot, a quadruped robot in NVIDIA Isaac Gym, and a cart-pole system, together with comparisons and ablation studies, demonstrate Real-DRL's effectiveness and unique features.
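The Trigger's role can be pictured with a short sketch. Everything below (the `SafetyEnvelope` class, the box-shaped safety check, and the stand-in student/teacher policies) is an illustrative assumption, not the authors' implementation: the DRL-Student acts while the plant state stays inside a safety envelope, and the PHY-Teacher takes over otherwise, both to back up safety and to supply teaching actions for the DRL-Student.

```python
"""Minimal, illustrative sketch of the Trigger arbitrating between the
DRL-Student and the PHY-Teacher. All names, the box-shaped envelope, and
the toy policies are hypothetical placeholders, not the paper's design."""
import numpy as np


class SafetyEnvelope:
    """Box-shaped safety envelope on the plant state (assumed for illustration)."""

    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)

    def contains(self, state):
        state = np.asarray(state, dtype=float)
        return bool(np.all(state >= self.low) and np.all(state <= self.high))


def select_action(state, student_policy, teacher_policy, envelope):
    """Trigger logic: the DRL-Student keeps control authority inside the
    envelope; otherwise the PHY-Teacher takes over (its action doubles as a
    teaching sample for the student)."""
    if envelope.contains(state):
        return student_policy(state), False   # student in control
    return teacher_policy(state), True        # teacher in control


def student_policy(state):
    return float(np.tanh(state.sum()))            # stand-in learned policy


def teacher_policy(state):
    return float(-1.5 * state[0] - 0.5 * state[1])  # stand-in physics-based controller


if __name__ == "__main__":
    envelope = SafetyEnvelope(low=[-0.9, -2.0], high=[0.9, 2.0])
    state = np.array([1.2, 0.3])  # outside the envelope, so the teacher acts
    action, taught = select_action(state, student_policy, teacher_policy, envelope)
    print(f"action={action:.3f}, taught_by_teacher={taught}")
```

In the full framework, the teacher-generated transitions would presumably feed the DRL-Student's learning as well, which is where the teaching-to-learn paradigm and the safety-informed batch sampling (Feature II below) come in.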
Experiment-I: Cart-Pole

Three Notable Features of Real-DRL:
- Feature I – Teaching-to-Learn Mechanism
- Feature II – Safety-informed Batch Sampling (see the sketch after this list)
- Feature III – Automatic Hierarchy Learning
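Feature II can likewise be sketched. The abstract states that safety-informed batch sampling addresses the imbalance of learning experience caused by corner cases; one minimal way to realize such a scheme is to store corner-case transitions in their own pool and draw a fixed share of every training batch from it. The two-pool buffer and the fixed sampling ratio below are assumptions for illustration, not the paper's algorithm.

```python
"""Minimal sketch of safety-informed batch sampling: over-sample the rare
safety-critical (corner-case) transitions so routine experience does not
drown them out. Buffer layout and ratio are illustrative assumptions."""
import random
from collections import deque


class SafetyInformedBuffer:
    def __init__(self, capacity=100_000, critical_fraction=0.5):
        self.routine = deque(maxlen=capacity)    # ordinary transitions
        self.critical = deque(maxlen=capacity)   # safety-critical / corner-case transitions
        self.critical_fraction = critical_fraction

    def add(self, transition, safety_critical):
        (self.critical if safety_critical else self.routine).append(transition)

    def sample(self, batch_size):
        # Draw a fixed share of the batch from the corner-case pool (when it
        # is non-empty) and fill the remainder from routine experience.
        n_crit = min(len(self.critical), int(batch_size * self.critical_fraction))
        n_routine = min(len(self.routine), batch_size - n_crit)
        batch = (random.sample(list(self.critical), n_crit)
                 + random.sample(list(self.routine), n_routine))
        random.shuffle(batch)
        return batch


if __name__ == "__main__":
    buf = SafetyInformedBuffer()
    for i in range(1000):
        buf.add(("transition", i), safety_critical=(i % 50 == 0))  # corner cases are rare
    batch = buf.sample(32)
    print(sum(1 for t in batch if t[1] % 50 == 0), "corner-case samples in the batch")
```

With a 50/50 split, the handful of corner-case transitions is revisited far more often than uniform sampling over a single buffer would allow.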





Experiment-II: Robot Go2 in the Wild

Left: PHY-Teacher Robustness Comparison (Real-time vs. Static Model). Right: Safety Guarantee of PHY-Teacher in Runtime Learning
Real-DRL vs. State-of-the-Art Safe DRL Approaches in Wild Environments
Experiment-III: Robot A1 in Real-World
Sim2Real Comparison (Real-DRL vs. Phy-DRL vs. Continual Phy-DRL)
Real-DRL's Fault Tolerance of Unknown Unknowns (left) and Safety-First Learning in a Real-World Environment (right)
Poster
BibTeX
@inproceedings{
2025realdrl,
title={Real-{DRL}: Teach and Learn in Reality},
author={Yanbing Mao and Yihao Cai and Lui Sha},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=gXZlZAeqay}
}