A Framework for Scalable Heterogeneous Multi-Agent Adversarial RL in IsaacLab

Training robust, competitive policies across morphology-diverse robot teams.

Isaac Peterson*1, Christopher Allred*1,2, Jacob Morrey1, Mario Harper1

*Equal contribution. 1Utah State University. 2US DEVCOM Army Research Laboratory.

Teaser: heterogeneous adversarial multi-agent settings in IsaacLab, including quadruped teams in Sumo and Leatherback rovers.

Abstract

Multi-agent reinforcement learning (MARL) is central to robotic systems that must cooperate in dynamic environments. While prior work has focused largely on collaborative settings, adversarial interactions are equally critical for real-world applications such as pursuit-evasion, security, and competitive manipulation. In this work, we extend the IsaacLab framework to support scalable training of adversarial policies in high-fidelity physics simulation. We introduce a suite of adversarial MARL environments featuring heterogeneous agents with asymmetric goals and capabilities. Our platform integrates a competitive variant of HAPPO (Heterogeneous-Agent Proximal Policy Optimization), enabling efficient training and evaluation under adversarial dynamics. Experiments across several benchmark scenarios demonstrate the framework’s ability to model and train robust policies for morphologically diverse multi-agent competition while maintaining high throughput and simulation realism. Code and benchmarks are available at: https://directlab.github.io/IsaacLab-HARL/.

Highlights

  • Team-specific critics for competitive HARL (HAPPO-style) in IsaacLab; see the sketch after this list.
  • Plug-and-play adversarial environments with curriculum learning.
  • Robust emergent behavior and role specialization across morphologies.
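
To make the first highlight concrete, here is a minimal sketch of team-specific centralized critics. The class name, dimensions, and team dictionary are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of team-specific critics for competitive HAPPO-style
# training. All names and dimensions are illustrative, not the repo's API.
import torch
import torch.nn as nn

class TeamCritic(nn.Module):
    """Centralized value function over one team's joint observation."""
    def __init__(self, joint_obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        return self.net(joint_obs)

# One critic per team: each team estimates returns from its own joint
# observation, so opposing teams never share a value function -- the
# competitive departure from cooperative HAPPO.
critics = {"quadrupeds": TeamCritic(96), "rovers": TeamCritic(48)}
values = {team: critic(torch.randn(32, critic.net[0].in_features))
          for team, critic in critics.items()}
```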

Video

Demonstration video: heterogeneous adversarial multi-agent learning in IsaacLab.

Benchmark Environments

Sumo (Heterogeneous)

Quadrupeds and wheeled rovers compete to force opponents out of a ring. Policies are trained with a staged curriculum: walk → push block → adversarial Sumo.

Sumo heterogeneous teams
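
A hedged sketch of how such a staged curriculum might be expressed: the stage names mirror the progression above, but the schema, metrics, and thresholds are assumptions rather than the framework's actual configuration.

```python
# Illustrative curriculum schedule for Sumo; stage names follow the
# progression above, everything else (schema, thresholds) is assumed.
SUMO_CURRICULUM = [
    {"name": "walk",       "metric": "velocity_tracking",  "advance_at": 0.9},
    {"name": "push_block", "metric": "block_displacement", "advance_at": 0.8},
    {"name": "sumo",       "metric": "win_rate",           "advance_at": None},  # final stage
]

def next_stage(idx: int, score: float) -> int:
    """Move to the next stage once the current stage's criterion is met."""
    threshold = SUMO_CURRICULUM[idx]["advance_at"]
    if threshold is not None and score >= threshold:
        return idx + 1
    return idx
```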

Soccer (1v1 / 2v2)

Adversarial ball manipulation with morphology-appropriate action spaces and per-team agent dictionaries; trained with leapfrog (alternating) actor updates.

Soccer rollouts and goals
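
A minimal sketch of the leapfrog schedule, assuming a generic per-team training callback; the function and team names are placeholders, not the repository's trainer API.

```python
# Sketch of "leapfrog" (alternating) actor updates: one team's actors learn
# while the opposing team's policies stay frozen, then the roles swap.
def leapfrog_schedule(teams, phases, iters_per_phase, train_fn):
    for phase in range(phases):
        learner = teams[phase % len(teams)]   # team being updated this phase
        for _ in range(iters_per_phase):
            train_fn(learner)                 # gradients only for `learner`

# Example: 2v2 soccer, swapping the learning team every 500 iterations.
leapfrog_schedule(["red", "blue"], phases=4, iters_per_phase=500,
                  train_fn=lambda team: None)  # stand-in trainer
```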

3D Galaga: Aerial–Ground Interception

Drones attempt to score goal hits while MiniTanks fire ray-cast “laser-tag” knockouts. Demonstrates transfer and emergent competence in adversarial play.

3D Galaga: MiniTanks vs Drones
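
As a geometry-only illustration of the laser-tag mechanic, the check below flags a knockout when a drone falls inside a MiniTank's firing cone and range. The thresholds and batched tensor layout are assumptions; the environment itself performs the test with simulator ray casts rather than this closed-form cone check.

```python
# Geometry-only sketch of the "laser-tag" hit test; thresholds and tensor
# layout are assumptions, not the environment's actual ray-cast sensor.
import torch

def laser_hits(tank_pos, tank_aim, drone_pos, max_range=10.0, half_angle=0.1):
    """tank_pos, drone_pos: (N, 3) positions; tank_aim: (N, 3) unit vectors."""
    to_drone = drone_pos - tank_pos
    dist = to_drone.norm(dim=-1)
    cos_off = (to_drone * tank_aim).sum(-1) / dist.clamp(min=1e-6)
    return (dist <= max_range) & (cos_off >= torch.cos(torch.tensor(half_angle)))
```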

Environment Videos

Sumo (Leatherback Stage 1)

Soccer (Leatherback Stage 1)

3D Galaga: Aerial–Ground Interception

Results & Findings

Emergent Behaviors

Role specialization emerges: rovers learn to destabilize quadruped legs, while quadrupeds develop dragging maneuvers.

Emergent strategies: destabilization and dragging

Win-Rates Over Time

Trained policies consistently outperform their initial versions across environments; both alternating and simultaneous training schedules are effective.

Win-rate curves across environments

Curriculum & Zero-Buffer

The zero-buffer keeps observation spaces consistent across curriculum stages by zero-padding unused dimensions. It yields slightly slower early convergence but smoother stage transitions.

Effect of zero-buffer on curriculum learning
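
A minimal sketch of the zero-buffer idea: pad every stage's observation with trailing zeros up to the final stage's width, so the policy's input dimension never changes across the curriculum. The dimensions below are illustrative.

```python
# Zero-buffer sketch: right-pad each stage's observations to the final
# stage's width so policy input dims stay fixed across the curriculum.
# FINAL_OBS_DIM and the stage widths are illustrative assumptions.
import torch
import torch.nn.functional as F

FINAL_OBS_DIM = 64  # observation width of the last (adversarial) stage

def zero_buffer(obs: torch.Tensor) -> torch.Tensor:
    """Pad (batch, d) observations with trailing zeros up to FINAL_OBS_DIM."""
    pad = FINAL_OBS_DIM - obs.shape[-1]
    return F.pad(obs, (0, pad)) if pad > 0 else obs

stage1_obs = torch.randn(4096, 48)   # e.g., walk stage: no opponent state yet
assert zero_buffer(stage1_obs).shape == (4096, 64)
```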

Citation

@inproceedings{peterson2025harlA,
  title     = {A Framework for Scalable Heterogeneous Multi-Agent Adversarial Reinforcement Learning in IsaacLab},
  author    = {Isaac Peterson and Christopher Allred and Jacob Morrey and Mario Harper},
  booktitle = {Proceedings of the IEEE Conference},
  year      = {2025},
  note      = {Code and benchmarks: \url{https://github.com/DIRECTLab/IsaacLab-HARL}}
}

Adjust venue/year as appropriate once finalized.