📖 Abstract

Humans exhibit remarkable motor agility, enabling a wide range of dynamic skills such as running and jumping, which highlights the great potential of humanoid robots for athletic locomotion. Among athletic sports, long rope skipping requires two rope turners to cooperatively swing the rope while adapting to a player who jumps at different rhythms, making it a meaningful yet challenging task for humanoid robots. Although existing methods for humanoid sports have succeeded in single-agent and interaction-free settings, such as running, dancing, and parkour, tasks that require precise coordination among multiple participants remain largely unexplored. To this end, we propose Marope, a multi-agent reinforcement learning (MARL) framework for cooperative long rope skipping with multiple humanoid robots. Specifically, Marope adopts a hierarchical reinforcement learning framework for policy training. At the lower level, it learns decentralized rope manipulation policies through MARL, while at the upper level, a centralized scheduling policy is trained to coordinate the execution of the lower-level policies. To improve generalization across different player behavioral styles, Marope further incorporates diverse jumping policies into cooperative game training. We evaluate our approach on Unitree G1 humanoid robots in both simulation and real-world settings. Experimental results demonstrate that Marope outperforms various baselines, achieving more efficient and stable rope manipulation as well as more robust and adaptable cooperation with varied players.

⚙️ Method

Overview of Marope. (a) Marope builds a pipeline for learning long rope skipping skills on multiple humanoid robots. (b) A hierarchical coordination framework enables efficient coordination with a player jumping at a specific rhythm. (c) The low-level decentralized rope manipulation policy is trained via MARL. (d) An IPM-based intrinsic diversity objective discovers diverse player behaviors, improving the generalization of the high-level scheduling policy.
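
The two-level structure in (b) and (c) can be pictured with a toy control loop. The sketch below is only an illustration under strong assumptions: `schedule_phase`, `turner_policy`, `TurnerObs`, and the proportional gain are hypothetical stand-ins for the learned scheduling and rope manipulation policies, and the rope is reduced to a single phase angle per turner. It shows one centralized command being broadcast to turners that each act on their own local observation.

```python
import math
from dataclasses import dataclass

TWO_PI = 2.0 * math.pi


def wrap(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % TWO_PI - math.pi


@dataclass
class TurnerObs:
    """Local observation of one rope turner (hypothetical fields)."""
    hand_phase: float    # this turner's current rope phase, in radians
    target_phase: float  # phase command received from the scheduler


def schedule_phase(jumper_phase: float, lead: float = 0.0) -> float:
    """Upper level (centralized): turn the jumper's rhythm phase into one
    shared rope phase command. A fixed rule stands in for the learned policy."""
    return (jumper_phase + lead) % TWO_PI


def turner_policy(obs: TurnerObs, gain: float = 0.5) -> float:
    """Lower level (decentralized): each turner sees only its own observation.
    A proportional rule stands in for the MARL-trained policy."""
    return gain * wrap(obs.target_phase - obs.hand_phase)


def step(jumper_phase: float, hand_phases: list[float]) -> list[float]:
    """One control step: schedule once, then let every turner act locally."""
    target = schedule_phase(jumper_phase)
    actions = [turner_policy(TurnerObs(p, target)) for p in hand_phases]
    return [(p + a) % TWO_PI for p, a in zip(hand_phases, actions)]


if __name__ == "__main__":
    phases = [0.3, 5.0]  # two turners starting out of sync
    for _ in range(20):
        phases = step(jumper_phase=1.0, hand_phases=phases)
    print([round(p, 3) for p in phases])
```

In this toy setting both turners converge to the commanded phase because they receive the same target; the actual framework replaces both rules with trained policies acting on full humanoid observations.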

📊 Experiments

Simulation Results of Rope Manipulation

| Category | Metric | Ours | Single Agent | Open Loop | w/o Segment Sampling |
|---|---|---|---|---|---|
| Rope Manipulation | E_rot (↓) | 1.719 ± 0.332 | 2.733 ± 0.539 | 6.754 ± 2.018 | 2.223 ± 0.484 |
| Rope Manipulation | E_wid (↓) | 0.063 ± 0.011 | 0.136 ± 0.030 | 0.284 ± 0.135 | 0.087 ± 0.038 |
| Coordinated Movement | E_lin (↓) | 0.103 ± 0.011 | 0.222 ± 0.052 | 0.199 ± 0.049 | 0.121 ± 0.024 |
| Coordinated Movement | E_ang (↓) | 0.109 ± 0.041 | 0.114 ± 0.039 | 0.112 ± 0.040 | 0.110 ± 0.041 |
| Control Stability | Act. Rate (↓) | 1.349 ± 0.130 | 1.526 ± 0.207 | 0.945 ± 0.097 | 1.300 ± 0.142 |
| Control Stability | Feet Slip. (↓) | 0.040 ± 0.003 | 0.122 ± 0.010 | 0.082 ± 0.012 | 0.053 ± 0.004 |

Simulation Results of Player Coordination

| Metric | Ours | w/o Scheduling | w/o Player Diversity |
|---|---|---|---|
| Overlap Ratio (↓) | 0.090 ± 0.049 | 0.392 ± 0.140 | 0.120 ± 0.094 |
| Phase Tracking Error (↓) | 0.397 ± 0.115 | 1.092 ± 0.273 | 0.414 ± 0.155 |
| Jumper Tracking Error (↓) | 0.116 ± 0.022 | 0.182 ± 0.360 | 0.172 ± 0.021 |
| Complete Rate (↑) | 0.787 ± 0.204 | 0.358 ± 0.207 | 0.751 ± 0.216 |
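
Each table entry has the form "value ± spread". As a minimal sketch, assuming the entries are a mean and a sample standard deviation over repeated evaluation episodes (the page does not state this), per-episode measurements could be summarized as follows. The helper `summarize` and the input numbers are hypothetical and are not taken from the experiments.

```python
import statistics


def summarize(values: list[float], digits: int = 3) -> str:
    """Format per-episode values as 'mean ± std'.

    Uses the sample standard deviation (n - 1 denominator); this choice is an
    assumption, since the page does not say how the spread was computed.
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.{digits}f} ± {std:.{digits}f}"


if __name__ == "__main__":
    # Hypothetical per-episode errors, for illustration only.
    print(summarize([1.0, 2.0, 3.0]))  # prints "2.000 ± 1.000"
```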

🎬 Demos

Simulation: Humanoids - Humanoid
Simulation: Humanoids - Human
Real-world: Dynamic Partner Following
Real-world: Humanoids - Human

📌 Citation

@inproceedings{anonymous2026marope,
  title     = {Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning},
  author    = {Anonymous Authors},
  booktitle = {Under Review},
  year      = {2026}
}