Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

📖 Abstract

Humans exhibit remarkable motor agility, enabling a wide range of dynamic skills such as running and jumping, which highlights the great potential of humanoid robots for athletic locomotion. Among athletic sports, long rope skipping requires two rope turners to cooperatively swing the rope while adapting to a player under different jumping rhythms, making it a meaningful yet challenging task for humanoid robots. Although existing methods for humanoid sports have achieved success in single-agent and interaction-free settings, such as running, dancing, and parkour, task scenarios that require precise coordination among multiple participants remain largely unexplored. To this end, we propose Marope, a multi-agent reinforcement learning (MARL) framework for cooperative long rope skipping with multiple humanoid robots. Specifically, Marope adopts a hierarchical reinforcement learning framework for policy training. At the lower level, it learns decentralized rope manipulation policies through MARL, while at the upper level, a centralized scheduling policy is trained to coordinate the execution of the lower-level policies. To improve generalization across different player behavioral styles, Marope further incorporates diverse jumping policies into cooperative game training. We evaluate our approach on Unitree G1 humanoid robots in both simulation and real-world settings. Experimental results demonstrate that Marope outperforms various baselines, achieving more efficient and stable rope manipulation as well as more robust and adaptable cooperation with varied players.

⚙️ Method

Overview of Marope. (a) Marope builds a pipeline for learning long rope skipping skills on multiple humanoid robots. (b) A hierarchical framework coordinates the rope turners with a player jumping at a given rhythm. (c) The low-level decentralized rope manipulation policies are trained via MARL. (d) An IPM-based intrinsic diversity objective discovers diverse player behaviors, which improves the generalization of the high-level scheduling policy.
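The two-level structure in the overview can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `TurnerCommand`, `schedule`, and `low_level_step`, the convention that the rope is at its lowest point at phase π, and the hand-written rules standing in for the trained policies. In Marope both levels are learned; this only shows how one shared high-level command can drive two turners that each see only their own local state.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnerCommand:
    """Hypothetical command shared by both rope turners."""
    rope_phase: float      # desired rope phase right now, in radians
    rope_frequency: float  # desired swing frequency, in Hz


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def schedule(jump_period_s: float, time_to_next_jump_s: float) -> TurnerCommand:
    """Stand-in for the centralized high-level scheduler (hand-written, not learned).

    Assumption: the rope should reach its lowest point (phase pi) exactly
    when the player is next airborne, and it swings once per jump.
    """
    frequency = 1.0 / jump_period_s
    phase_to_travel = 2.0 * math.pi * frequency * time_to_next_jump_s
    phase_now = (math.pi - phase_to_travel) % (2.0 * math.pi)
    return TurnerCommand(rope_phase=phase_now, rope_frequency=frequency)


def low_level_step(hand_phase: float, command: TurnerCommand, gain: float = 0.5) -> float:
    """Stand-in for one decentralized low-level policy.

    Each turner uses only its own hand phase plus the shared command and
    moves a fraction `gain` of the way toward the commanded phase.
    """
    error = wrap_angle(command.rope_phase - hand_phase)
    return hand_phase + gain * error


# One control step: a single command, two independent turners.
command = schedule(jump_period_s=0.5, time_to_next_jump_s=0.25)
turner_a_next = low_level_step(hand_phase=0.0, command=command)
turner_b_next = low_level_step(hand_phase=0.2, command=command)
```

The centralized scheduler reasons about the player's rhythm, while each turner corrects only its own phase error, so the turners need no direct communication at execution time in this sketch.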

📊 Experiments

Simulation Results of Rope Manipulation
Category	Metrics	Ours	Single Agent	Open Loop	w/o Segment Sampling
Rope Manipulation	E_rot (↓)	1.719 ± 0.332	2.733 ± 0.539	6.754 ± 2.018	2.223 ± 0.484
Rope Manipulation	E_wid (↓)	0.063 ± 0.011	0.136 ± 0.030	0.284 ± 0.135	0.087 ± 0.038
Coordinated Movement	E_lin (↓)	0.103 ± 0.011	0.222 ± 0.052	0.199 ± 0.049	0.121 ± 0.024
Coordinated Movement	E_ang (↓)	0.109 ± 0.041	0.114 ± 0.039	0.112 ± 0.040	0.110 ± 0.041
Control Stability	Act. Rate (↓)	1.349 ± 0.130	1.526 ± 0.207	0.945 ± 0.097	1.300 ± 0.142
Control Stability	Feet Slip. (↓)	0.040 ± 0.003	0.122 ± 0.010	0.082 ± 0.012	0.053 ± 0.004
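To read the table in relative terms, the short calculation below compares the reported means of Ours against the Single Agent baseline. It uses only the mean values from the table and ignores the reported standard deviations, so it is a rough reading aid rather than a significance test. The helper name `relative_reduction` is introduced here for illustration.

```python
# Mean values copied from the rope manipulation table above.
ours = {"E_rot": 1.719, "E_wid": 0.063, "E_lin": 0.103, "Feet Slip.": 0.040}
single_agent = {"E_rot": 2.733, "E_wid": 0.136, "E_lin": 0.222, "Feet Slip.": 0.122}


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline` (lower is better)."""
    return (baseline - value) / baseline


reductions = {
    metric: relative_reduction(single_agent[metric], ours[metric])
    for metric in ours
}

for metric, reduction in reductions.items():
    print(f"{metric}: {reduction:.1%} lower than Single Agent")
```

On these means, the reductions are roughly 37% for E_rot, 54% for E_wid, 54% for E_lin, and 67% for Feet Slip. Not every metric favors Ours: the table shows Open Loop with the lowest Act. Rate (0.945 versus 1.349).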

Simulation Results of Player Coordination
Metric	Ours	w/o Scheduling	w/o Player Diversity
Overlap Ratio (↓)	0.090 ± 0.049	0.392 ± 0.140	0.120 ± 0.094
Phase Tracking Error (↓)	0.397 ± 0.115	1.092 ± 0.273	0.414 ± 0.155
Jumper Tracking Error (↓)	0.116 ± 0.022	0.182 ± 0.360	0.172 ± 0.021
Complete Rate (↑)	0.787 ± 0.204	0.358 ± 0.207	0.751 ± 0.216
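This page does not define how Phase Tracking Error is computed. As an illustration only, the sketch below assumes it is the mean absolute difference between the actual and target rope phase, measured in radians and wrapped so that phases just before and just after a full revolution count as close. The function names are hypothetical and the authors' metric may differ.

```python
import math


def wrapped_phase_error(rope_phase: float, target_phase: float) -> float:
    """Absolute phase difference in radians, wrapped to [0, pi]."""
    diff = (rope_phase - target_phase + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)


def mean_phase_error(rope_phases, target_phases) -> float:
    """Average wrapped phase error over paired samples (assumed definition)."""
    errors = [
        wrapped_phase_error(rope, target)
        for rope, target in zip(rope_phases, target_phases)
    ]
    return sum(errors) / len(errors)


# Phases 0.1 and 2*pi - 0.1 are only 0.2 rad apart once wrapping is applied.
near_wrap = wrapped_phase_error(0.1, 2.0 * math.pi - 0.1)

# Two samples: one perfectly tracked, one half a revolution off.
average = mean_phase_error([0.0, math.pi], [0.0, 0.0])
```

Without the wrapping step, the first example would report an error of about 6.08 rad even though the rope is almost exactly on target, which would distort any averaged tracking score.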

🎬 Demos

Simulation: Humanoids-Humanoid

Simulation: Humanoids-Human

Real-world: Dynamic Partner Following

Real-world: Humanoids-Human

📌 Citation

@inproceedings{anonymous2026marope,
  title     = {Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning},
  author    = {Anonymous Authors},
  booktitle = {Under Review},
  year      = {2026}
}