Much Ado About Noising: Dispelling the Myths of Generative Robotic Control

Chaoyi Pan¹,$, Giri Anantharaman¹, Nai-Chieh Huang¹, Claire Jin¹, Daniel Pfrommer², Chenyang Yuan³, Frank Permenter³, Guannan Qu¹,†, Nicholas Boffi¹,†, Guanya Shi¹,†, Max Simchowitz¹,†

¹Carnegie Mellon University, ²Massachusetts Institute of Technology, ³Toyota Research Institute

$ Project lead. † Equal advising.


Abstract

Generative models, such as flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation about the factors underlying their success, ranging from capturing multi-modal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, provided that intermediate steps are supervised during training and that this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimal iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and point toward new design spaces focusing solely on control performance.

Finding 1: Neither multi-modality nor policy expressivity accounts for GCPs' success

Through careful benchmarking over 27 tasks with 3 different input modalities (state, image, point cloud), we found:

With a properly matched architecture, regression ≈ flow on most tasks
Flow mainly wins on high-precision tasks
Neither multi-modality nor policy expressivity accounts for GCPs' success

Note: Carefully aligning the architecture and training procedure between regression control policies (RCPs) and GCPs is important.
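To make the regression-versus-flow comparison concrete, the following is a minimal sketch of the two training objectives, assuming a plain L2 behavior-cloning loss for the regression policy and a standard conditional flow-matching loss with a straight noise-to-action path for the flow policy. The function names, the `policy(obs)` and `velocity_net(obs, x_t, t)` signatures, and the choice of path are illustrative assumptions, not details taken from the benchmarked implementations.

```python
import numpy as np


def regression_loss(policy, obs, actions):
    """L2 behavior-cloning loss: the policy maps obs directly to an action."""
    pred = policy(obs)
    return float(np.mean(np.sum((pred - actions) ** 2, axis=-1)))


def flow_matching_loss(velocity_net, obs, actions, rng):
    """Conditional flow-matching loss on a straight noise-to-action path.

    Assumed (hypothetical) signature: velocity_net(obs, x_t, t) -> velocity.
    """
    batch = actions.shape[0]
    z = rng.standard_normal(actions.shape)   # noise endpoint of the path (t = 0)
    t = rng.uniform(size=(batch, 1))          # one random time in [0, 1) per sample
    x_t = (1.0 - t) * z + t * actions         # point on the straight path at time t
    target = actions - z                      # constant velocity of that path
    pred = velocity_net(obs, x_t, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))
```

Both losses can share the same network backbone; under this sketch, the only differences are the extra `(x_t, t)` inputs and the regression target, which is what makes an architecture-matched comparison possible.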

Finding 2: Noise injection and supervised iterative compute drive the success

Given a common GCP architecture, we first expose the key ingredient of GCPs:

We then systematically identify the critical components:

After benchmarking on the 7 most challenging tasks, we found that supervised iterative compute combined with stochasticity injection is the key.

Note: For control problems, distribution fitting is less important for final performance. Instead of focusing on action generation itself, it is more important to explore the design space of the mapping from observation to action.
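As an illustration of what "supervised iterative compute + stochasticity injection" could look like in practice, here is a minimal sketch of a two-step regression policy. It is written under stated assumptions and is not claimed to be the exact MIP objective: the `policy(obs, x, t)` signature, the intermediate time `T_STAR = 0.9`, and the way noise is mixed into the second-step input are all hypothetical choices made for this example.

```python
import numpy as np

T_STAR = 0.9  # assumed intermediate time; not specified in the text above


def two_step_loss(policy, obs, actions, rng, t_star=T_STAR):
    """Regress both steps onto the expert action (supervised iterative compute)."""
    zeros = np.zeros_like(actions)
    # Step 1: predict the action from the observation alone (zero iterate).
    loss_first = np.sum((policy(obs, zeros, 0.0) - actions) ** 2, axis=-1)
    # Step 2: refine a noised copy of the expert action (stochasticity injection).
    noise = rng.standard_normal(actions.shape)
    noised = t_star * actions + (1.0 - t_star) * noise
    loss_second = np.sum((policy(obs, noised, t_star) - actions) ** 2, axis=-1)
    return float(np.mean(loss_first + loss_second))


def two_step_act(policy, obs, action_dim, t_star=T_STAR):
    """Deterministic two-step inference: coarse guess, then one refinement."""
    first = policy(obs, np.zeros((obs.shape[0], action_dim)), 0.0)
    return policy(obs, t_star * first, t_star)
```

Note that nothing in this sketch fits a distribution: both steps are plain regressions, and noise appears only as a training-time perturbation of the second step's input.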

Finding 3: Manifold adherence given out-of-distribution observations is the key

What benefit do stochasticity injection and supervised iterative compute bring?

We found that they mainly help the policy adhere to the manifold of the expert data when given out-of-distribution observations.
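One simple way to probe manifold adherence is to measure how far each predicted action lies from its nearest expert action. The sketch below is a hypothetical proxy introduced only for illustration; the function name and the nearest-neighbor Euclidean distance are assumptions, not the metric used in this work.

```python
import numpy as np


def off_manifold_distance(pred_actions, expert_actions):
    """Distance from each predicted action to its nearest expert action.

    pred_actions: array of shape (n_pred, action_dim)
    expert_actions: array of shape (n_expert, action_dim)
    Returns an array of shape (n_pred,); smaller means closer to the expert data.
    """
    diffs = pred_actions[:, None, :] - expert_actions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # pairwise distances, (n_pred, n_expert)
    return dists.min(axis=1)
```

For example, if the expert actions lie on a unit circle, a prediction at the origin (such as the average of two opposite expert actions) is at distance 1 from the data, while a prediction that coincides with an expert action is at distance 0.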