Much Ado About Noising: Dispelling the Myths of Generative Robotic Control

¹Carnegie Mellon University, ²Massachusetts Institute of Technology, ³Toyota Research Institute
Project lead. Equal advising.
Teaser image

Abstract

Generative models, such as flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation about the factors underlying their success, ranging from capturing multi-modal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, as long as intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. To validate our findings, we show that a minimal iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and point toward new design spaces focused solely on control performance.
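As a rough illustration of the idea behind MIP, the sketch below shows one way a two-step regression-based iterative policy could be wired up. The layer sizes, the noise level, and the particular way the intermediate prediction is fed back are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStepIterativePolicy(nn.Module):
    """Sketch of a two-step regression policy: a first pass predicts a coarse
    action, a noised copy of that prediction is fed back with the observation,
    and a second pass refines it. Sizes and noise scale are illustrative."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.step1 = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
        self.step2 = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs: torch.Tensor):
        a_coarse = self.step1(obs)
        # stochasticity injection on the intermediate prediction
        a_noised = a_coarse + self.noise_std * torch.randn_like(a_coarse)
        a_refined = self.step2(torch.cat([obs, a_noised], dim=-1))
        return a_coarse, a_refined

def bc_loss(policy: TwoStepIterativePolicy, obs: torch.Tensor, expert_action: torch.Tensor):
    # supervise both the intermediate and the final prediction with plain regression
    a_coarse, a_refined = policy(obs)
    return F.mse_loss(a_coarse, expert_action) + F.mse_loss(a_refined, expert_action)
```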

Finding 1: Neither multi-modality nor policy expressivity accounts for GCPs' success

Through careful benchmarking across 27 tasks and 3 input modalities (state, image, point cloud), we found:

  1. With a properly aligned architecture, regression ≈ flow on most tasks
  2. Flow mainly wins on high-precision tasks
  3. Neither multi-modality nor policy expressivity accounts for GCPs' success
Performance comparison across tasks

Note: Carefully aligning the architecture and training procedure between the regression-based control policy (RCP) and GCPs is important.
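A minimal sketch of what such an alignment might look like: the regression policy and the flow policy share one backbone conditioned on (observation, action, time) and differ only in how an action is produced at inference. The backbone shape, the Euler integration, and feeding the regression policy a fixed blank action at t = 1 are assumptions for illustration.

```python
import torch
import torch.nn as nn

ACT_DIM = 7  # assumed action dimensionality for the sketch

class SharedBackbone(nn.Module):
    """One network conditioned on (observation, action, time), reused by both
    the regression policy and the flow policy so only the sampler differs."""
    def __init__(self, obs_dim: int, act_dim: int = ACT_DIM, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, obs, act, t):
        return self.net(torch.cat([obs, act, t], dim=-1))

@torch.no_grad()
def rcp_action(backbone: SharedBackbone, obs: torch.Tensor) -> torch.Tensor:
    # regression: a single forward pass from a fixed blank action at t = 1
    blank = torch.zeros(obs.shape[0], ACT_DIM)
    t_one = torch.ones(obs.shape[0], 1)
    return backbone(obs, blank, t_one)

@torch.no_grad()
def flow_action(backbone: SharedBackbone, obs: torch.Tensor, steps: int = 8) -> torch.Tensor:
    # flow: integrate the learned velocity field from Gaussian noise to an action
    a = torch.randn(obs.shape[0], ACT_DIM)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((obs.shape[0], 1), k * dt)
        a = a + dt * backbone(obs, a, t)  # explicit Euler step
    return a
```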

Finding 2: Noise injection and supervised iterative compute drive the success

Given a common GCP architecture, we first expose its key ingredients:

GCP Architecture

We then systematically identify the critical components:

Key ingredients of GCPs

After benchmarking on the 7 most challenging tasks, we found that supervised iterative compute + stochasticity injection is the key.

Results showing key findings

Note: For control problems, distribution fitting is less important for final performance. Instead of focusing on action generation itself, it is more important to explore the design space of the mapping from observation to action.
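For concreteness, the snippet below shows how a standard conditional flow-matching loss already combines these two ingredients: every sampled intermediate time receives a direct regression target, and stochasticity enters through the sampled noise endpoint and time. The linear interpolation path and the `velocity_net(obs, x_t, t)` signature are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(velocity_net, obs, expert_action):
    """Conditional flow matching on expert actions: supervised iterative compute
    (a regression target at every sampled intermediate time) plus stochasticity
    injection (a random noise endpoint and a random time)."""
    noise = torch.randn_like(expert_action)       # stochasticity injection
    t = torch.rand(expert_action.shape[0], 1)     # random intermediate step
    x_t = (1 - t) * noise + t * expert_action     # point on the linear noise-to-action path
    target_velocity = expert_action - noise       # per-step supervision target
    return F.mse_loss(velocity_net(obs, x_t, t), target_velocity)
```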

Finding 3: Manifold adherence given out-of-distribution observations is the key

What benefit do stochasticity injection and supervised iterative compute bring?

We found that they mainly help the policy adhere to the manifold of the expert data when given out-of-distribution observations.

Manifold adherence visualization
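One simple way to probe this claim (an illustrative sketch of ours, not the paper's evaluation code): perturb expert observations to mimic out-of-distribution inputs, then measure how far the resulting actions fall from the nearest expert action. The perturbation scale and the nearest-neighbor distance metric are assumptions for illustration.

```python
import torch

@torch.no_grad()
def manifold_distance(policy, expert_obs, expert_actions, obs_noise=0.1, samples=256):
    """Average distance from actions predicted on perturbed (pseudo-OOD)
    observations to the nearest expert action in the dataset."""
    idx = torch.randint(0, expert_obs.shape[0], (samples,))
    ood_obs = expert_obs[idx] + obs_noise * torch.randn_like(expert_obs[idx])
    pred = policy(ood_obs)                      # (samples, act_dim)
    dists = torch.cdist(pred, expert_actions)   # (samples, num_expert_actions)
    return dists.min(dim=1).values.mean().item()
```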