# Architecture & Design
This page provides an in-depth look at this repository's architecture and design principles.
## Overview
This repository is built on the stochastic interpolant framework, a unified mathematical formulation that encompasses flow matching, consistency models, and various distillation methods.
## Stochastic Interpolant
The stochastic interpolant framework provides a unified view of generative modeling:
```
a_t = α(t) * a_0 + β(t) * a_1
```

Where:

- `a_t` interpolates between the action at time `t=0` and the action at time `t=1`
- `α(t)`, `β(t)` are interpolation schedules (linear or trigonometric)
- The goal is to learn a velocity field `v(a, t)` that transports the action at time `t=0` to the action at time `t=1` (see the sketch after this list):

```
v(a, t) = d/dt a(t)
```

In the algorithm descriptions below, the interpolant is written `I_t`, with `I_0 = z ~ N(0, I)` a noise sample and `I_1 = a` the ground-truth action; `o` denotes the conditioning observation.
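As a concrete example, here is a minimal PyTorch-style sketch of the linear schedule `α(t) = 1 - t`, `β(t) = t`; the function name and tensor shapes are illustrative, not this repository's API:

```python
import torch

def linear_interpolant(a0: torch.Tensor, a1: torch.Tensor, t: torch.Tensor):
    """Linear schedule: I_t = (1 - t) * a0 + t * a1.

    Returns the interpolant I_t and its time derivative dI_t/dt,
    which is the regression target for the velocity field.
    """
    t = t.view(-1, *([1] * (a0.dim() - 1)))  # broadcast t over action dims
    i_t = (1.0 - t) * a0 + t * a1
    di_dt = a1 - a0                          # d/dt I_t for the linear schedule
    return i_t, di_dt
```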
## Algorithms Overview
### Flow Matching (flow)
https://arxiv.org/abs/2210.02747
Standard flow matching loss.
Train:

```
b_θ ≈ argmin_θ 𝔼[‖b_t(I_t | o) - İ_t‖²], where t ~ Unif([0,1]), z ~ N(0, I)
```

Inference: solve the ODE from `t=0` to `t=1`:

```
d/dt a_t = b_t(a_t | o) with initial condition a_0 = z
```
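A minimal sketch of the training loss and an Euler sampler, assuming a hypothetical network interface `velocity(obs, x, t)` and actions of shape `(B, D)`:

```python
import torch

def flow_matching_loss(velocity, obs, action):
    """One training step: regress b_t(I_t | o) onto İ_t = a - z."""
    z = torch.randn_like(action)                              # I_0 = z ~ N(0, I)
    t = torch.rand(action.shape[0], 1, device=action.device)  # t ~ Unif([0, 1])
    i_t = (1.0 - t) * z + t * action                          # linear schedule
    return ((velocity(obs, i_t, t) - (action - z)) ** 2).mean()

@torch.no_grad()
def euler_sample(velocity, obs, z, steps: int = 10):
    """Integrate d/dt a_t = b_t(a_t | o) from t=0 to t=1 with Euler steps."""
    a, dt = z, 1.0 / steps
    for k in range(steps):
        t = torch.full((z.shape[0], 1), k * dt, device=z.device)
        a = a + dt * velocity(obs, a, t)
    return a
```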
### Regression (regression)

A direct regression loss that learns the conditional action mean.
Train:

```
π_θ ≈ argmin_θ 𝔼[‖π_θ(o, I_0, t=0) - a‖²]
```

Inference:

```
â ← π_θ(o, z, t=0)
```
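In code this reduces to a single forward pass; `policy(obs, x, t)` is the same assumed interface style as above:

```python
import torch

def regression_loss(policy, obs, action):
    """Regress the policy output at t=0 directly onto the action."""
    z = torch.randn_like(action)                        # I_0 = z
    t0 = torch.zeros(action.shape[0], 1, device=action.device)
    return ((policy(obs, z, t0) - action) ** 2).mean()

@torch.no_grad()
def regression_sample(policy, obs, z):
    """Single-call inference: â = π_θ(o, z, t=0)."""
    return policy(obs, z, torch.zeros(z.shape[0], 1, device=z.device))
```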
### Two-Step Denoising (tsd)

Two-step denoising, a simplified version of flow matching.
Train:

```
π_θ ≈ argmin_θ 𝔼[‖π_θ(o, I_0, t=0) - I_fix‖² + ‖π_θ(o, I_fix, t_fix) - a‖²]
```

Inference:

```
â_0 ← π_θ(o, z, 0)
â ← π_θ(o, t_fix * â_0 + (1 - t_fix) * z, t_fix)
```
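A sketch of the two-step inference path; `t_fix = 0.5` is an illustrative default, not a value taken from this repository:

```python
import torch

@torch.no_grad()
def tsd_sample(policy, obs, z, t_fix: float = 0.5):
    """Two calls: a first guess at t=0, then a refinement at t=t_fix."""
    b = z.shape[0]
    a0_hat = policy(obs, z, torch.zeros(b, 1, device=z.device))
    i_fix = t_fix * a0_hat + (1.0 - t_fix) * z     # re-noise to time t_fix
    return policy(obs, i_fix, torch.full((b, 1), t_fix, device=z.device))
```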
### Minimum Iterative Policy (mip)

Minimum Iterative Policy, optimized for two-step sampling by removing redundant stochasticity from the input.
Train:

```
π_mip^θ ≈ argmin_θ 𝔼[‖π_θ(o, I_0 = 0, t=0) - a‖² + ‖π_θ(o, I_t_fix, t_fix) - a‖²]
```

Inference:

```
â_mip^0 ← π_mip^θ(o, 0, t=0)
â_mip ← π_mip^θ(o, t_fix * â_mip^0, t_fix)
```
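The deterministic analogue of the tsd sampler above: the noise input is replaced by zeros, so sampling needs no random draw. The `action_shape` argument is a hypothetical way to size the zero input:

```python
import torch

@torch.no_grad()
def mip_sample(policy, obs, action_shape, device, t_fix: float = 0.5):
    """Two-step sampling with the stochastic input removed (I_0 = 0)."""
    b = action_shape[0]
    z0 = torch.zeros(action_shape, device=device)          # I_0 = 0, not noise
    a0_hat = policy(obs, z0, torch.zeros(b, 1, device=device))
    return policy(obs, t_fix * a0_hat, torch.full((b, 1), t_fix, device=device))
```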
### Consistency Trajectory Model (ctm)

https://arxiv.org/abs/2310.02279
Consistency Trajectory Model, which progressively distills a multi-step flow into a flow-map (shortcut) model.
Train:

```
Φ_{s,t}(I_s) ≈ Φ_{s+dt, t}(stopgrad(Φ_{s, s+dt}(I_s)))
```

Inference:

```
a = Φ_{0,1}(z), where z ~ N(0, I)
```
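A sketch of the self-consistency objective, assuming a hypothetical flow-map interface `phi(obs, x, s, t)`; `torch.no_grad` plays the role of `stopgrad` on the inner small step:

```python
import torch

def ctm_loss(phi, obs, i_s, s, t, dt: float = 0.05):
    """Match the direct jump s -> t against a small step plus a jump."""
    with torch.no_grad():
        i_next = phi(obs, i_s, s, s + dt)   # stopgrad(Phi_{s, s+dt}(I_s))
    pred = phi(obs, i_s, s, t)              # Phi_{s,t}(I_s)
    target = phi(obs, i_next, s + dt, t)    # Phi_{s+dt, t}(...)
    return ((pred - target) ** 2).mean()
```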
### Progressive Self-Distillation (psd)

https://arxiv.org/pdf/2505.18825
Progressive Self-Distillation, a self-distillation framework which trains a flow map.
Train:
L_SD = L_b + L_D
L_b = ‖b_t(I_t) - İ_t‖²
L_D = ‖Φ_{s,t}(I_s) - Φ_{u,t}(Φ_{s,u}(I_s))‖²Inference:

```
â = Φ_{0,1}(z), where z ~ N(0, I)
```
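A sketch of the distillation term `L_D`. How the intermediate time `u ∈ (s, t)` is sampled is an assumption here (uniform), as is the `phi` interface:

```python
import torch

def psd_distill_loss(phi, obs, i_s, s, t):
    """Semigroup consistency: Phi_{s,t}(I_s) ≈ Phi_{u,t}(Phi_{s,u}(I_s))."""
    u = s + (t - s) * torch.rand_like(s)           # intermediate time in (s, t)
    pred = phi(obs, i_s, s, t)                     # one jump s -> t
    target = phi(obs, phi(obs, i_s, s, u), u, t)   # two hops via u
    return ((pred - target) ** 2).mean()
```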
### Lagrangian Self-Distillation (lsd)

https://arxiv.org/abs/2505.18825
Lagrangian Self-Distillation, a self-distillation framework which trains a flow map.
Train:

```
L_SD = L_b + L_D
L_b = ‖b_t(I_t) - İ_t‖²
L_D = ‖∂_t Φ_{s,t}(I_s) - b_t(Φ_{s,t}(I_s))‖²
```

Inference:

```
â = Φ_{0,1}(z), where z ~ N(0, I)
```
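A sketch of `L_D` using forward-mode autodiff: `torch.func.jvp` pushes a unit tangent through the `t` argument to obtain `∂_t Φ_{s,t}(I_s)`. The `phi` and `velocity` interfaces are assumptions as above:

```python
import torch

def lsd_distill_loss(phi, velocity, obs, i_s, s, t):
    """Lagrangian condition: the flow map's t-derivative follows the velocity."""
    phi_out, dphi_dt = torch.func.jvp(
        lambda tt: phi(obs, i_s, s, tt),   # differentiate w.r.t. t only
        (t,), (torch.ones_like(t),)
    )
    return ((dphi_dt - velocity(obs, phi_out, t)) ** 2).mean()
```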
### Euler Self-Distillation (esd)

https://arxiv.org/abs/2505.18825
Euler Self-Distillation, a self-distillation framework which trains a flow map.
Train:

```
L_SD = L_b + L_D
L_b = ‖b_t(I_t) - İ_t‖²
L_D = ‖∂_s Φ_{s,t}(I_s) + ∇ Φ_{s,t}(I_s) · b_s(I_s)‖²
```

Inference:

```
â = Φ_{0,1}(z), where z ~ N(0, I)
```
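A sketch of `L_D`. The sum `∂_s Φ + ∇Φ · b_s` is the total derivative of `Φ_{s,t}(I_s)` along the characteristic, i.e. one directional derivative with tangent `(1, b_s(I_s))`, so a single `torch.func.jvp` call computes it (same assumed interfaces):

```python
import torch

def esd_distill_loss(phi, velocity, obs, i_s, s, t):
    """Eulerian condition: Phi_{s,t}(I_s) is constant along characteristics."""
    b_s = velocity(obs, i_s, s)
    _, total_deriv = torch.func.jvp(
        lambda ss, x: phi(obs, x, ss, t),          # jump from (ss, x) to t
        (s, i_s), (torch.ones_like(s), b_s)        # tangent (1, b_s(I_s))
    )
    return (total_deriv ** 2).mean()
```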
### Mean Flow (mf)

https://arxiv.org/abs/2505.13447
Mean Flow, a self-distillation framework which trains a flow map.
Define:

```
Φ_{s,t}(I_s) = I_s + (t - s) * v̄_{s,t}(I_s)
```

Train:

```
L_SD = L_b + L_D
L_b = ‖b_t(I_t) - İ_t‖²
L_D = ‖∂_s Φ_{s,t}(I_s) + stopgrad(∇ Φ_{s,t}(I_s) · İ_s)‖²
```

Inference:

```
â = Φ_{0,1}(z), where z ~ N(0, I)
```
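A sketch of `L_D` under the average-velocity parameterization, with times shaped `(B, 1)` so they broadcast over `(B, D)` actions. The `vbar(obs, x, s, t)` interface is an assumption, and `di_ds = İ_s` (e.g. `a - z` for the linear schedule); the two-`jvp` decomposition below is one way to apply `stopgrad` to the transport term only:

```python
import torch

def mf_distill_loss(vbar, obs, i_s, di_ds, s, t):
    """Mean-flow loss with Phi_{s,t}(x) = x + (t - s) * vbar_{s,t}(x)."""
    def phi(ss, x):
        return x + (t - ss) * vbar(obs, x, ss, t)

    # ∂_s Phi: tangent (1, 0), kept in the autograd graph
    _, dphi_ds = torch.func.jvp(
        phi, (s, i_s), (torch.ones_like(s), torch.zeros_like(i_s))
    )
    # ∇Phi · İ_s: tangent (0, İ_s), then stop-gradiented per the L_D formula
    _, transport = torch.func.jvp(
        phi, (s, i_s), (torch.zeros_like(s), di_ds)
    )
    return ((dphi_ds + transport.detach()) ** 2).mean()
```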