Instructions for Porting a New Task
This guide provides step-by-step instructions for adding a new task to the MIP framework. Follow these steps to ensure consistency and completeness.
Overview
Porting a new task involves four main steps:
- Dataset Preparation: Download/prepare dataset and create dataset loaders
- Environment Setup: Create environment wrappers for evaluation
- Training Configuration: Create training scripts and configuration files
- Testing: Verify the implementation with test scripts
1. Dataset Preparation
1.1 Dataset Download and Upload (Optional)
We use HuggingFace to manage datasets. If you want to share your dataset:
- Upload your dataset to the HuggingFace dataset repo: `ChaoyiPan/mip-dataset`
- Refer to `examples/process_single_robomimic_dataset.py` for examples
- Consider creating a dataset uploader script in `examples/` for convenience
Example: For PushT, you can download from lerobot/pusht and then upload, or use an existing zarr file locally.
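For instance, a minimal upload sketch using `huggingface_hub` (the local file name and `path_in_repo` layout here are illustrative; this assumes you are already authenticated, e.g. via `huggingface-cli login`):

```python
from huggingface_hub import HfApi

api = HfApi()
# Hypothetical local zarr archive; adjust paths to your repo layout
api.upload_file(
    path_or_fileobj="data/pusht/pusht.zarr.zip",
    path_in_repo="pusht/pusht.zarr.zip",
    repo_id="ChaoyiPan/mip-dataset",
    repo_type="dataset",
)
```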
1.2 Create Dataset Loader
Create a new dataset loader file: `mip/datasets/<task_name>_dataset.py`
Required components:
- `make_dataset(task_config, mode="train")` function: Factory function that creates the appropriate dataset based on configuration

```python
def make_dataset(task_config, mode="train"):
    """Create dataset based on configuration."""
    if task_config.obs_type == "state":
        return StateDataset(...)
    elif task_config.obs_type == "image":
        return ImageDataset(...)
    else:
        raise ValueError(f"Invalid observation type: {task_config.obs_type}")
```

- Dataset classes: Inherit from `BaseDataset` and implement:
  - `__init__()`: Initialize the dataset, replay buffer, sampler, and normalizer
  - `__len__()`: Return the dataset length
  - `__getitem__(idx)`: Return a sample with the structure:

```python
{
    "obs": {...},      # Observations (dict for image, tensor for state)
    "action": tensor,  # Actions
}
```

  - `get_normalizer()`: Return normalizers for observations and actions
Key considerations:
- Support different observation types (state, image, keypoint, etc.)
- Use `ReplayBuffer` for data storage
- Use `SequenceSampler` for sequence sampling with padding
- Use appropriate normalizers (`MinMaxNormalizer`, `ImageNormalizer`)
- Handle `horizon`, `pad_before`, `pad_after` correctly
Example structure:
```python
class TaskStateDataset(BaseDataset):
    def __init__(self, dataset_path, horizon=1, pad_before=0, pad_after=0):
        super().__init__()
        self.replay_buffer = ReplayBuffer.copy_from_path(dataset_path, keys=...)
        self.sampler = SequenceSampler(...)
        self.normalizer = self.get_normalizer()

    def get_normalizer(self):
        return {
            "obs": {"state": MinMaxNormalizer(...)},
            "action": MinMaxNormalizer(...),
        }

    def __len__(self):
        return len(self.sampler)

    def __getitem__(self, idx):
        sample = self.sampler.sample_sequence(idx)
        # Process and normalize sample
        return torch_data
```

1.3 Create Dataset Tests
Create test file: `mip/datasets/<task_name>_dataset_test.py`
Required tests:
- `test_initialization`: Verify the dataset can be initialized
- `test_len`: Check `__len__` returns the correct length
- `test_getitem`: Verify `__getitem__` returns the correct structure and shapes
- `test_normalizer`: Check the normalizer is created correctly
- Tests for each observation type (state, image, etc.)
Use pytest fixtures to create temporary test data (zarr or hdf5).
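As an illustration, a hedged fixture sketch that writes a tiny replay-buffer-style zarr store to a temporary directory (the `data`/`meta`/`episode_ends` layout, array names, and zarr v2-style API are assumptions; match your task's actual format):

```python
import numpy as np
import pytest
import zarr


@pytest.fixture
def tmp_zarr_dataset(tmp_path):
    """Write a minimal replay-buffer-style zarr store for tests."""
    path = tmp_path / "test_dataset.zarr"
    root = zarr.open(str(path), mode="w")
    data = root.create_group("data")
    meta = root.create_group("meta")
    n = 32  # total steps across two episodes
    data.create_dataset("state", data=np.random.randn(n, 20).astype(np.float32))
    data.create_dataset("action", data=np.random.randn(n, 10).astype(np.float32))
    meta.create_dataset("episode_ends", data=np.array([16, 32], dtype=np.int64))
    return str(path)


def test_initialization(tmp_zarr_dataset):
    # TaskStateDataset is the example class from section 1.2
    ds = TaskStateDataset(tmp_zarr_dataset, horizon=16, pad_before=1, pad_after=7)
    assert len(ds) > 0
```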
Reference files:
- Dataset: `mip/datasets/robomimic_dataset.py`, `mip/datasets/pusht_dataset.py`
- Tests: `mip/datasets/robomimic_dataset_test.py`, `mip/datasets/pusht_dataset_test.py`
2. Environment Setup
2.1 Create Environment Wrapper
Create environment wrapper file: `mip/envs/<task_name>/<task_name>_env_wrapper.py`
Required function:
```python
def make_vec_env(task_config: TaskConfig, seed=None):
    """Create vectorized environments.

    Args:
        task_config: Task configuration
        seed: Random seed for environments

    Returns:
        Vectorized gym environment
    """
    # Choose vectorization strategy
    if task_config.num_envs == 1 or task_config.save_video:
        vec_env_class = gym.vector.SyncVectorEnv
    else:
        vec_env_class = gym.vector.AsyncVectorEnv

    # Create vectorized environments
    envs = vec_env_class([
        make_task_env(task_config, idx, False, seed=seed)
        for idx in range(task_config.num_envs)
    ])
    return envs
```

Helper function structure:
```python
def make_task_env(task_config: TaskConfig, idx, render=False, seed=None):
    """Create a single environment instance."""
    def thunk():
        # 1. Create base environment (import locally to avoid dependency issues)
        env = create_base_env(task_config)

        # 2. Add observation wrappers if needed
        if task_config.obs_type == "image":
            env = ImageWrapper(env)

        # 3. Add video recording wrapper
        video_recorder = VideoRecorder.create_h264(...)
        env = VideoRecordingWrapper(env, video_recorder, ...)

        # 4. Add multi-step wrapper (important!)
        env = MultiStepWrapper(
            env,
            n_obs_steps=task_config.obs_steps,
            n_action_steps=task_config.act_steps,
            max_episode_steps=task_config.max_episode_steps,
        )

        # 5. Set seed
        if seed is not None:
            env.seed(seed + idx)
        return env
    return thunk
```

Key wrappers:
- `VideoRecordingWrapper`: For video recording
- `MultiStepWrapper`: Handles observation stacking and action repetition
- Custom wrappers for observation preprocessing (a minimal sketch follows below)
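For a custom preprocessing wrapper, here is a sketch using `gymnasium.ObservationWrapper`. The HWC-uint8-to-CHW-float conversion is a hypothetical example, not this repo's actual `ImageWrapper`:

```python
import gymnasium as gym
import numpy as np


class ImageWrapper(gym.ObservationWrapper):
    """Example: convert HWC uint8 images to CHW float32 in [0, 1]."""

    def __init__(self, env):
        super().__init__(env)
        h, w, c = env.observation_space.shape  # assumes an image Box space
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=(c, h, w), dtype=np.float32
        )

    def observation(self, obs):
        # Transpose to channel-first and scale to [0, 1]
        return np.transpose(obs, (2, 0, 1)).astype(np.float32) / 255.0
```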
2.2 Update `__init__.py`
Update `mip/envs/<task_name>/__init__.py`:
```python
from mip.envs.<task_name>.<task_name>_env_wrapper import make_vec_env

__all__ = ["make_vec_env"]
```

Reference files:
- `mip/envs/robomimic/robomimic_env.py`
- `mip/envs/pusht/pusht_env_wrapper.py`
3. Training Configuration
3.1 Create Training Script
Create training script: `examples/train_<task_name>.py`
Required functions:
- `train(config, envs, dataset, agent, logger)`: Main training loop
  - Load data with DataLoader
  - Set up the LR scheduler and warmup scheduler
  - Training loop with periodic logging, saving, and evaluation
  - Track best and average metrics
- `eval(config, envs, dataset, agent, logger, num_steps=1)`: Evaluation function
  - Reset environments
  - Collect episodes
  - Normalize observations
  - Sample actions from the agent
  - Unnormalize actions
  - Execute actions and collect rewards
  - Return metrics (mean_step, mean_reward, mean_success)
- `main(config)`: Main entry point
  - Set up the logger, environment, dataset, and agent
  - Auto-configure the network encoder based on observation type
  - Call train or eval based on mode
Important details:
- Handle different observation types (state, image, keypoint)
- Properly normalize/unnormalize observations and actions
- Use appropriate action slicing: `act[:, start:end, :]` where `start = obs_steps - 1` (see the eval-loop sketch after this list)
- Support video recording during evaluation
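Tying these details together, a hedged skeleton of the eval loop for state observations. The `agent.sample` call, `config.seed`, the normalizer's `unnormalize` method, and the gymnasium-style `reset` returning `(obs, info)` are assumptions about this codebase's API; adapt them to the actual agent and wrapper interfaces:

```python
import numpy as np
import torch


def eval_rollout(config, envs, dataset, agent, device="cuda"):
    """Sketch of one evaluation rollout across vectorized envs."""
    obs, _ = envs.reset(seed=config.seed)                 # assumed seed field
    total_reward = np.zeros(config.task.num_envs)
    n_chunks = config.task.max_episode_steps // config.task.act_steps
    for _ in range(n_chunks):
        # Normalize observations before feeding the agent
        nobs = dataset.normalizer["obs"]["state"].normalize(obs)
        nobs = torch.as_tensor(nobs, dtype=torch.float32, device=device)
        # Sample a full action horizon, then slice the executable window
        action_pred = agent.sample(nobs)                  # (B, horizon, act_dim), assumed API
        start = config.task.obs_steps - 1
        end = start + config.task.act_steps
        action = action_pred[:, start:end, :].detach().cpu().numpy()
        action = dataset.normalizer["action"].unnormalize(action)
        # MultiStepWrapper consumes the whole act_steps chunk per env step
        obs, reward, terminated, truncated, info = envs.step(action)
        total_reward += reward
    return {"mean_reward": float(total_reward.mean())}
```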
3.2 Create Task Configurations
Create task config files: `examples/configs/task/<task_name>_<obs_type>.yaml`
Required fields:
```yaml
# @package task
defaults:
  - _self_

# Task identification
env_name: "task_name"
obs_type: "state"   # or "image", "keypoint", etc.
abs_action: false   # true for absolute actions, false for delta actions

# Dataset configuration
dataset_path: "path/to/dataset"
# OR for HuggingFace:
# dataset_repo: "ChaoyiPan/mip-dataset"
# dataset_filename: "task/variant/dataset.hdf5"

# Environment parameters
max_episode_steps: 400

# Task parameters
obs_steps: 2        # Number of observation steps to stack
act_steps: 8        # Number of action steps to execute
horizon: 16         # Prediction horizon
num_envs: 1         # Number of parallel environments
save_video: false   # Whether to save videos
val_dataset_percentage: 0.0  # Validation split percentage

# Dimensions
act_dim: 10         # Action dimension
obs_dim: 20         # Observation dimension
```

For image observations, add:
```yaml
shape_meta:
  action:
    shape: [10]
  obs:
    image_key:
      shape: [3, 96, 96]
      type: rgb
    lowdim_key:
      shape: [10]
      type: low_dim

# Image encoder settings
rgb_model: resnet18
resize_shape: null
crop_shape: [84, 84]
random_crop: true
use_group_norm: true
use_seq: true
```

Create configs for each observation type:
- `<task_name>_state.yaml`
- `<task_name>_image.yaml`
- `<task_name>_keypoint.yaml` (if applicable)
3.3 Create Test Script
Create test script: `examples/test_<task_name>.py`
Purpose: Verify all task configurations work correctly
Structure:
```python
TASKS = {
    "task_name": ["state", "image", "keypoint"],
}
TEST_STEPS = 10  # Quick test
BATCH_SIZE = 4


def test_task(task_name, obs_type):
    """Test a single configuration."""
    cmd = [
        "uv", "run", "python",
        "examples/train_task.py",
        "-cn", "exps/debug.yaml",
        f"task={task_name}_{obs_type}",
        f"optimization.gradient_steps={TEST_STEPS}",
        # ... other overrides
    ]
    # Run and verify


def main():
    """Test all configurations."""
    # Run tests and report results
```
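One way to fill in the "run and verify" step is a `subprocess`-based runner; a minimal sketch, assuming the training script exits non-zero on failure:

```python
import subprocess


def run_and_verify(cmd):
    """Run a test command and report pass/fail based on the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"❌ FAILED: {' '.join(cmd)}")
        print(result.stderr[-2000:])  # tail of stderr for quick diagnosis
        return False
    print(f"✅ PASSED: {' '.join(cmd)}")
    return True
```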
Reference files:
- Training: `examples/train_robomimic.py`, `examples/train_pusht.py`
- Config: `examples/configs/task/lift_ph_state.yaml`, `examples/configs/task/pusht_state.yaml`
- Test: `examples/test_robomimic.py`, `examples/test_pusht.py`
4. Testing and Verification
4.1 Download and Process Dataset
First, download and process the dataset from the source:
```bash
# Download dataset (will download to data/<task_name>/)
uv run python examples/process_<task_name>_dataset.py --skip_upload

# Or upload to HuggingFace (requires authentication)
uv run python examples/process_<task_name>_dataset.py
```

4.2 Quick Dataset Test
Verify the dataset loader works correctly:
```bash
# Quick test to verify dataset loading
uv run python -c "from mip.datasets.<task_name>_dataset import <Task>StateDataset; \
ds = <Task>StateDataset('data/<task_name>/<dataset_dir>', horizon=16, pad_before=1, pad_after=7); \
print(f'✅ Dataset loaded: {len(ds)} samples'); \
item = ds[0]; \
print(f'✅ State shape: {item[\"obs\"][\"state\"].shape}'); \
print(f'✅ Action shape: {item[\"action\"].shape}')"
```

4.3 Run Dataset Tests
```bash
# Test dataset loader (may skip if HuggingFace download is not set up)
uv run pytest mip/datasets/<task_name>_dataset_test.py -v

# Test a specific observation type
uv run pytest mip/datasets/<task_name>_dataset_test.py::Test<Task>StateDataset -v
```

4.4 Run Training Test
```bash
# Test with the debug config using a local dataset
uv run python examples/train_<task_name>.py -cn exps/debug.yaml task=<task_name>_state +task.dataset_path=data/<task_name>/<dataset_dir>

# Test with the HuggingFace dataset (if uploaded)
uv run python examples/train_<task_name>.py -cn exps/debug.yaml task=<task_name>_state

# Run the full test suite
uv run python examples/test_<task_name>.py
```

4.5 Verification Checklist
- [ ] Dataset download script works
- [ ] Dataset loader works for all observation types
- [ ] Dataset tests pass (or skip gracefully)
- [ ] Environment can be created and reset
- [ ] Training loop runs without errors
- [ ] Evaluation works correctly
- [ ] Video recording works (if enabled)
- [ ] All configurations in test script pass
- [ ] Metrics are logged correctly
4.6 Common Testing Issues
Dataset not found:
- Make sure to run `process_<task_name>_dataset.py` first
- Check that the dataset path in the config matches the downloaded location
- Use `+task.dataset_path=<path>` to override the config
Import errors:
- Ensure all dependencies are installed: `uv sync`
- For environments using legacy APIs, check compatibility wrappers
Gym vs Gymnasium:
- This repo uses `gymnasium` (the modern replacement for `gym`)
- For environments using the old `gym` API (like adept_envs), create a wrapper to convert (see the sketch after this list):
  - `reset()` → `reset(seed, options)` returning `(obs, info)`
  - `step()` returning a 5-tuple `(obs, reward, terminated, truncated, info)`
  - `render(mode)` → `render()` returning an array
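A minimal adapter sketch for such a conversion (the `LegacyGymWrapper` name is hypothetical; it assumes the legacy env exposes `seed()` and flags time-limit truncation via `info["TimeLimit.truncated"]`):

```python
import gymnasium as gym


class LegacyGymWrapper(gym.Env):
    """Adapt an old-style gym env to the gymnasium API."""

    def __init__(self, legacy_env):
        self.env = legacy_env
        self.observation_space = legacy_env.observation_space
        self.action_space = legacy_env.action_space

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.env.seed(seed)              # old API seeds separately
        obs = self.env.reset()               # old API returns obs only
        return obs, {}                       # new API returns (obs, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)    # old 4-tuple
        truncated = info.get("TimeLimit.truncated", False)  # assumed convention
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info    # new 5-tuple

    def render(self):
        return self.env.render(mode="rgb_array")           # drop the mode arg
```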
5. Common Patterns and Tips
5.1 Observation Processing
State observations:
```python
# In dataset
obs_dict = {"state": state_tensor}

# In eval
obs = dataset.normalizer["obs"]["state"].normalize(obs)
obs = torch.tensor(obs, device=device)
```

Image observations:
```python
# In dataset
obs_dict = {
    "image": image_tensor,   # (T, C, H, W)
    "agent_pos": pos_tensor,
}

# In eval
for k in obs_raw:
    obs[k] = dataset.normalizer["obs"][k].normalize(obs_raw[k])
    obs[k] = torch.tensor(obs[k], device=device)
```

5.2 Action Processing
```python
# In eval loop
start = config.task.obs_steps - 1
end = start + config.task.act_steps
action = action_pred[:, start:end, :]

# For absolute actions with rotation
if config.task.abs_action:
    action = dataset.undo_transform_action(action)
```

5.3 Directory Structure
```
mip/
├── mip/
│   ├── datasets/
│   │   ├── task_dataset.py
│   │   └── task_dataset_test.py
│   └── envs/
│       └── task/
│           ├── __init__.py
│           ├── task_env.py
│           └── task_env_wrapper.py
└── examples/
    ├── train_task.py
    ├── test_task.py
    └── configs/
        └── task/
            ├── task_state.yaml
            ├── task_image.yaml
            └── task_keypoint.yaml
```

6. Reference Examples
Complete Examples
Robomimic: State and image observations with rotation transformations
- Dataset: `mip/datasets/robomimic_dataset.py`
- Environment: `mip/envs/robomimic/robomimic_env.py`
- Training: `examples/train_robomimic.py`

PushT: State, keypoint, and image observations
- Dataset: `mip/datasets/pusht_dataset.py`
- Environment: `mip/envs/pusht/pusht_env_wrapper.py`
- Training: `examples/train_pusht.py`
External References
- Adaptive Diffusion pusht_launcher: For comparison and insights
- CleanDiffuser pusht_dataset: For dataset format reference
7. Troubleshooting
Common Issues
- Import errors: Import heavy dependencies (like robomimic) inside functions so they are only required when the corresponding task is used
- Shape mismatches: Verify that observation and action dimensions match between the dataset, environment, and config
- Normalization issues: Ensure consistent normalization between training and evaluation
- Seed issues: Set seeds for both environments and random number generators
- Video recording: Ensure `VideoRecordingWrapper` is added before `MultiStepWrapper`
Debug Tips
- Use the `debug.yaml` config for quick testing with a small `gradient_steps`
- Print shapes during development to verify dimensions
- Test each component independently (dataset, environment, training)
- Check normalizer statistics to ensure reasonable ranges
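For the last tip, a quick hedged check that reuses the example dataset class from section 1.2 (the path and expected range depend on your task and normalizer):

```python
# After MinMaxNormalizer, samples should fall in a bounded range (e.g. [-1, 1]).
ds = TaskStateDataset("data/task/dataset.zarr", horizon=16, pad_before=1, pad_after=7)
item = ds[0]
print("state range: ", item["obs"]["state"].min().item(), item["obs"]["state"].max().item())
print("action range:", item["action"].min().item(), item["action"].max().item())
```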