Hi ABC team,
First of all, thank you for open-sourcing the dataset and training pipeline.
I have a question about camera calibration in the public release.
From the released conversion and training code, it looks like training uses:
- `states_actions.bin`
- `combined_camera-images-rgb.mp4`
- `episode_metadata.json` (task name, camera list, timing info, etc.)
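For anyone who wants to check a converted episode themselves, here is a minimal sketch that walks `episode_metadata.json` and reports any field whose name looks calibration-related. It assumes the file is plain JSON. The hint substrings and function names are illustrative assumptions and are not part of the released code. An empty result only means that no key matched these hints.

```python
import json
from typing import Any, Iterator

# Illustrative (assumed) substrings that often appear in calibration field names.
CALIBRATION_HINTS = ("extrinsic", "intrinsic", "calib")


def find_calibration_keys(node: Any, path: str = "") -> Iterator[str]:
    """Recursively yield JSON paths whose key name contains a calibration hint."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if any(hint in key.lower() for hint in CALIBRATION_HINTS):
                yield child
            yield from find_calibration_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_calibration_keys(item, f"{path}[{index}]")


def scan_metadata_file(path: str) -> list[str]:
    """Load one metadata JSON file and list calibration-like key paths."""
    with open(path, encoding="utf-8") as f:
        return list(find_calibration_keys(json.load(f)))
```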
I could not find per-timestamp camera extrinsics being exported by the conversion code or loaded during training.
Could you clarify whether this is intentional for the current release?
Specifically:
1. Are per-timestamp camera extrinsics available anywhere in the raw data (for example, in MCAP topics or metadata) but simply not included in the converted training format?
2. If they are not currently available, are there plans to release them in a future update?
3. If the camera rig is assumed to be static, is there a recommended canonical extrinsic for each camera that users should adopt for geometry-aware methods?
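For the first question, a minimal sketch of how one might scan a raw MCAP file's channel list for calibration-like topics. It assumes the raw logs are MCAP files with ROS- or Foxglove-style schema names (such as `sensor_msgs/CameraInfo` or `tf2_msgs/TFMessage`), and it uses the `mcap` Python package's `make_reader(...).get_summary()`. The hint list and helper names are hypothetical and are not taken from the release.

```python
from typing import Iterable

# Illustrative (assumed) lowercase substrings for calibration-related topics/schemas.
CALIBRATION_HINTS = (
    "calib",
    "extrinsic",
    "camera_info",
    "camerainfo",      # e.g. sensor_msgs/CameraInfo
    "tfmessage",       # e.g. tf2_msgs/TFMessage on /tf or /tf_static
    "frametransform",  # e.g. foxglove.FrameTransform
)


def looks_like_calibration(topic: str, schema_name: str) -> bool:
    """Return True if the topic or schema name contains any calibration hint."""
    text = f"{topic} {schema_name}".lower()
    return any(hint in text for hint in CALIBRATION_HINTS)


def filter_calibration_channels(
    channels: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Keep only (topic, schema_name) pairs that look calibration-related."""
    return [(t, s) for t, s in channels if looks_like_calibration(t, s)]


def list_mcap_channels(path: str) -> list[tuple[str, str]]:
    """List (topic, schema_name) pairs from an MCAP file's summary section.

    Requires the `mcap` package; it is imported here so the helpers above
    can be used without it installed.
    """
    from mcap.reader import make_reader

    with open(path, "rb") as f:
        summary = make_reader(f).get_summary()
    if summary is None:  # file was written without a summary section
        return []
    pairs = []
    for channel in summary.channels.values():
        schema = summary.schemas.get(channel.schema_id)
        pairs.append((channel.topic, schema.name if schema else ""))
    return pairs
```

Usage would look like `filter_calibration_channels(list_mcap_channels("episode.mcap"))`, where the file name is a placeholder. Static calibration could also be stored in MCAP metadata records or attachments rather than in a topic, so an empty result is not conclusive.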
Access to these extrinsics would be very helpful for users working on multi-view geometric reasoning, 3D reconstruction, or camera-aware policy learning.
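To illustrate why a documented convention matters as much as the numbers, here is a minimal pinhole-projection sketch. It assumes a hypothetical 4x4 row-major world-to-camera extrinsic `T_cam_world`, an OpenCV-style camera frame (x right, y down, z forward), and intrinsics `fx`, `fy`, `cx`, `cy`. None of these names or conventions come from the release.

```python
from typing import Optional, Sequence

# Hypothetical convention: 4x4 row-major matrix mapping world points into the camera frame.
Matrix4 = Sequence[Sequence[float]]


def world_to_camera(T_cam_world: Matrix4, p_world: Sequence[float]) -> tuple[float, float, float]:
    """Apply the rigid transform [R | t] stored in the top three rows to a 3D point."""
    x, y, z = p_world
    xc, yc, zc = (
        T_cam_world[i][0] * x + T_cam_world[i][1] * y + T_cam_world[i][2] * z + T_cam_world[i][3]
        for i in range(3)
    )
    return xc, yc, zc


def project_point(
    T_cam_world: Matrix4,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    p_world: Sequence[float],
) -> Optional[tuple[float, float]]:
    """Project a world point to pixel coordinates (u, v); None if it is behind the camera."""
    xc, yc, zc = world_to_camera(T_cam_world, p_world)
    if zc <= 0:
        return None
    return fx * xc / zc + cx, fy * yc / zc + cy
```

If a release instead stored camera-to-world poses, the matrix would have to be inverted before this step. Getting that wrong silently produces plausible-looking but incorrect projections.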
Thanks again for the great release.