Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation
ICRA · 2026
Direct metric depth estimation from a monocular RGB image is error-prone in out-of-distribution scenarios (e.g., dusty environments). Can we leverage an IMU to increase the accuracy of metric depth estimation at inference time?
This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). To enable collision avoidance during autonomous flight, prior works either leverage heavy sensors (e.g., LiDARs or stereo cameras) or rely on data-intensive, domain-specific fine-tuning of monocular metric depth estimation methods. In contrast, we propose several lightweight zero-shot rescaling strategies that obtain metric depth from relative depth estimates using the sparse 3D feature map created by a visual-inertial navigation system. These strategies are compared for their accuracy in diverse simulation environments. The best-performing approach, which leverages monotonic spline fitting, is deployed in the real world on a compute-constrained quadrotor. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion-primitive-based planner.
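To make the rescaling idea concrete, here is a minimal sketch of two ways a relative depth map could be mapped to metric depth from sparse features. It is not the paper's implementation. The function names, the `n_knots` parameter, and the binned-median construction are hypothetical. The monotone piecewise-linear map is a simpler stand-in for the monotonic spline fit mentioned above. The sketch also assumes the relative depth increases with metric depth (a model that outputs inverse depth would need to be inverted first) and that the metric depths of the sparse features have already been projected to pixel locations.

```python
import numpy as np


def fit_scale_shift(rel_sparse, metric_sparse):
    """Least-squares fit of metric ~= a * rel + b at the sparse feature pixels."""
    A = np.stack([rel_sparse, np.ones_like(rel_sparse)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, metric_sparse, rcond=None)
    return float(a), float(b)


def fit_monotone_map(rel_sparse, metric_sparse, n_knots=8):
    """Monotone piecewise-linear map from relative to metric depth.

    Hypothetical stand-in for a monotonic spline: knots are per-bin medians,
    and a running maximum forces the knot values to be non-decreasing.
    """
    order = np.argsort(rel_sparse)
    rel_sorted = rel_sparse[order]
    met_sorted = metric_sparse[order]

    # Split the sorted samples into roughly equal-sized bins.
    bins = [idx for idx in np.array_split(np.arange(rel_sorted.size), n_knots)
            if idx.size > 0]
    knots_x = np.array([np.median(rel_sorted[idx]) for idx in bins])
    knots_y = np.array([np.median(met_sorted[idx]) for idx in bins])

    # Drop duplicate x knots so that np.interp sees strictly increasing xp.
    knots_x, keep = np.unique(knots_x, return_index=True)
    knots_y = np.maximum.accumulate(knots_y[keep])  # enforce monotonicity
    return knots_x, knots_y


def apply_monotone_map(rel_dense, knots_x, knots_y):
    """Apply the fitted map to a dense relative depth image.

    Values outside the knot range are clamped to the end knots by np.interp.
    """
    flat = np.interp(rel_dense.ravel(), knots_x, knots_y)
    return flat.reshape(rel_dense.shape)
```

The affine fit has only two parameters and is robust when few features are available, while a monotone map can absorb nonlinear distortion in the relative depth without ever reversing the depth ordering of pixels. Which trade-off is preferable in practice is the kind of question the comparison in the paper addresses.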
Acknowledgments
The authors would like to thank Jonathan Lee for valuable discussions and insights. This material is based upon work supported in part by the Army Research Laboratory and the Army Research Office under contract/grant number W911NF-25-2-0153.
BibTeX
@inproceedings{zero-shot-depth-rescaling-2026,
title={Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation},
author={Steven Yang and Xiaoyu Tian and Kshitij Goel and Wennie Tabib},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2026}
}