HI-SLAM: Monocular Real-Time Dense Mapping with Hybrid Implicit Fields

IEEE Robotics and Automation Letters 2024

1University of Stuttgart   2Huawei Munich Research Center   3Huawei 2012 Laboratories   4Technical University of Munich  

HI-SLAM takes an RGB image stream as input and runs pose tracking and dense mapping in parallel, updating the map on the fly whenever a loop closure is detected.

Abstract

We present HI-SLAM, a neural field-based real-time monocular mapping framework for accurate and dense Simultaneous Localization and Mapping (SLAM). Recent neural mapping frameworks show promising results, but they rely on RGB-D or pose inputs, or cannot run in real time. To address these limitations, our approach integrates dense SLAM with neural implicit fields. Specifically, our dense SLAM approach runs tracking and global optimization in parallel, while a neural field-based map is constructed incrementally from the latest SLAM estimates. For efficient construction of the neural fields, we employ multi-resolution grid encoding and a signed distance function (SDF) representation. This allows us to keep the map up to date at all times and to adapt instantly to global updates triggered by loop closing. For global consistency, we propose an efficient Sim(3)-based pose graph bundle adjustment (PGBA) approach that runs online loop closing and mitigates pose and scale drift. To further enhance depth accuracy, we incorporate learned monocular depth priors. We propose a novel joint depth and scale adjustment (JDSA) module to resolve the scale ambiguity inherent in depth priors. Extensive evaluations on synthetic and real-world datasets validate that our approach outperforms existing methods in accuracy and map completeness while preserving real-time performance.
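To illustrate the scale ambiguity that JDSA addresses, here is a minimal sketch that aligns a monocular depth prior to reference depths with a closed-form least-squares fit. It is not the JDSA module itself, which adjusts depth and scale jointly. It assumes the prior differs from the reference by one per-image scale and shift, a common model for learned monocular depth that the text above does not state. The function names `fit_scale_shift` and `align` and the sample values are hypothetical.

```python
def fit_scale_shift(prior, reference):
    """Return (scale, shift) minimizing sum((scale * p + shift - r) ** 2).

    Assumption: the prior is related to the reference by a single
    per-image scale and shift. This is an illustrative model only.
    """
    n = len(prior)
    if n != len(reference) or n < 2:
        raise ValueError("need two or more paired depth samples")
    mean_p = sum(prior) / n
    mean_r = sum(reference) / n
    var_p = sum((p - mean_p) ** 2 for p in prior)
    if var_p == 0.0:
        raise ValueError("prior depths are constant; scale is unobservable")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(prior, reference))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


def align(prior, scale, shift):
    """Apply the fitted scale and shift to every prior depth value."""
    return [scale * p + shift for p in prior]


# Synthetic example: the "reference" depths are the prior scaled by 2.0
# and shifted by 0.3, so the fit should recover those two numbers.
prior = [0.5, 1.0, 1.5, 2.0]
reference = [2.0 * p + 0.3 for p in prior]

scale, shift = fit_scale_shift(prior, reference)
aligned = align(prior, scale, shift)
```

A fit like this needs reference depths that are already trustworthy. In a monocular pipeline the depths are themselves being estimated, which is presumably why the abstract describes adjusting depth and scale jointly rather than in a separate alignment step.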


Method



Given an RGB image stream, HI-SLAM runs tracking and mapping in parallel. On the tracking side, two processes, a frontend and a backend, are spawned for locally and globally consistent tracking, respectively. The SLAM frontend additionally uses a pre-trained computer vision model to predict monocular geometric priors. The keyframe data, including estimated poses, depths, and monocular normal priors, are shared between processes. On the mapping side, the neural map is constructed incrementally and online from the latest estimates in the shared buffer.
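The shared-buffer pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. It uses a lock-guarded in-memory dictionary as a stand-in for inter-process sharing, and the `Keyframe` and `KeyframeBuffer` classes, their fields, and the version counter are hypothetical names introduced here.

```python
import copy
import threading
from dataclasses import dataclass


@dataclass
class Keyframe:
    pose: list    # hypothetical: camera pose parameters
    depth: list   # hypothetical: estimated depth values
    normal: list  # hypothetical: monocular normal prior


class KeyframeBuffer:
    """Lock-guarded store that tracking writes to and mapping reads from."""

    def __init__(self):
        self._lock = threading.Lock()
        self._keyframes = {}
        self._version = 0  # incremented on every write

    def upsert(self, kf_id, keyframe):
        """Insert a new keyframe or overwrite an existing one."""
        with self._lock:
            self._keyframes[kf_id] = copy.deepcopy(keyframe)
            self._version += 1

    def snapshot(self):
        """Return (version, copy of all keyframes) for the mapper."""
        with self._lock:
            return self._version, copy.deepcopy(self._keyframes)


buffer = KeyframeBuffer()

# Frontend: publish a locally tracked keyframe.
buffer.upsert(0, Keyframe(pose=[0.0, 0.0, 0.0], depth=[1.0, 1.2], normal=[0.0, 0.0, 1.0]))
version_before, _ = buffer.snapshot()

# Backend: overwrite the same keyframe after a global optimization step.
buffer.upsert(0, Keyframe(pose=[0.1, 0.0, 0.0], depth=[1.1, 1.3], normal=[0.0, 0.0, 1.0]))
version_after, latest = buffer.snapshot()

# Mapper: a changed version signals that the map should be refreshed.
map_needs_update = version_after != version_before
```

The version counter is one simple way for a mapper to notice that poses or depths changed after a global update. A real multi-process system would need shared memory or message passing rather than a Python dictionary.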


BibTeX

@article{zhang2024hi,
  title={HI-SLAM: Monocular Real-Time Dense Mapping With Hybrid Implicit Fields}, 
  author={Zhang, Wei and Sun, Tiecheng and Wang, Sen and Cheng, Qing and Haala, Norbert},
  journal={IEEE Robotics and Automation Letters},
  year={2024},    
  publisher={IEEE},
  doi={10.1109/LRA.2023.3347131}
}