Incremental Multimodal Surface Mapping via Self-Organizing Gaussian Mixture Models

RAL · 2023

As multimodal point clouds are accrued from dense 3D sensors, how do we incrementally compress these point clouds into a Gaussian Mixture Model (GMM)?

Kshitij Goel, Wennie Tabib

This letter describes an incremental multimodal surface mapping methodology, which represents the environment as a continuous probabilistic model. This model enables high-resolution reconstruction while simultaneously compressing spatial and intensity point cloud data. The strategy employed in this work utilizes Gaussian mixture models (GMMs) to represent the environment. While prior GMM-based mapping works have developed methodologies to determine the number of mixture components using information-theoretic techniques, these approaches either operate on individual sensor observations, making them unsuitable for incremental mapping, or are not real-time viable, especially for applications where high-fidelity modeling is required. To bridge this gap, this letter introduces a spatial hash map for rapid GMM submap extraction combined with an approach to determine relevant and redundant data in a point cloud. These contributions increase computational speed by an order of magnitude compared to state-of-the-art incremental GMM-based mapping. In addition, the proposed approach yields a superior tradeoff in map accuracy and size when compared to state-of-the-art mapping methodologies (both GMM-based and non-GMM-based). Evaluations are conducted using both simulated and real-world data. The software is released as open source to benefit the robotics community.

Figures

**Method Teaser** (Left) Reconstruction obtained on a synthetic dataset. (Center) Precision, recall, and reconstruction error tradeoff with map size on disk for Octomap, Nvblox, fixed-component GMMs, and the proposed approach. The total time taken for data association is also shown to be lower than that of a prior GMM-based approach. (Right) Reconstruction obtained on a real-world dataset. The proposed approach yields a map that requires less disk space than the competing methods while demonstrating on-par or better reconstruction accuracy (i.e., low reconstruction error and high precision).

**System Details** The incoming point cloud is first segmented into relevant and redundant data using the log-likelihood scores over the existing model. The part of the model that is updated is determined by the spatial hash map. After the model is updated, the spatial hash is updated using the sigma points of each mixture component. The use of the spatial hash reduces the complexity of incremental mapping in the absence of ray casting.
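The spatial hash step described in this caption can be illustrated with a minimal sketch. It is not the released implementation: the class `SpatialHash`, the helper names, the `scale` factor on the sigma points, and the choice to hash only the 3D spatial part of each component are all assumptions made for illustration. The cell size `alpha` stands in for the resolution parameter α mentioned on this page.

```python
import math
from collections import defaultdict


def cholesky3(cov):
    """Lower-triangular Cholesky factor of a 3x3 symmetric positive-definite matrix."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sigma_points(mean, cov, scale=2.0):
    """Mean plus symmetric offsets along the columns of the Cholesky factor (7 points in 3D).

    `scale` is a hypothetical spread factor, not a value taken from the paper.
    """
    L = cholesky3(cov)
    pts = [tuple(mean)]
    for c in range(3):
        offset = [scale * L[r][c] for r in range(3)]
        pts.append(tuple(m + d for m, d in zip(mean, offset)))
        pts.append(tuple(m - d for m, d in zip(mean, offset)))
    return pts


def cell_of(point, alpha):
    """Integer grid cell containing `point` for cell size `alpha`."""
    return tuple(math.floor(x / alpha) for x in point)


class SpatialHash:
    """Maps grid cells to the ids of mixture components whose sigma points fall inside them."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.cells = defaultdict(set)

    def insert_component(self, comp_id, mean, cov, scale=2.0):
        for p in sigma_points(mean, cov, scale):
            self.cells[cell_of(p, self.alpha)].add(comp_id)

    def query(self, points):
        """Ids of components registered in any cell touched by `points` (a candidate submap)."""
        ids = set()
        for p in points:
            ids |= self.cells.get(cell_of(p, self.alpha), set())
        return ids
```

Under these assumptions, looking up the submap for an incoming point cloud costs one dictionary access per point, independent of the total number of components in the map.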

**Relevant Point Cloud Calculation** Illustration of the relevant point cloud calculation using two multimodal point clouds, Z1 and Z2 (Section III-A.1). The objective is to find the relevant point cloud, Zr.
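One way to read the relevant/redundant split is as a threshold on per-point log-likelihood under the existing model: points the model already explains well are redundant, and the rest form Zr. The sketch below follows that reading. The diagonal covariances, the function names, and the `threshold` value are assumptions chosen to keep the example short; the paper's actual criterion is given in its Section III-A.1 and is not reproduced on this page.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (assumed here for brevity)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_j w_j N(x | mean_j, var_j), computed with the log-sum-exp trick."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def split_relevant(points, weights, means, variances, threshold):
    """Split points into (relevant, redundant) by log-likelihood under the existing GMM.

    `threshold` is a hypothetical tuning value, not one taken from the paper.
    """
    relevant, redundant = [], []
    for p in points:
        score = gmm_log_likelihood(p, weights, means, variances)
        (redundant if score >= threshold else relevant).append(p)
    return relevant, redundant
```

Each point here is a 4-tuple (x, y, z, intensity), matching the multimodal point clouds described on this page. For example, with a single component centered at the origin, a point at the origin would be classified as redundant and a point several meters away as relevant.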

**Calculation Time Comparison** Comparison of the relevant subset Zr calculation time between the prior work on multimodal GMM mapping [12] and the proposed approach. The per-frame calculation time in seconds is plotted for (a) different fixed numbers of components |J| and (b) different values of the bandwidth parameter σ for the proposed method. (c) Notice that the spatial hash (Section III-A.3) enables an order of magnitude improvement and that the performance gains increase monotonically with model size. (d) shows an ablation of calculation times for different values of the spatial hash resolution parameter α.

[...] depth and color data by default. It is modified to use depth and grayscale images for the comparison presented in this section. Since the software for prior GMM map works [12, 14, 31] is not openly available, the codebase for the proposed approach is modified to use a fixed number of components for the FCGMM comparison. The FCGMM approach uses the GPU for EM execution but the CPU for the Zr calculation because it requires access to more RAM than is available to the GPU.

**Quantitative Comparison** Quantitative comparison of (a) reconstruction error, (b) precision, (c) recall, and (d) PSNR as a function of the map size in megabytes (MB) for each approach. The dataset under consideration is the synthetic D1 dataset shown in Fig. 6a. Note that the proposed approach yields a map that requires less disk space than the competing methods while demonstrating on-par or better reconstruction accuracy (i.e., low reconstruction error and high precision).
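For readers unfamiliar with these metrics, the sketch below shows commonly used definitions for point-cloud reconstruction. These are assumed definitions, not necessarily the exact ones used in the paper: precision and recall are taken as the fractions of reconstructed and ground-truth points, respectively, that lie within a distance tolerance `tol` of the other cloud; reconstruction error is the mean nearest-neighbor distance; and PSNR is computed on paired intensity values with an assumed peak of 1.0. The brute-force nearest-neighbor search is only suitable for small clouds.

```python
import math


def nearest_dist(p, cloud):
    """Distance from point `p` to its nearest neighbor in `cloud` (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def precision_recall(recon, truth, tol):
    """Fractions of points within `tol` of the other cloud (assumed metric definitions)."""
    precision = sum(nearest_dist(p, truth) <= tol for p in recon) / len(recon)
    recall = sum(nearest_dist(q, recon) <= tol for q in truth) / len(truth)
    return precision, recall


def mean_reconstruction_error(recon, truth):
    """Mean distance from each reconstructed point to the ground-truth cloud."""
    return sum(nearest_dist(p, truth) for p in recon) / len(recon)


def psnr(pred, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB between paired intensity values."""
    mse = sum((a - b) ** 2 for a, b in zip(pred, ref)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

With these definitions, a sparse but accurate reconstruction scores high precision and low recall, which is why the figure reports both alongside map size.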

**Qualitative Comparison** Qualitative comparison of the reconstructions obtained by baseline methods and the proposed approach at similar values of map size for (a) D1 and (f) D2 datasets. The highest achievable resolution used during execution and resulting map sizes are reported in the sub-captions. (b) and (g) visualize the lowest level of the Octomap octree. Incorrect intensity values are visible due to the color averaging within the octree. (c) and (h) illustrate the mesh extracted from the stored TSDF for Nvblox. Aliasing is visible in the meshes due to large voxel sizes required for a lower memory footprint. (d), (i) FCGMM and (e), (j) the proposed method enable qualitatively similar high-resolution dense reconstructions; however, the FCGMM output requires a much longer time to process incremental observations (see Fig. 4). A video of the proposed approach reconstructing the D1 dataset is available at https://youtu.be/VgPEEcbUAnY.

**Real-World Dataset Results** Quantitative comparison of Octomap, Nvblox, FCGMM, and the proposed approach using the real-world datasets with noisy RGB-D data. The best and worst values in each column are colored green and red, respectively. The FCGMM method results in a larger map size compared to the proposed approach and is orders of magnitude slower in execution time (Fig. 4). These results highlight that the proposed approach balances accuracy and map size better than the state-of-the-art approaches.

BibTeX

@article{goel2023incremental,
	title={Incremental multimodal surface mapping via self-organizing gaussian mixture models},
	author={Goel, Kshitij and Tabib, Wennie},
	journal={IEEE Robotics and Automation Letters},
	volume={8},
	number={12},
	pages={8358--8365},
	year={2023},
	publisher={IEEE}
}