
Deep Panoramic Depth Prediction and Completion for Indoor Scenes

Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti

February 2024

Abstract

We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured-light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.
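The sparsity-augmentation idea mentioned above — exposing the model during training to many synthetic sampling patterns so it tolerates different capture setups at test time — can be illustrated with a minimal sketch. All function names, parameters, and patterns below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def simulate_sparsity(depth, pattern="lidar", keep_ratio=0.05, n_lines=32, rng=None):
    """Mask a dense depth map to mimic a sparse capture setup.

    pattern="random" -> uniformly scattered samples (structured-light-like dropout)
    pattern="lidar"  -> evenly spaced horizontal scan lines (rotating-LiDAR-like)
    Returns the sparse depth map and its binary validity mask.
    """
    rng = np.random.default_rng(rng)
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=bool)
    if pattern == "random":
        mask = rng.random((h, w)) < keep_ratio
    elif pattern == "lidar":
        rows = np.linspace(0, h - 1, n_lines).astype(int)
        mask[rows, :] = True
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    sparse = np.where(mask, depth, 0.0)  # zero marks "no measurement"
    return sparse, mask

# Example: a toy 64x128 dense depth map for an equirectangular frame.
dense = np.full((64, 128), 2.5, dtype=np.float32)
sparse, mask = simulate_sparsity(dense, pattern="lidar", n_lines=16)
```

During training, one such pattern would be drawn per sample so the completion network never overfits to a single sensor's sparsity layout.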

Reference and download information

Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti. Deep Panoramic Depth Prediction and Completion for Indoor Scenes. Computational Visual Media, February 2024. DOI: 10.1007/s41095-023-0358-0.

Bibtex citation record

@Article{Pintore:2024:DPD,
    author = {Giovanni Pintore and Eva Almansa and Armando Sanchez and Giorgio Vassena and Enrico Gobbetti},
    title = {Deep Panoramic Depth Prediction and Completion for Indoor Scenes},
    journal = {Computational Visual Media},
    month = {February},
    year = {2024},
    abstract = { We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured-light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning. },
    doi = {10.1007/s41095-023-0358-0},
    url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintore:2024:DPD'},
}