Deep Panoramic Depth Prediction and Completion for Indoor Scenes
Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti
February 2024
Abstract
We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and to find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured-light sensors and LiDAR setups. Experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.
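To illustrate the general idea of a dynamic gating system that fuses dense visual features with sparse geometric features, the following is a minimal NumPy sketch. It is not the authors' architecture: the function names, the single linear gate (`w`, `b`), and the masking convention are all hypothetical simplifications introduced purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, valid_mask, w, b):
    """Illustrative dynamic gating: a learned gate decides, per location,
    how much to trust the sparse geometric features versus the dense
    visual features. Where no depth sample exists (mask == 0), the gate
    is forced to zero so the output falls back to the visual branch."""
    x = np.concatenate([rgb_feat, depth_feat], axis=-1)
    gate = sigmoid(x @ w + b)      # per-channel mixing weight in (0, 1)
    gate = gate * valid_mask       # missing depth -> rely on RGB only
    return gate * depth_feat + (1.0 - gate) * rgb_feat

# Toy example: 4 feature locations, 3 channels each.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 3))
depth = rng.standard_normal((4, 3))
mask = np.array([[1.0], [1.0], [0.0], [1.0]])  # third location lacks depth
w = rng.standard_normal((6, 3))
b = np.zeros(3)
fused = gated_fusion(rgb, depth, mask, w, b)
print(fused.shape)  # (4, 3)
```

At locations with no depth sample the output reduces exactly to the visual features, which is one simple way a completion network can remain robust to variable sparsity patterns.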
Reference and download information
Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, and Enrico Gobbetti. Deep Panoramic Depth Prediction and Completion for Indoor Scenes. Computational Visual Media, February 2024. DOI: 10.1007/s41095-023-0358-0.
Bibtex citation record
@Article{Pintore:2024:DPD,
  author   = {Giovanni Pintore and Eva Almansa and Armando Sanchez and Giorgio Vassena and Enrico Gobbetti},
  title    = {Deep Panoramic Depth Prediction and Completion for Indoor Scenes},
  journal  = {Computational Visual Media},
  month    = {February},
  year     = {2024},
  doi      = {10.1007/s41095-023-0358-0},
  url      = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintore:2024:DPD'},
  abstract = {We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and to find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured-light sensors and LiDAR setups. Experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.},
}
The publications listed here are included as a means to ensure timely
dissemination of scholarly and technical work on a non-commercial basis.
Copyright and all rights therein are maintained by the authors or by
other copyright holders, notwithstanding that they have offered their works
here electronically. It is understood that all persons copying this
information will adhere to the terms and constraints invoked by each
author's copyright. These works may not be reposted without the
explicit permission of the copyright holder.
Please contact the authors if you wish to republish this work in
a book, journal, on the Web, or elsewhere. Thank you in advance.
All references in the main publication page are linked to a descriptive page
providing relevant bibliographic data and, possibly, a link to
the related document. Please refer to our main
publication repository page for direct links to the documents.