
Instant Automatic Emptying of Panoramic Indoor Scenes

Giovanni Pintore, Marco Agus, Eva Almansa, and Enrico Gobbetti

November 2022

Abstract

Nowadays, 360° cameras, capable of capturing full environments in a single shot, are increasingly used in a variety of Extended Reality (XR) applications that require specific Diminished Reality (DR) techniques to conceal selected classes of objects. In this work, we present a new data-driven approach that, from an input 360° image of a furnished indoor space, automatically returns, with very low latency, an omnidirectional photorealistic view and an architecturally plausible depth map of the same scene emptied of all clutter. Contrary to recent data-driven inpainting methods that remove single user-defined objects based on their semantics, our approach is applied holistically to the entire scene and can separate the clutter from the architectural structure in a single step. By exploiting peculiar geometric features of the indoor environment, we shift the major computational load to the training phase, resulting in an extremely lightweight network at prediction time. Our end-to-end approach starts by calculating an attention mask of the clutter in the image, based on the geometric difference between the full and empty scenes. This mask is then propagated through gated convolutions that drive the generation of the output image and its depth. Returning the depth of the resulting structure allows us to exploit, during supervised training, geometric losses of different orders, including robust pixel-wise geometric losses and high-order 3D constraints typical of indoor structures. The experimental results demonstrate that our method provides interactive performance and outperforms current state-of-the-art solutions in prediction accuracy on commonly used indoor panoramic benchmarks. In addition, our method produces consistently high-quality results even for scenes captured in the wild and for data for which there is no ground truth to support supervised training.
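To give an intuition for the first step of the pipeline, here is a minimal, purely illustrative sketch of a clutter attention mask computed from the geometric difference between the furnished and empty scenes: pixels whose depths differ by more than a threshold are flagged as clutter. The function name, the list-of-lists depth representation, and the threshold value are all assumptions for illustration; the paper's actual mask is learned and propagated through the network, not thresholded like this.

```python
def clutter_mask(depth_full, depth_empty, threshold=0.05):
    """Binary mask (1 = clutter) from two per-pixel depth maps,
    given as nested lists of depths in meters. Pixels where the
    furnished scene's depth deviates from the empty scene's depth
    by more than `threshold` are marked as clutter."""
    return [
        [1 if abs(df - de) > threshold else 0
         for df, de in zip(row_full, row_empty)]
        for row_full, row_empty in zip(depth_full, depth_empty)
    ]

# Tiny 2x3 example: one pixel is occluded by a piece of furniture
# (depth 1.2 m instead of the 2.0 m wall behind it).
full = [[2.0, 2.0, 1.2],
        [2.0, 2.0, 2.0]]
empty = [[2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0]]
mask = clutter_mask(full, empty)
```

In the actual method such a mask is only available at training time (when both full and empty renderings exist); at inference the network predicts it directly from the single furnished panorama.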

Reference and download information

Giovanni Pintore, Marco Agus, Eva Almansa, and Enrico Gobbetti. Instant Automatic Emptying of Panoramic Indoor Scenes. IEEE Transactions on Visualization and Computer Graphics, 28(11): 3629-3639, November 2022. DOI: 10.1109/TVCG.2022.3202999. Proc. ISMAR.

Related multimedia productions

Giovanni Pintore, Marco Agus, Eva Almansa, and Enrico Gobbetti
Instant Automatic Emptying of Panoramic Indoor Scenes
CRS4 Video n. 186 - November 2022
Presented at ISMAR 2022

Bibtex citation record

@Article{Pintore:2022:IAE,
    author = {Giovanni Pintore and Marco Agus and Eva Almansa and Enrico Gobbetti},
    title = {Instant Automatic Emptying of Panoramic Indoor Scenes},
    journal = {IEEE Transactions on Visualization and Computer Graphics},
    volume = {28},
    number = {11},
    pages = {3629--3639},
    month = {November},
    year = {2022},
    abstract = { Nowadays, 360$^{\circ}$ cameras, capable of capturing full environments in a single shot, are increasingly used in a variety of Extended Reality (XR) applications that require specific Diminished Reality (DR) techniques to conceal selected classes of objects. In this work, we present a new data-driven approach that, from an input 360$^{\circ}$ image of a furnished indoor space, automatically returns, with very low latency, an omnidirectional photorealistic view and an architecturally plausible depth map of the same scene emptied of all clutter. Contrary to recent data-driven inpainting methods that remove single user-defined objects based on their semantics, our approach is applied holistically to the entire scene and can separate the clutter from the architectural structure in a single step. By exploiting peculiar geometric features of the indoor environment, we shift the major computational load to the training phase, resulting in an extremely lightweight network at prediction time. Our end-to-end approach starts by calculating an attention mask of the clutter in the image, based on the geometric difference between the full and empty scenes. This mask is then propagated through gated convolutions that drive the generation of the output image and its depth. Returning the depth of the resulting structure allows us to exploit, during supervised training, geometric losses of different orders, including robust pixel-wise geometric losses and high-order 3D constraints typical of indoor structures. The experimental results demonstrate that our method provides interactive performance and outperforms current state-of-the-art solutions in prediction accuracy on commonly used indoor panoramic benchmarks. In addition, our method produces consistently high-quality results even for scenes captured in the wild and for data for which there is no ground truth to support supervised training. },
    doi = {10.1109/TVCG.2022.3202999},
    note = {Proc. ISMAR},
    url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintore:2022:IAE'},
}