PanoFloor: reconstruction and immersive exploration of large multi-room scenes from a minimal set of registered panoramic images using denoised density maps

Giovanni Pintore, Sara Jashari, Marco Agus, and Enrico Gobbetti

October 2025

Abstract

We introduce a deep learning approach to automatically generate 3D floor plans and immersive multi-room virtual visit experiences from a small set of co-registered 360-degree panoramas, down to just one per room. We integrate novel neural networks that leverage the broad context of panoramic images and large annotated room datasets to build a geometric and visual graph. Nodes represent stereo-viewable multiple-center-of-projection (MCOP) 360-degree images at the capture locations, while arcs connect them with paths through doors, avoiding clutter and minimizing disocclusions to maximize visual quality. The process starts with depth prediction and floor-plan projection to create a comprehensive but noisy global density map, which is refined via a latent diffusion model. A segmentation network then extracts room layouts, openings, and clutter. This structured representation is lifted to a visual one by creating a 360-degree stereo-explorable MCOP representation at each node, produced using a view-synthesis network from the original image and its predicted depth map. Arc paths are then computed using an optimization process that considers structural constraints, including openings and obstacles, while minimizing visual discontinuities, occlusions, and disocclusions. Finally, 360-degree video transitions are synthesized using a specialized view-synthesis network to obtain a fully precomputed WebXR-ready explorable representation that can be efficiently experienced on head-mounted displays with limited graphics capabilities. The extracted floor plan not only aids in documenting the captured building but can also enhance immersive experiences by serving as a live map of the building. Our experiments show that the method achieves state-of-the-art reconstruction from sparse inputs and supports compelling immersive visits.
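The abstract does not detail how the floor-plan projection step builds the density map. As a rough illustration only, assuming a single equirectangular per-pixel depth map and an axis-aligned metric grid (the function name, grid resolution, and extent below are illustrative choices, not the paper's), the back-projection of panoramic depth onto a top-down density map might be sketched as:

```python
import numpy as np

def depth_to_density_map(depth, grid_res=0.05, extent=10.0):
    """Project an equirectangular depth map to a top-down point-density map.

    depth: (H, W) array of per-pixel ray lengths from the camera, in meters.
    grid_res: side of one density-map cell, in meters (illustrative value).
    extent: half-size of the covered square area around the camera, in meters.
    Returns an (N, N) histogram counting back-projected points per cell.
    """
    H, W = depth.shape
    # Spherical angles for every pixel of the equirectangular image.
    u = (np.arange(W) + 0.5) / W                 # horizontal coordinate in [0, 1)
    v = (np.arange(H) + 0.5) / H                 # vertical coordinate in [0, 1)
    lon = (u * 2.0 - 1.0) * np.pi                # longitude in [-pi, pi)
    lat = (0.5 - v) * np.pi                      # latitude in [-pi/2, pi/2]
    lon, lat = np.meshgrid(lon, lat)

    # Back-project each pixel to a 3D point (y up, camera at the origin).
    x = depth * np.cos(lat) * np.sin(lon)
    z = depth * np.cos(lat) * np.cos(lon)

    # Accumulate the horizontal (x, z) footprint of all points on the grid;
    # points outside the covered square are discarded.
    n = int(2 * extent / grid_res)
    keep = (np.abs(x) < extent) & (np.abs(z) < extent)
    ix = ((x[keep] + extent) / grid_res).astype(int)
    iz = ((z[keep] + extent) / grid_res).astype(int)
    density = np.zeros((n, n))
    np.add.at(density, (iz, ix), 1.0)            # histogram with repeated indices
    return density
```

In the described pipeline, maps like this, computed from each panorama's predicted depth and merged via the known registration, would form the noisy global density map that the latent diffusion model subsequently denoises.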

Reference and download information

Giovanni Pintore, Sara Jashari, Marco Agus, and Enrico Gobbetti. PanoFloor: reconstruction and immersive exploration of large multi-room scenes from a minimal set of registered panoramic images using denoised density maps. In Proc. IEEE ISMAR, October 2025. To appear.

Bibtex citation record

@inproceedings{Pintore:2025:PRI,
    author = {Giovanni Pintore and Sara Jashari and Marco Agus and Enrico Gobbetti},
    title = {{PanoFloor}: reconstruction and immersive exploration of large multi-room scenes from a minimal set of registered panoramic images using denoised density maps},
    booktitle = {Proc. IEEE ISMAR},
    month = {October},
    year = {2025},
    abstract = { We introduce a deep learning approach to automatically generate 3D floor plans and immersive multi-room virtual visit experiences from a small set of co-registered 360-degree panoramas, down to just one per room. We integrate novel neural networks that leverage the broad context of panoramic images and large annotated room datasets to build a geometric and visual graph. Nodes represent stereo-viewable multiple-center-of-projection (MCOP) 360-degree images at the capture locations, while arcs connect them with paths through doors, avoiding clutter and minimizing disocclusions to maximize visual quality. The process starts with depth prediction and floor-plan projection to create a comprehensive but noisy global density map, which is refined via a latent diffusion model. A segmentation network then extracts room layouts, openings, and clutter. This structured representation is lifted to a visual one by creating a 360-degree stereo-explorable MCOP representation at each node, produced using a view-synthesis network from the original image and its predicted depth map. Arc paths are then computed using an optimization process that considers structural constraints, including openings and obstacles, while minimizing visual discontinuities, occlusions, and disocclusions. Finally, 360-degree video transitions are synthesized using a specialized view-synthesis network to obtain a fully precomputed WebXR-ready explorable representation that can be efficiently experienced on head-mounted displays with limited graphics capabilities. The extracted floor plan not only aids in documenting the captured building but can also enhance immersive experiences by serving as a live map of the building. Our experiments show that the method achieves state-of-the-art reconstruction from sparse inputs and supports compelling immersive visits. },
    note = {To appear},
    url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintore:2025:PRI'},
}