
MultiPanoWise: holistic deep architecture for multi-task dense prediction from a single panoramic image

Uzair Shah, Muhammad Tukur, Mahmood Alzubaidi, Giovanni Pintore, Enrico Gobbetti, Mowafa Househ, Jens Schneider, and Marco Agus

2024

Abstract

We present a novel holistic deep-learning approach for multi-task learning from a single indoor panoramic image. Our framework, named MultiPanoWise, extends vision transformers to jointly infer multiple pixel-wise signals, such as depth, normals, and semantic segmentation, as well as signals from intrinsic decomposition, such as reflectance and shading. Our solution leverages a specific architecture combining a transformer-based encoder-decoder with multiple heads, by introducing, in particular, a novel context adjustment approach, to enforce knowledge distillation between the various signals. Moreover, at training time we introduce a hybrid loss scalarization method based on an augmented Chebychev/hypervolume scheme. We demonstrate the capabilities of the proposed architecture on public-domain synthetic and real-world datasets. We showcase performance improvements with respect to the most recent methods specifically designed for single tasks, like, for example, individual depth estimation or semantic segmentation. To the best of our knowledge, this is the first architecture able to achieve state-of-the-art performance on the joint extraction of heterogeneous signals from single indoor omnidirectional images.
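
As a concrete illustration of how an augmented Chebyshev scalarization can fold several per-task losses into a single training objective, the minimal PyTorch sketch below combines hypothetical depth, normal, and segmentation losses. The function name, task weights, augmentation coefficient rho, and the zero ideal point are illustrative assumptions, not the exact scheme of the paper, which also incorporates a hypervolume component.

    import torch

    # Hedged sketch: augmented Chebyshev scalarization of per-task losses.
    # All names, weights, and constants below are illustrative assumptions,
    # not the configuration used by MultiPanoWise.
    def augmented_chebyshev(losses, weights, ideal, rho=0.05):
        # losses : (T,) tensor with the current per-task losses
        # weights: (T,) tensor of task weights (summing to 1)
        # ideal  : (T,) tensor approximating each task's best attainable loss
        # rho    : small augmentation coefficient that keeps every task in play
        gaps = weights * (losses - ideal)      # weighted distance to the ideal point
        return gaps.max() + rho * gaps.sum()   # Chebyshev (worst-case) term + augmentation

    # Usage with three hypothetical task losses (e.g. depth, normals, semantics)
    losses  = torch.tensor([0.8, 0.5, 1.2])
    weights = torch.tensor([0.4, 0.3, 0.3])
    ideal   = torch.zeros(3)                   # assume 0 as the ideal value per loss
    total_loss = augmented_chebyshev(losses, weights, ideal)

The max term focuses the gradient on the currently worst-performing task, while the small summed term keeps the remaining tasks from being ignored.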

Reference and download information

Uzair Shah, Muhammad Tukur, Mahmood Alzubaidi, Giovanni Pintore, Enrico Gobbetti, Mowafa Househ, Jens Schneider, and Marco Agus. MultiPanoWise: holistic deep architecture for multi-task dense prediction from a single panoramic image. In Proc. OmniCV - IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024. To appear.

Bibtex citation record

@inproceedings{Shah:2024:PSG,
    author = {Uzair Shah and Muhammad Tukur and Mahmood Alzubaidi and Giovanni Pintore and Enrico Gobbetti and Mowafa Househ and Jens Schneider and Marco Agus},
    title = {{MultiPanoWise}: holistic deep architecture for multi-task dense prediction from a single panoramic image},
    booktitle = {Proc. OmniCV - IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
    year = {2024},
    abstract = {We present a novel holistic deep-learning approach for multi-task learning from a single indoor panoramic image. Our framework, named MultiPanoWise, extends vision transformers to jointly infer multiple pixel-wise signals, such as depth, normals, and semantic segmentation, as well as signals from intrinsic decomposition, such as reflectance and shading. Our solution leverages a specific architecture combining a transformer-based encoder-decoder with multiple heads, by introducing, in particular, a novel context adjustment approach, to enforce knowledge distillation between the various signals. Moreover, at training time we introduce a hybrid loss scalarization method based on an augmented Chebychev/hypervolume scheme. We demonstrate the capabilities of the proposed architecture on public-domain synthetic and real-world datasets. We showcase performance improvements with respect to the most recent methods specifically designed for single tasks, like, for example, individual depth estimation or semantic segmentation. To the best of our knowledge, this is the first architecture able to achieve state-of-the-art performance on the joint extraction of heterogeneous signals from single indoor omnidirectional images.},
    note = {To appear},
    url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Shah:2024:PSG'},
}