
A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts

Ruggero Pintus, Ying Yang, Enrico Gobbetti, and Holly Rushmeier

October 2014

Abstract

Historical and artistic handwritten books are valuable cultural heritage (CH) items, as they provide information about tangible and intangible cultural aspects of the past. Massive digitization projects have made this kind of data available to a worldwide population and pose real challenges for automatic processing. In this scenario, document layout analysis plays a significant role, since it is a fundamental step of any document image understanding system. In this paper, we present a completely automatic algorithm that performs robust text segmentation of old handwritten manuscripts on a per-book basis, and we show how to exploit this outcome to find two layout elements, i.e., text blocks and text lines. Our proposed technique has been evaluated on a large and heterogeneous corpus, and our experimental results demonstrate that this approach is efficient and reliable, even when applied to very noisy and damaged books.

Reference and download information

Ruggero Pintus, Ying Yang, Enrico Gobbetti, and Holly Rushmeier. A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts. In The 12th Eurographics Workshop on Graphics and Cultural Heritage, pages 35-44, October 2014. DOI: 10.2312/gch.20141302.


Bibtex citation record

@InProceedings{Pintus:2014:ATA,
    author = {Ruggero Pintus and Ying Yang and Enrico Gobbetti and Holly Rushmeier},
    title = {{A TaLISMAN}: Automatic Text and LIne Segmentation of historical MANuscripts},
    booktitle = {The 12th Eurographics Workshop on Graphics and Cultural Heritage},
    pages = {35--44},
    month = {October},
    year = {2014},
    abstract = { Historical and artistic handwritten books are valuable cultural heritage (CH) items, as they provide information about tangible and intangible cultural aspects of the past. Massive digitization projects have made this kind of data available to a worldwide population and pose real challenges for automatic processing. In this scenario, document layout analysis plays a significant role, since it is a fundamental step of any document image understanding system. In this paper, we present a completely automatic algorithm that performs robust text segmentation of old handwritten manuscripts on a per-book basis, and we show how to exploit this outcome to find two layout elements, i.e., text blocks and text lines. Our proposed technique has been evaluated on a large and heterogeneous corpus, and our experimental results demonstrate that this approach is efficient and reliable, even when applied to very noisy and damaged books. },
    doi = {10.2312/gch.20141302},
    url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintus:2014:ATA'},
}