Automatic Algorithms for Medieval Manuscript Analysis

Ruggero Pintus, Ying Yang, Holly Rushmeier, and Enrico Gobbetti

June 2017

Abstract

Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries is invaluable for scholars in the Cultural Heritage field, but requires automatic tools for analyzing and indexing these datasets. We will describe a set of completely automatic solutions to estimate per-page text leading, to extract text lines, blocks and other layout elements, and to perform query-by-example word-spotting on medieval manuscripts. Those techniques have been evaluated on a huge heterogeneous corpus of illuminated medieval manuscripts of different writing styles, languages, image resolutions, amount of illumination and ornamentation, and levels of conservation, with various problematic issues such as holes, spots, ink bleed-through, ornamentation, and background noise. We also present a quantitative analysis to better assess the quality of the proposed algorithms. By not requiring any human intervention to produce a large amount of annotated training data, the developed methods provide Computer Vision researchers and Cultural Heritage practitioners with a compact and efficient system for document analysis.

Reference and download information

Ruggero Pintus, Ying Yang, Holly Rushmeier, and Enrico Gobbetti. Automatic Algorithms for Medieval Manuscript Analysis. In Proc. 18th International Graphonomics Society Conference, June 2017. To appear.

Bibtex citation record

@InProceedings{Pintus:2017:AAM,
    author = {Ruggero Pintus and Ying Yang and Holly Rushmeier and Enrico Gobbetti},
    title = {Automatic Algorithms for Medieval Manuscript Analysis},
    booktitle = {Proc. 18th International Graphonomics Society Conference},
    month = {June},
    year = {2017},
    abstract = { Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries is invaluable for scholars in the Cultural Heritage field, but requires automatic tools for analyzing and indexing these datasets. We will describe a set of completely automatic solutions to estimate per-page text leading, to extract text lines, blocks and other layout elements, and to perform query-by-example word-spotting on medieval manuscripts. Those techniques have been evaluated on a huge heterogeneous corpus of illuminated medieval manuscripts of different writing styles, languages, image resolutions, amount of illumination and ornamentation, and levels of conservation, with various problematic issues such as holes, spots, ink bleed-through, ornamentation, and background noise. We also present a quantitative analysis to better assess the quality of the proposed algorithms. By not requiring any human intervention to produce a large amount of annotated training data, the developed methods provide Computer Vision researchers and Cultural Heritage practitioners with a compact and efficient system for document analysis. },
    note = {To appear},
    url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintus:2017:AAM'},
}