ATHENA: Automatic Text Height ExtractioN for the Analysis of text lines in old handwritten manuscripts
Ruggero Pintus, Ying Yang, and Holly Rushmeier
2015
Abstract
Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries is invaluable for scholars in the Cultural Heritage field, but requires automatic tools for analyzing and indexing these datasets. We present two completely automatic methods requiring no human intervention: text height estimation and text line extraction. Our proposed methods have been evaluated on a huge heterogeneous corpus of illuminated medieval manuscripts of different writing styles and with various problematic attributes, such as holes, spots, ink bleed-through, ornamentation, background noise, and overlapping text lines. Our experimental results demonstrate that these two new methods are efficient and reliable, even when applied to very noisy and damaged old handwritten manuscripts.
Reference and download information
Ruggero Pintus, Ying Yang, and Holly Rushmeier. ATHENA: Automatic Text Height ExtractioN for the Analysis of text lines in old handwritten manuscripts. ACM Journal on Computing and Cultural Heritage (JOCCH), 8(1): 1:1-1:25, 2015. DOI: 10.1145/2659020.
Bibtex citation record
@Article{Pintus:2014:ATH,
  author = {Ruggero Pintus and Ying Yang and Holly Rushmeier},
  title = {{ATHENA}: Automatic Text Height ExtractioN for the Analysis of text lines in old handwritten manuscripts},
  journal = {ACM Journal on Computing and Cultural Heritage (JOCCH)},
  volume = {8},
  number = {1},
  pages = {1:1--1:25},
  year = {2015},
  abstract = {Massive digital acquisition and preservation of deteriorating historical and artistic documents is of particular importance due to their value and fragile condition. The study and browsing of such digital libraries is invaluable for scholars in the Cultural Heritage field, but requires automatic tools for analyzing and indexing these datasets. We present two completely automatic methods requiring no human intervention: text height estimation and text line extraction. Our proposed methods have been evaluated on a huge heterogeneous corpus of illuminated medieval manuscripts of different writing styles and with various problematic attributes, such as holes, spots, ink bleed-through, ornamentation, background noise, and overlapping text lines. Our experimental results demonstrate that these two new methods are efficient and reliable, even when applied to very noisy and damaged old handwritten manuscripts.},
  doi = {10.1145/2659020},
  url = {http://vic.crs4.it/vic/cgi-bin/bib-page.cgi?id='Pintus:2014:ATH'},
}
The publications listed here are included as a means to ensure timely
dissemination of scholarly and technical work on a non-commercial basis.
Copyright and all rights therein are maintained by the authors or by
other copyright holders, notwithstanding that they have offered their works
here electronically. It is understood that all persons copying this
information will adhere to the terms and constraints invoked by each
author's copyright. These works may not be reposted without the
explicit permission of the copyright holder.
Please contact the authors if you wish to republish this work in
a book, in a journal, on the Web, or elsewhere. Thank you in advance.