TY - GEN
T1 - Rule-based page segmentation for palm leaf manuscript on color image
AU - Inkeaw, Papangkorn
AU - Bootkrajang, Jakramate
AU - Charoenkwan, Phasit
AU - Marukatat, Sanparith
AU - Ho, Shinn-Ying
AU - Chaijaruwanich, Jeerayut
N1 - Publisher Copyright: © Springer International Publishing AG 2016.
PY - 2016
Y1 - 2016
N2 - Palm leaf manuscripts are an important source of history and ancient wisdom. A large number of manuscripts have already been digitized in the form of folio images. To extract useful information, optical character recognition (OCR) is often considered the first step towards text mining. Unfortunately, folio images contain multiple unsegmented palm leaf images, making them difficult to handle in the OCR process. This motivates us to propose a new page segmentation method for palm leaf manuscripts. The method consists of two main steps, the first of which is the detection of objects in folio images using the Connected Component Labeling method in a transformed L*a*b* color space. The second step is rule-based selection of objects as either palm leaf or not palm leaf. Experiments performed on 20 publicly available palm leaf manuscripts comprising 384 folio images demonstrated that the proposed method effectively segmented folio images into separate palm leaf images, with 99.86% precision and 96.67% recall.
AB - Palm leaf manuscripts are an important source of history and ancient wisdom. A large number of manuscripts have already been digitized in the form of folio images. To extract useful information, optical character recognition (OCR) is often considered the first step towards text mining. Unfortunately, folio images contain multiple unsegmented palm leaf images, making them difficult to handle in the OCR process. This motivates us to propose a new page segmentation method for palm leaf manuscripts. The method consists of two main steps, the first of which is the detection of objects in folio images using the Connected Component Labeling method in a transformed L*a*b* color space. The second step is rule-based selection of objects as either palm leaf or not palm leaf. Experiments performed on 20 publicly available palm leaf manuscripts comprising 384 folio images demonstrated that the proposed method effectively segmented folio images into separate palm leaf images, with 99.86% precision and 96.67% recall.
KW - Lab color space
KW - Page segmentation
KW - Palm leaf manuscripts
KW - Rule-based selection
UR - http://www.scopus.com/inward/record.url?scp=85005952253&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-49304-6_16
DO - 10.1007/978-3-319-49304-6_16
M3 - Conference contribution
AN - SCOPUS:85005952253
SN - 9783319493039
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 127
EP - 136
BT - Digital Libraries
A2 - Morishima, Atsuyuki
A2 - Rauber, Andreas
A2 - Liew, Chern Li
PB - Springer Verlag
T2 - 18th International Conference on Asia-Pacific Digital Libraries, ICADL 2016
Y2 - 7 December 2016 through 9 December 2016
ER -