Abstract
This paper proposes a new segmentation method for separating text from complex document images. An automatic multilevel thresholding method, based on discriminant analysis, recursively segments a specified block region into several layered image sub-blocks. A multi-layer region-based clustering method then processes the layered image sub-blocks to form several object layers. As a result, character strings with different illuminations, non-text objects, and background components are segmented into separate object layers. After the text extraction process is performed, text objects of different sizes, styles, and illuminations are properly extracted. Experimental results on extracting text strings from complex document images demonstrate the effectiveness of the proposed region-based segmentation method.
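The recursive thresholding step can be illustrated with a minimal sketch. It assumes the discriminant-analysis criterion is the Otsu-style between-class variance and that a gray-level range keeps being split while the split is sufficiently separable. The function names, the `min_sep` cutoff, and the `max_depth` limit are hypothetical choices made for illustration, not details taken from the paper, and the region-based clustering stage is not covered.

```python
from typing import List, Optional, Sequence, Tuple


def otsu_threshold(hist: Sequence[int], lo: int, hi: int) -> Tuple[Optional[int], float]:
    """Best two-class split of gray levels lo..hi (inclusive).

    Returns (t, separability): t splits the range into [lo, t] and [t+1, hi];
    separability is between-class variance divided by total variance (0..1).
    Returns (None, 0.0) when the range cannot be split.
    """
    total = sum(hist[lo:hi + 1])
    if lo >= hi or total == 0:
        return None, 0.0
    sum_all = sum(g * hist[g] for g in range(lo, hi + 1))
    mean_all = sum_all / total
    var_all = sum(hist[g] * (g - mean_all) ** 2 for g in range(lo, hi + 1)) / total
    if var_all == 0:
        return None, 0.0  # all pixels share one gray level

    best_t, best_between = None, 0.0
    w0, sum0 = 0, 0  # pixel count and gray-level sum of the lower class
    for t in range(lo, hi):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t, best_between / var_all


def multilevel_thresholds(hist: Sequence[int], lo: int, hi: int,
                          min_sep: float = 0.7, max_depth: int = 2) -> List[int]:
    """Recursively split lo..hi; returns the sorted list of thresholds.

    min_sep and max_depth are assumed stopping rules (hypothetical values).
    """
    if max_depth == 0:
        return []
    t, sep = otsu_threshold(hist, lo, hi)
    if t is None or sep < min_sep:
        return []
    lower = multilevel_thresholds(hist, lo, t, min_sep, max_depth - 1)
    upper = multilevel_thresholds(hist, t + 1, hi, min_sep, max_depth - 1)
    return lower + [t] + upper


if __name__ == "__main__":
    # Toy histogram with three gray-level populations (e.g. dark text,
    # mid-tone graphics, bright background).
    hist = [0] * 256
    hist[10] = hist[100] = hist[200] = 100
    print(multilevel_thresholds(hist, 0, 255))
```

Each returned threshold bounds one gray-level layer of the block. In a full pipeline the pixels of each layer would form a layered sub-block that is passed on to the clustering stage.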
Original language | English |
---|---|
Pages (from-to) | 3063-3070 |
Number of pages | 8 |
Journal | Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics |
Volume | 4 |
State | Published - 1 Dec 2004 |
Event | 2004 IEEE International Conference on Systems, Man and Cybernetics, SMC 2004 - The Hague, Netherlands. Duration: 10 Oct 2004 → 13 Oct 2004 |
Keywords
- Document analysis
- Image segmentation
- Multilevel thresholding
- Region-based segmentation