Handwritten Documents Text Line Segmentation based on Information Energy
Keywords:
text line segmentation, text recognition, information energy, OCR
Abstract
Text line segmentation is the first step in the text recognition process: individual characters can be recognized only after the text lines have been correctly identified. This paper proposes a line segmentation algorithm that computes an information content level, called energy, for each pixel of the image and uses it to drive the seam carving procedure. The resulting text lines follow the text more accurately, at the expected cost of additional computational overhead.
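The abstract does not state the energy function or the seam search used in the paper, so the following is only a minimal sketch of the general idea under stated assumptions: energy is taken to be the absolute horizontal plus vertical intensity difference (a common choice in seam carving, assumed here), and a separating seam is the left-to-right path of minimal cumulative energy found by dynamic programming. The names `energy_map` and `min_horizontal_seam` are hypothetical and do not come from the paper.

```python
def energy_map(gray):
    """Per-pixel energy as |dI/dx| + |dI/dy| (assumed stand-in for the paper's energy).

    `gray` is a list of rows of numeric intensities; border pixels reuse
    their own value in place of the missing neighbour.
    """
    rows, cols = len(gray), len(gray[0])
    energy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx = gray[r][min(c + 1, cols - 1)] - gray[r][max(c - 1, 0)]
            dy = gray[min(r + 1, rows - 1)][c] - gray[max(r - 1, 0)][c]
            energy[r][c] = abs(dx) + abs(dy)
    return energy


def min_horizontal_seam(energy):
    """Return one row index per column for the lowest-energy left-to-right seam."""
    rows, cols = len(energy), len(energy[0])
    # cost[r][c] = minimal cumulative energy of any seam ending at (r, c)
    cost = [[float(energy[r][0])] + [0.0] * (cols - 1) for r in range(rows)]
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
            best = min(cost[k][c - 1] for k in range(lo, hi + 1))
            cost[r][c] = energy[r][c] + best

    # Start from the cheapest end point and walk back one column at a time,
    # moving at most one row up or down per step.
    r = min(range(rows), key=lambda k: cost[k][cols - 1])
    seam = [0] * cols
    seam[cols - 1] = r
    for c in range(cols - 2, -1, -1):
        lo, hi = max(r - 1, 0), min(r + 1, rows - 1)
        r = min(range(lo, hi + 1), key=lambda k: cost[k][c])
        seam[c] = r
    return seam
```

In this sketch a seam that stays in low-energy regions passes through the blank space between two lines of handwriting, which is why it can bend around ascenders and descenders instead of cutting straight across them. The nested loops visit every pixel, which gives a sense of the computational overhead mentioned in the abstract.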
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is allowed for free under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.