Field of the Invention
This invention relates to document image processing, and in particular, it relates to methods for detecting and removing horizontal and vertical lines in a document image.
Description of Related Art
Document images typically refer to digital images representing pages of documents that contain a significant amount of text. Document images often contain lines, in particular horizontal and vertical lines, such as table lines, underlines for text, etc. As characters (letters and other symbols) are typically the focus of document image analysis, such as optical character recognition (OCR), document authentication, etc., it is often desired to remove the lines. These lines are usually long in one direction and, if they are not removed cleanly, may cause errors in the connected component analysis that follows. Various methods for line detection and removal have been proposed, such as the Hough transform, run length coding, morphological analysis, etc. However, when these methods are applied to real documents, their results are often affected by the image quality and by how well the image is binarized. For example, an improper binarization threshold may cause morphology-based line detection to fail.
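By way of illustration only, the sensitivity of morphology-based line detection to the binarization threshold can be sketched as follows. This is a minimal sketch and not a description of any particular prior method: the function names (binarize, open_horizontal, line_rows), the synthetic page, and the threshold and length values are all assumptions chosen for the example. The horizontal opening is implemented here as a run-length test, which gives the same result as a morphological opening with a 1 x min_len structuring element.

```python
def binarize(gray, threshold):
    """Mark a pixel as foreground (1) when it is darker than the threshold."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def open_horizontal(binary, min_len):
    """Keep only foreground pixels that lie in horizontal runs of at least min_len.

    Equivalent to a morphological opening with a 1 x min_len structuring element.
    """
    out = [[0] * len(row) for row in binary]
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x] == 0:
                x += 1
                continue
            start = x
            while x < len(row) and row[x] == 1:
                x += 1
            # The run covers columns start .. x-1; keep it only if long enough.
            if x - start >= min_len:
                for i in range(start, x):
                    out[y][i] = 1
    return out


def line_rows(gray, threshold, min_len):
    """Return the indices of rows in which a horizontal line survives the opening."""
    opened = open_horizontal(binarize(gray, threshold), min_len)
    return [y for y, row in enumerate(opened) if any(row)]


# Hypothetical 5 x 20 page: white background (255) with one ruled line on row 2.
WIDTH = 20
page = [[255] * WIDTH for _ in range(5)]
page[2] = [100] * WIDTH
for x in (6, 13):
    page[2][x] = 140  # two faded pixels on the line

print(line_rows(page, threshold=160, min_len=10))  # [2]: the line stays in one run
print(line_rows(page, threshold=128, min_len=10))  # []: the line breaks into short runs
```

In this example, the lower threshold classifies the two faded pixels as background, so the line is split into three runs of six pixels each, none of which reaches the assumed minimum length of ten, and the line goes undetected.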