1. Field of the Invention
The present subject matter relates generally to determining an orientation of a document, and more specifically to determining an orientation of a document based on a specific feature of Korean characters included in an image of the document.
2. Related Art
When large documents must be scanned for archiving purposes, it is important that every page is scanned without fault, because the original documents may no longer be available by the time a scan error is detected. It is therefore necessary to check the quality of each scanned image. However, checking each scanned image requires considerable time and effort and places an undue burden on the personnel performing the scan job. Moreover, checking large numbers of images is tedious and error-prone.
One way of avoiding the human checking process is to use an automated system that checks each newly scanned image and, where possible, corrects faulty images using suitable image processing techniques.
Conventionally, the following method has been used to detect the orientation (upright or inverted) of a character string. First, character recognition is performed on the assumption that the character string is in an upright state; an evaluation value (number of points) is obtained for the recognition result of each character, and an average or a like value of the evaluation values of the respective characters is calculated to obtain a first overall evaluation value. Subsequently, character recognition is performed on the assumption that the character string is in an inverted state (rotated by 180 degrees); an evaluation value is again obtained for the recognition result of each character, and an average or a like value of the evaluation values of the respective characters is calculated to obtain a second overall evaluation value. Finally, the two overall evaluation values are compared, and the assumption that yields the higher recognition rate is selected in order to determine whether the character string is in an upright state or in an inverted state.
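The two-pass comparison described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from any cited reference: the names `rotate_180`, `overall_score`, `detect_orientation`, and `toy_recognize` are hypothetical, character images are assumed to be small bitmaps given as tuples of strings, and the toy recognizer (pixel agreement with a single template) merely stands in for a real character recognition engine.

```python
from statistics import mean

def rotate_180(bitmap):
    """Rotate a bitmap (tuple of equal-length strings) by 180 degrees."""
    return tuple(row[::-1] for row in reversed(bitmap))

def overall_score(char_images, recognize):
    """Average the per-character evaluation values returned by `recognize`."""
    if not char_images:
        raise ValueError("at least one character image is required")
    return mean(recognize(img)[1] for img in char_images)

def detect_orientation(char_images, recognize):
    """Return 'upright' or 'inverted' by comparing two overall evaluation values."""
    # First pass: recognize the characters as they are (upright assumption).
    first = overall_score(char_images, recognize)
    # Second pass: rotate each character by 180 degrees (inverted assumption).
    second = overall_score([rotate_180(img) for img in char_images], recognize)
    return "upright" if first >= second else "inverted"

# Hypothetical stand-in recognizer: scores pixel agreement with one template.
TEMPLATE_T = ("111", "010", "010")

def toy_recognize(bitmap):
    cells = [a == b for r, t in zip(bitmap, TEMPLATE_T) for a, b in zip(r, t)]
    return "T", sum(cells) / len(cells)

if __name__ == "__main__":
    print(detect_orientation([TEMPLATE_T], toy_recognize))
    print(detect_orientation([rotate_180(TEMPLATE_T)], toy_recognize))
```

Note that this approach runs the full recognition step twice per character string, which is one reason a cheaper orientation cue may be preferable.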
Techniques that classify document script into two broad classes, Han and Latin, are disclosed in A. L. Spitz, “Determination of the Script and Language Content of Document Images,” IEEE Trans. PAMI, 19(3), pp. 235-245, March 1997. These techniques use upward concavity and optical density to determine the script and language. However, they do not determine the orientation of the script.
Recognition of text in an image sequence is disclosed in U.S. Pat. No. 7,031,553 issued to Myers et al. In that patent, the text orientation is identified by projecting a set of characters at different orientations and detecting the base and top lines. Projections at a fixed number of angles are used to identify the text orientation.
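The general idea of testing projections at a fixed number of angles can be illustrated with a generic projection-profile sketch. This is not the method of the Myers et al. patent, and it does not detect base and top lines; it only assumes that foreground pixels are available as (x, y) coordinates and that the text line direction is the candidate angle whose projection profile is most sharply peaked. The function names and the candidate angle set are hypothetical.

```python
import math
from collections import Counter

def profile_sharpness(pixels, angle_deg):
    """Sum of squared bin counts when pixels are projected along `angle_deg`.

    Each pixel is binned by its rounded distance along the normal of a line
    running at `angle_deg` from the x-axis; aligned text lines give few,
    heavily populated bins and therefore a large value.
    """
    theta = math.radians(angle_deg)
    s, c = math.sin(theta), math.cos(theta)
    bins = Counter(round(-x * s + y * c) for x, y in pixels)
    return sum(n * n for n in bins.values())

def dominant_line_angle(pixels, angles=(0, 45, 90, 135)):
    """Pick the candidate angle (degrees) with the sharpest projection profile."""
    return max(angles, key=lambda a: profile_sharpness(pixels, a))

if __name__ == "__main__":
    # Two synthetic horizontal "text lines", 20 pixels long, at y = 0 and y = 5.
    horizontal = [(x, y) for y in (0, 5) for x in range(20)]
    print(dominant_line_angle(horizontal))
```

A profile of this kind identifies the line direction only up to a 180-degree ambiguity, so it cannot by itself distinguish an upright page from an inverted one.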
However, none of the above discloses using a specific character feature of a language to determine the orientation of a document.