Analysis of a document image requires the identification of orientation, script, language, and characters. The identification of orientation is the task of determining which way is up. Assuming limited skew, there are four possible orientations (i.e., 0° or up, 180° or down, 90°, and 270°). Script identification is the task of determining the alphabet used in the document. Examples of common scripts include Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese, and Latin (or Roman). Language identification is the task of determining the language used in the document. Different languages may use the same script (e.g., English and French both use the Latin script). Character (or word) identification is the task of converting a character (or word) image to an encoding scheme such as ASCII or Unicode.
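As an illustration of the four candidate orientations, the following minimal sketch enumerates the four right-angle rotations of a small image. The image representation (a row-major list of lists) and the function names are assumptions made for this example only, and the sketch does not decide which candidate is upright. Angles here denote clockwise rotation applied to the input, which may not match the angle convention used elsewhere in this document.

```python
def rotate_90_clockwise(image):
    """Rotate a row-major 2-D list by 90 degrees clockwise."""
    # Reversing the row order and transposing yields a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]


def four_orientations(image):
    """Return a dict mapping 0, 90, 180, and 270 to the rotated images."""
    candidates = {}
    current = [list(row) for row in image]  # copy so the input is not modified
    for angle in (0, 90, 180, 270):
        candidates[angle] = current
        current = rotate_90_clockwise(current)
    return candidates


if __name__ == "__main__":
    page = [[1, 2],
            [3, 4]]
    for angle, candidate in four_orientations(page).items():
        print(angle, candidate)
```

An orientation detector would score each candidate in some way and keep the best one; the scoring step is outside the scope of this sketch.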
U.S. Pat. No. 5,664,027, entitled "METHOD AND APPARATUS FOR INFERRING ORIENTATION OF LINES OF TEXT," discloses a method of determining page orientation by determining the predominant orientation of lines connecting the centers of connected components to their nearest neighbors. It is stated in U.S. Pat. No. 5,664,027 that this method will not work with connected scripts such as Arabic. The method of U.S. Pat. No. 5,664,027 will also have difficulty with evenly spaced characters such as those used in Chinese and Japanese. The method of U.S. Pat. No. 5,664,027 produces a horizontal versus vertical answer but cannot be used to produce a right-side-up versus upside-down answer or a top-to-the-left versus top-to-the-right answer. U.S. Pat. No. 5,664,027 is hereby incorporated by reference into the specification of the present invention.
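The nearest-neighbor idea described above can be sketched roughly as follows. This is not the procedure of U.S. Pat. No. 5,664,027 itself; it is a simplified illustration that assumes the connected-component centers are already available as (x, y) pairs, and the function name and the majority-vote rule are hypothetical.

```python
import math


def predominant_direction(centers):
    """Classify text lines as 'horizontal' or 'vertical'.

    centers: list of (x, y) centers of connected components (assumed input).
    Each center votes according to the direction of the line joining it to
    its nearest neighbor; the majority vote is returned.
    """
    if len(centers) < 2:
        raise ValueError("need at least two component centers")
    horizontal = 0
    vertical = 0
    for i, (x, y) in enumerate(centers):
        # Nearest other center by Euclidean distance.
        nearest = min(
            (c for j, c in enumerate(centers) if j != i),
            key=lambda c: math.hypot(c[0] - x, c[1] - y),
        )
        dx = abs(nearest[0] - x)
        dy = abs(nearest[1] - y)
        if dx >= dy:
            horizontal += 1
        else:
            vertical += 1
    return "horizontal" if horizontal >= vertical else "vertical"


if __name__ == "__main__":
    # Two lines of text: characters 10 units apart, lines 30 units apart.
    rows = [(0, 0), (10, 0), (20, 0), (0, 30), (10, 30), (20, 30)]
    print(predominant_direction(rows))
    # The same layout turned on its side.
    print(predominant_direction([(y, x) for x, y in rows]))
```

The sketch also suggests why evenly spaced characters are troublesome: when the spacing between characters equals the spacing between lines, the nearest neighbor of a component is equally likely to lie along the line or across it, so the vote carries little information. As in the method described above, the result distinguishes only horizontal from vertical and says nothing about which way is up.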
U.S. Pat. No. 5,444,797, entitled "METHOD AND APPARATUS FOR AUTOMATIC CHARACTER SCRIPT DETERMINATION," discloses a method of determining the script from the connected components on a page image based upon features such as occurrences of upward facing concavities (e.g., the letters h, m, and n). Moment calculations are not specifically mentioned in U.S. Pat. No. 5,444,797. Upward concavities are relatively easy to detect if a two-dimensionally coded facsimile (fax) transmission is used and the scanned image quality is very good. The method of U.S. Pat. No. 5,444,797 would be much slower with a fax transmitted using one-dimensional coding. Much of the fax transmission today uses one-dimensional coding. Also, the latest fax compression scheme (i.e., JBIG or CCITT Standard T.82) is not readily compatible with the method of U.S. Pat. No. 5,444,797. Lastly, the method of U.S. Pat. No. 5,444,797 does not determine orientation. U.S. Pat. No. 5,444,797 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 5,425,110, entitled "METHOD AND APPARATUS FOR AUTOMATIC LANGUAGE DETERMINATION OF ASIAN LANGUAGE DOCUMENTS," discloses a method of distinguishing between Chinese, Japanese, and Korean by examining the distribution of density (black pixels to total pixels) within the bounding boxes encompassing connected components. The method of U.S. Pat. No. 5,425,110 is very sensitive to the boldness of the font used in a document, whereas the present invention is not. Also, U.S. Pat. No. 5,425,110 does not mention determining orientation or the use of a moment calculation as does the present invention. U.S. Pat. No. 5,425,110 is hereby incorporated by reference into the specification of the present invention.
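The density measure mentioned above (black pixels divided by total pixels in the bounding box) can be illustrated with a short sketch. The representation of a connected component as a set of (row, column) coordinates and the function name are assumptions for this example; the sketch computes only the density of a single component, not the distribution analysis of U.S. Pat. No. 5,425,110.

```python
def bounding_box_density(pixels):
    """Return the fraction of black pixels inside the component's bounding box.

    pixels: set of (row, col) coordinates of black pixels (assumed input).
    """
    if not pixels:
        raise ValueError("component has no pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return len(pixels) / (height * width)


if __name__ == "__main__":
    # A plus sign drawn in a 5x5 box with 1-pixel strokes.
    thin = {(2, c) for c in range(5)} | {(r, 2) for r in range(5)}
    # The same shape drawn with 3-pixel strokes (a "bold" version).
    bold = {(r, c) for r in range(5) for c in range(5)
            if 1 <= r <= 3 or 1 <= c <= 3}
    print(bounding_box_density(thin))  # 9 of 25 pixels are black
    print(bounding_box_density(bold))  # 21 of 25 pixels are black
```

The two shapes occupy the same bounding box, yet the density changes from 0.36 to 0.84 when the strokes are thickened, which is one way to see why a density-based feature is sensitive to font boldness.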
U.S. Pat. No. 5,375,176, entitled "METHOD AND APPARATUS FOR AUTOMATIC CHARACTER TYPE CLASSIFICATION OF EUROPEAN SCRIPT DOCUMENTS," discloses a method of distinguishing between Latin scripts based on detecting ascenders, descenders, diacritics, and punctuation. These features are hard to distinguish accurately in a degraded document image, whereas the present invention discloses a method of distinguishing among any scripts contained in a document image that is relatively insensitive to image degradation. U.S. Pat. No. 5,375,176 is hereby incorporated by reference into the specification of the present invention.
An article by Cheng-Lin Liu et al., entitled "Extracting Individual Features from Moments for Chinese Writer Identification," published by the IEEE in 1995, discloses a particular moment calculation used on features extracted from a document image. The moment calculation is specifically tailored to the Chinese writing style (i.e., square characters are used and the resolution in one direction is the same as in the other direction). The present invention is not constrained to accepting document images with only square characters. Liu et al. specifically say that "the normalization of moments [with] respect to rotation is unnecessary, and even detrimental." The present invention uses a different moment calculation that is normalized for different rotations of the document image with great effect. The present invention is also insensitive to differences in resolution between the different directions in the document image.
An article by H. Al-Yousefi et al., entitled "Recognition of Arabic Characters," published in the IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 14, No. 8, August 1992, discloses a method of using a normalized moment calculation to identify Arabic characters that is different from the normalized moment calculation used in the present invention. Al-Yousefi et al. use their normalized moments to obtain simple measures of shape, whereas the present invention uses a different normalized moment calculation to make a feature invariant to font thickness, horizontal size, and vertical size.
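To make the notion of a size-invariant normalized moment concrete, the sketch below computes a generic central moment divided by powers of the horizontal and vertical standard deviations. This is a textbook-style normalization chosen for illustration; it is neither the calculation of Al-Yousefi et al. nor the calculation of the present invention, neither of which is spelled out in this section, and the function name and point-list input are assumptions.

```python
def normalized_central_moment(points, p, q):
    """Central moment of order (p, q), normalized by the x and y spreads.

    points: list of (x, y) coordinates of black pixels (assumed input).
    Dividing by sigma_x**p * sigma_y**q removes the effect of independent
    horizontal and vertical scaling of the coordinates.
    """
    n = len(points)
    if n == 0:
        raise ValueError("no points given")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("points have no spread in x or y")
    mu_pq = sum((x - mean_x) ** p * (y - mean_y) ** q for x, y in points) / n
    return mu_pq / (var_x ** (p / 2) * var_y ** (q / 2))


if __name__ == "__main__":
    shape = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]  # a small L shape
    stretched = [(3 * x, 2 * y) for x, y in shape]    # 3x wider, 2x taller
    print(normalized_central_moment(shape, 2, 1))
    print(normalized_central_moment(stretched, 2, 1))
```

The two printed values agree up to floating-point rounding, because stretching the coordinates by different factors in each direction scales the numerator and the denominator by the same amount. Invariance to stroke thickness is a separate property that this simple normalization does not provide.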
An article by Chong-Huah Lo et al., entitled "Pattern Recognition Using 3-D Moments," published by the IEEE in 1990, discloses a method of using a 3-D moment calculation normalized according to a constant, the area of a surface patch, and the total length of a curve in order to recognize a pattern. The moment calculation of Lo et al. differs from the two-dimensional moment calculation of the present invention.
The document image processing methods of the prior art suffer from one or more of the following problems. First, the prior art methods may not determine both the orientation and the script of the image as does the present invention. Second, the prior art methods may be sensitive to the thickness of the font used in the document image, whereas the present invention is not. Third, the prior art methods may not be able to process all scripts as can the present invention. Fourth, the prior art methods may be sensitive to the image compression scheme used to transmit the document image, whereas the present invention is not. Fifth, the calculations employed by the prior art methods may be computationally intensive, whereas the calculations of the present invention are highly efficient. Sixth, the prior art methods may perform poorly if the document image is degraded due to poor contrast, multiple generation reproduction, blurring, etc., whereas the performance of the present invention is not affected as much by such degradation. Also, a document image may be transmitted in any orientation. Prior art document image processing methods may not be able to process an image of unknown orientation, whereas the present invention can. The present invention is a method of processing a document image of any orientation and presenting the document in a user-defined orientation to a user if the document is of interest to the user. The user may specify which scripts the user is interested in.