In recent years, a technique has been known in which a document is read by Optical Character Recognition (OCR) and a translation (hereinafter referred to as a “rubi”) of an original text in the read document image is given between lines. Japanese Patent Application No. 2009-255373, filed by the same applicant as the present application, is a typical document describing this technique.
In such a system for giving a rubi to a document image, for example, as shown in FIG. 17, when an original document is scanned at a slight inclination so that a character string L11 is inclined, or when the original document itself contains the inclined character string L11, the rubi should be generated along the inclined character string in view of the appearance of the rubi. For this purpose, it is necessary to obtain an accurate inclination value of each character string in the document image. The accurate inclination value of a character string is also needed in processing other than the generation of a rubi.
As a conventional method for obtaining the inclination value, for example, a reference point such as the lower left coordinate or the central coordinate of a rectangle circumscribing each character in a character string is determined, a regression line through these reference points is obtained for each character string, and the inclination of the regression line serves as the inclination value of the character string.
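As a non-limiting illustration, the regression-based method described above may be sketched as follows. The sketch assumes that each circumscribing rectangle is given as (left, top, right, bottom) in image coordinates with y increasing downward, and that the inclination value is the least-squares slope dy/dx. The names Box, lower_left, center, regression_slope and string_inclination are hypothetical and do not appear in the cited document.

```python
from typing import Callable, List, Tuple

# Assumed representation: (left, top, right, bottom), y increasing downward.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def lower_left(box: Box) -> Point:
    """Lower left corner of a circumscribing rectangle."""
    return (box[0], box[3])


def center(box: Box) -> Point:
    """Central coordinate of a circumscribing rectangle."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def regression_slope(points: List[Point]) -> float:
    """Ordinary least-squares slope (dy/dx) of a line fitted to the points."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same x coordinate")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def string_inclination(boxes: List[Box],
                       reference: Callable[[Box], Point] = lower_left) -> float:
    """Inclination of one character string from its per-character rectangles."""
    return regression_slope([reference(b) for b in boxes])
```

Passing reference=center selects the central coordinate instead of the lower left coordinate as the reference point.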
Additionally, as a method that does not use the regression calculation, there is a method in which an inclination between a first character and a last character in a character string is obtained from the coordinates of the two characters, and this inclination serves as the inclination value of the character string.
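The method using only the first character and the last character may be sketched, under assumptions, as follows. The sketch assumes that one reference point (x, y) has already been chosen for each of the two characters, for example a lower left corner, and that the inclination value is the slope dy/dx. The name endpoint_slope is hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def endpoint_slope(first: Point, last: Point) -> float:
    """Slope (dy/dx) of the line joining the reference points of the
    first and last characters of a character string."""
    (x0, y0), (x1, y1) = first, last
    if x1 == x0:
        raise ValueError("first and last characters share the same x coordinate")
    return (y1 - y0) / (x1 - x0)
```

Only one subtraction pair and one division are needed per character string, which is why this approach is cheaper than a regression over all characters.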
The methods described above for obtaining the inclination value of a character string in a document image have the problems described below.
For example, in a case where all characters in an original document are alphabetical characters, as shown in FIG. 18, the heights of the upper ends and the lower ends of rectangles B11 circumscribing the characters are not aligned. Consequently, depending on the character arrangement, a true inclination K11 and an obtained inclination K12 may differ from each other, as shown in FIG. 19, when only a regression of these coordinates is performed. In addition, the regression calculation itself is required, which results in a large amount of calculation.
Furthermore, in the method of obtaining the inclination of a character string from only the first character and the last character in the character string without the regression calculation, the difference between the true inclination and the obtained inclination may become larger depending on which characters are the first character and the last character.
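The two problems described above can be reproduced with a small numerical sketch. The data are invented for illustration only: a perfectly level four-character string whose first character has a descender, so that the lower end of its circumscribing rectangle lies 3 units below the baseline. Coordinates are assumed to be image coordinates with y increasing downward, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def regression_slope(points: List[Point]) -> float:
    """Ordinary least-squares slope (dy/dx) through the reference points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def endpoint_slope(first: Point, last: Point) -> float:
    """Slope (dy/dx) between the first and last reference points."""
    return (last[1] - first[1]) / (last[0] - first[0])


# Illustrative lower left corners of a level string such as "pale":
# the baseline is y = 0, and the descender of the first character
# pushes its lower end down to y = 3. The true inclination is 0.
lower_left_corners: List[Point] = [(0.0, 3.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]

true_slope = 0.0
slope_by_regression = regression_slope(lower_left_corners)
slope_by_endpoints = endpoint_slope(lower_left_corners[0], lower_left_corners[-1])
```

With these invented coordinates the regression yields about -0.09 and the two-character method about -0.1, although the string is level. Neither method recovers the true inclination of 0, and the two-character method deviates slightly more because the single misaligned lower end is not averaged against the other characters.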