Conventionally, the direction of an original image inputted into a computer using a scanner or the like is detected by the following methods:
(1) Detection of Original Image Direction by Software
FIG. 30 shows the outline of software processing for detecting the direction of an image. As shown in FIG. 30, first, a color image 1011 as the object of direction detection is binarized by a binarization processing procedure 1012, and a binary image 1013 is generated. Next, the binary image 1013 is area-divided by an area division processing procedure 1014, and character coordinate information 1015 as coordinate information of a character area is generated. Then, an OCR processing procedure 1016 refers to the portions of the binary image 1013 indicated by the character coordinate information 1015 and performs character recognition processing, and the result of direction detection of the color image 1011 is outputted.
The above processing will be described in more detail. FIG. 31 shows an example of the color image 1011, which is an RGB image having 8 bits for each color component. This color image is simply binarized by luminance conversion with a fixed threshold value of 128. FIG. 32 shows an example of the binary image simply binarized from the color image in FIG. 31. As the binary image in FIG. 32 obtained by simple binarization includes a large amount of noise, satisfactory area division cannot be performed.
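The simple binarization described above can be sketched as follows. This is a minimal illustration, not the procedure of FIG. 30 itself: the luminance weights (ITU-R BT.601) and the names `luminance` and `binarize_fixed` are assumptions, since the text states only that luminance conversion and a fixed threshold value of 128 are used.

```python
def luminance(r, g, b):
    # Integer approximation of ITU-R BT.601 luma. The weights are an
    # assumption; the text does not say which luminance conversion is used.
    return (299 * r + 587 * g + 114 * b) // 1000


def binarize_fixed(rgb_rows, threshold=128):
    """Return rows of 0/1, where 1 marks a black (dark) pixel."""
    return [
        [1 if luminance(r, g, b) < threshold else 0 for (r, g, b) in row]
        for row in rgb_rows
    ]


# Tiny hypothetical image: 2 rows of 3 RGB pixels, 8 bits per component.
image = [
    [(255, 255, 255), (0, 0, 0), (200, 200, 200)],
    [(120, 120, 120), (130, 130, 130), (255, 0, 0)],
]
binary = binarize_fixed(image)
print(binary)
```

Because the threshold is fixed, a mid-gray pixel of 120 and one of 130 fall on opposite sides of it regardless of the actual content of the page, which is one way noise can arise in the binary image.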
Accordingly, in the binarization processing procedure 1012, a histogram of luminance information of the color image is created as shown in FIG. 33, and an optimum binarization point 1041 is calculated. FIG. 33 shows the luminance information of the color image in FIG. 31 and the optimum binarization point 1041. Further, FIG. 34 shows a binary image generated from the color image 1011 using the binarization point 1041. As shown in FIG. 34, because a threshold value derived from the histogram of luminance information (the binarization point 1041) is used in place of the fixed threshold value, the binary image in FIG. 34 includes less noise than the binary image in FIG. 32, and excellent area division can be performed.
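One way to derive a binarization point from a luminance histogram is sketched below. The text does not say how the optimum binarization point 1041 is calculated; Otsu's method (maximizing the between-class variance) is used here purely as a stand-in, and the function names are hypothetical.

```python
def histogram(gray_values):
    """Count occurrences of each 8-bit luminance value."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    return hist


def otsu_threshold(hist):
    """Return t such that pixels with value < t are treated as dark.

    Otsu's method is an assumption standing in for the unspecified
    calculation of the optimum binarization point.
    """
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= level
    sum0 = 0    # sum of values of those pixels
    for level in range(256):
        w0 += hist[level]
        sum0 += level * hist[level]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, level + 1
    return best_t


# Hypothetical page: gray text (150) on a light background (220).
gray = [150] * 30 + [220] * 70
t = otsu_threshold(histogram(gray))
dark_adaptive = sum(1 for v in gray if v < t)
dark_fixed = sum(1 for v in gray if v < 128)
print(t, dark_adaptive, dark_fixed)
```

In this example the fixed threshold value of 128 classifies no pixel as dark, whereas the histogram-derived threshold separates the two populations. Note that the whole image must be scanned to build the histogram before any pixel can be binarized.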
FIG. 35 shows an example of the result of area division performed on the binary image in FIG. 34 by the area division processing procedure 1014. In the area division processing, the resolution is reduced so as to connect black pixels, then edge line tracing is performed, and it is determined whether or not the traced profile is a character from the shape of the profile. In FIG. 35, rectangular areas 1061 to 1068 are determined as character areas. Note that areas 1067 and 1068 are erroneously determined areas.
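The area division steps can be sketched as follows, under stated assumptions. Resolution reduction is done by OR-ing blocks of pixels so that nearby black pixels connect. In place of the edge line tracing named in the text, a flood fill over connected pixels is used to obtain each bounding rectangle, and the shape test in `looks_like_text` is a hypothetical elongation rule, not the shape determination actually used.

```python
def reduce_resolution(binary, factor):
    """OR together factor x factor blocks so that nearby black pixels merge."""
    h, w = len(binary), len(binary[0])
    rh, rw = (h + factor - 1) // factor, (w + factor - 1) // factor
    out = [[0] * rw for _ in range(rh)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                out[y // factor][x // factor] = 1
    return out


def bounding_boxes(grid):
    """Bounding rectangles (x0, y0, x1, y1) of 4-connected black regions."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or seen[y][x]:
                continue
            stack = [(y, x)]
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while stack:
                cy, cx = stack.pop()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def looks_like_text(box, min_elongation=2.0):
    """Hypothetical shape rule: a text line is much longer than it is thick."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0 + 1, y1 - y0 + 1
    return max(width, height) / min(width, height) >= min_elongation


# Hypothetical 8 x 16 page: four small "characters" in a row and one square blob.
page = [[0] * 16 for _ in range(8)]
for y in (0, 1):
    for x in (0, 1, 3, 4, 6, 7, 9, 10):
        page[y][x] = 1
for y in range(4, 8):
    for x in range(12, 16):
        page[y][x] = 1

small = reduce_resolution(page, 2)
boxes = bounding_boxes(small)
text_boxes = [b for b in boxes if looks_like_text(b)]
print(boxes, text_boxes)
```

A simple shape rule of this kind can also accept non-character regions, which is consistent with the erroneously determined areas 1067 and 1068 noted above.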
In the OCR processing procedure 1016, as described above, the areas determined as the character areas are read, character-cutting processing is performed, and direction detection processing is performed on each character. In the direction detection processing, a feature vector of one character is calculated, the feature vector is rotated, and character recognition processing is performed in 4 directions (0°, 90°, 180° and 270°). From the results of character recognition in the 4 directions, the direction with the highest accuracy is determined as the result of direction detection.
In image direction detection by software processing, a value obtained by adding the OCR results of all the characters existing in the original (the characters existing in the character areas resulting from the area division) is outputted as the final result.
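The aggregation over all characters can be sketched as follows. It is assumed here that each character yields one recognition score per direction and that the scores are simply summed; the text says only that the OCR results are added, so the score format and the name `detect_direction` are hypothetical.

```python
DIRECTIONS = (0, 90, 180, 270)


def detect_direction(per_character_scores):
    """Sum the per-direction scores of all characters and pick the largest.

    per_character_scores: iterable of dicts mapping each direction to the
    recognition score obtained for one character in that direction
    (an assumed format).
    """
    totals = {d: 0.0 for d in DIRECTIONS}
    for scores in per_character_scores:
        for d in DIRECTIONS:
            totals[d] += scores[d]
    best = max(DIRECTIONS, key=lambda d: totals[d])
    return best, totals


# Hypothetical scores for three characters; the third one is misleading.
scores = [
    {0: 0.2, 90: 0.9, 180: 0.1, 270: 0.3},
    {0: 0.4, 90: 0.7, 180: 0.2, 270: 0.1},
    {0: 0.8, 90: 0.5, 180: 0.3, 270: 0.2},
]
direction, totals = detect_direction(scores)
print(direction)
```

Summing over all characters lets the majority outweigh an individual character whose best score falls in the wrong direction.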
(2) Detection of Original Image Direction by Hardware
Next, the outline of hardware construction for the conventional image direction detection processing will be described. FIG. 36 is a block diagram showing the hardware construction of a specialized direction detection board connected to a main board of a monochrome digital copier. In FIG. 36, numeral 1071 denotes a character extraction unit having a specialized GA for character extraction processing and binarization processing; 1072, a RAM; 1073, a CPU; and 1074, a ROM.
FIG. 37 is a timing chart showing respective operations using the direction detection board in FIG. 36. Next, the operations of the direction detection board in FIG. 36 will be described with reference to the timing chart of FIG. 37.
In FIG. 37, “0” to “3” are page numbers of original images placed on an ADF (Automatic Document Feeder) of the monochrome digital copier. Further, numeral 1081 indicates timing of original reading by a scanner of the copier; 1082, timing of the character extraction processing and the binarization processing by the specialized character extraction GA; 1083, timing of direction determination OCR by the CPU; and 1084, timing of outputting the result of direction determination. As shown in FIG. 37, the result of processing of each page is outputted with a delay of 2 pipeline stages from input of the original.
First, the scanner sequentially reads the originals placed on the ADF, and VIDEO in FIG. 36 is generated. Note that VIDEO includes a CLK and image data (8 bits) synchronized with the CLK, a page signal indicating a page break of the image data, and a main-scanning synchronizing signal indicating a break of width of the image data.
The character extraction unit 1071 inputs the image data (8 bits), detects an area likely to be a character area including continuous image data (more specifically, refers to adjacent plural pixels, and detects an area where the difference between a maximum and minimum value is greater than a threshold value) and generates coordinate data thereof. Further, the character extraction unit 1071 binarizes the image data (8 bits). Note that a threshold value used in the binarization is determined from a histogram of a previous line. Then, the coordinate data and the binary image are written into the RAM 1072 (Note that the GA may have a RAM for storing the data). The above operation is performed at the timing indicated with the numeral 1082 in FIG. 37.
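The per-line operations of the character extraction unit can be modeled in software as follows. This is only an illustrative model of logic that the text attributes to a GA: the window size, the difference threshold, and the rule for deriving a binarization threshold from the previous line's histogram (midpoint of the darkest and brightest occupied bins) are all assumptions.

```python
def edge_flags(line, window=4, diff_threshold=64):
    """Flag pixels whose neighborhood has max - min above a threshold.

    window and diff_threshold are hypothetical values.
    """
    flags = []
    for x in range(len(line)):
        lo = max(0, x - window // 2)
        hi = min(len(line), lo + window)
        segment = line[lo:hi]
        flags.append(max(segment) - min(segment) > diff_threshold)
    return flags


def threshold_from_histogram(prev_line, default=128):
    """Derive a threshold from the previous line's histogram.

    The midpoint rule is an assumption; the text states only that the
    histogram of the previous line is used.
    """
    if prev_line is None:
        return default
    hist = [0] * 256
    for v in prev_line:
        hist[v] += 1
    occupied = [i for i, n in enumerate(hist) if n]
    return (occupied[0] + occupied[-1]) // 2


def process_lines(lines):
    """Return (binary_line, edge_flags) for each 8-bit scan line in order."""
    results = []
    prev = None
    for line in lines:
        t = threshold_from_histogram(prev)
        binary = [1 if v < t else 0 for v in line]
        results.append((binary, edge_flags(line)))
        prev = line
    return results


# Two hypothetical scan lines: one crossing a dark stroke, one blank.
lines = [
    [200, 200, 90, 90, 200, 200],
    [200, 200, 200, 200, 200, 200],
]
results = process_lines(lines)
print(results)
```

Using the previous line's histogram means each line can be binarized as it arrives, without buffering the whole page, which suits processing synchronized with the CLK.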
Next, the CPU 1073 performs the direction determination OCR processing in parallel to the character extraction processing on the next page. The CPU 1073 reads the coordinate data from the RAM 1072 in accordance with a program stored in the ROM 1074, and performs the direction determination OCR processing on the binary image on the RAM 1072 corresponding to the coordinate data. Note that in a case where the RAM has a sufficient size, the program on the ROM 1074 is downloaded to the RAM 1072 for increasing the processing speed. The character extraction in synchronization with the CLK is fixed-time processing. However, as the direction determination OCR is time-variable processing, the OCR processing is forcibly terminated by utilizing a timer. The result of direction detection obtained within the period limited by the timer (0, 90, 180, 270 or UNKNOWN) is outputted at the timing indicated with “Δ” at the timing 1084.
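The timer-limited direction determination can be sketched as follows. The conditions under which UNKNOWN is returned (no character processed before the deadline, or a tie between the two best directions) are assumptions, as are the score format and the injectable `clock` parameter, which is included only so that the behavior can be exercised deterministically.

```python
import time


def direction_with_deadline(char_scores, budget_seconds, clock=time.monotonic):
    """Accumulate per-direction scores until the time budget is used up.

    Returns 0, 90, 180 or 270, or "UNKNOWN" when nothing was processed
    in time or the best two totals tie (assumed rules).
    """
    deadline = clock() + budget_seconds
    totals = {0: 0.0, 90: 0.0, 180: 0.0, 270: 0.0}
    processed = 0
    for scores in char_scores:
        if clock() >= deadline:
            break  # forced termination by the timer
        for d in totals:
            totals[d] += scores[d]
        processed += 1
    if processed == 0:
        return "UNKNOWN"
    ranked = sorted(totals.values(), reverse=True)
    if ranked[0] == ranked[1]:
        return "UNKNOWN"
    return max(totals, key=totals.get)


def make_fake_clock(step):
    """Deterministic clock for demonstration: advances by step on each call."""
    now = [0.0]

    def clock():
        now[0] += step
        return now[0]

    return clock


# Hypothetical scores for five upright characters.
chars = [{0: 0.9, 90: 0.1, 180: 0.2, 270: 0.1}] * 5
in_time = direction_with_deadline(chars, 2.5, clock=make_fake_clock(1.0))
too_late = direction_with_deadline(chars, 0.0, clock=make_fake_clock(1.0))
print(in_time, too_late)
```

With the fake clock, a budget of 2.5 allows two characters to be processed before termination, and a budget of 0 allows none, so the second call yields UNKNOWN.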
However, the above-described conventional methods have the following problems.
(1) Problems in Detection of Original Image Direction by Software
(1-1) Much Processing Time is Required.
Hereinbelow, described is a result of measurement of an A4 size image processed by a personal computer having a 266 MHz Pentium (registered trademark) II. First, it takes 1.8 seconds to create a histogram and calculate an optimum binarization point. Next, it takes 0.3 to 1.0 seconds to perform the area division processing, although the time of the processing varies in accordance with the image (depending on the number of connected black pixels). Then, it takes 2 to 3 seconds to perform the OCR processing on a document original mainly including characters, although the time of the processing varies in accordance with the number of characters. Accordingly, a total of 4 to 5 seconds is required.
(1-2) A Large Amount of Work Memory is Required.
As the entire color image is referred to so as to obtain an optimum binary image, in the case of an A4 size image, 24 MBytes of memory is required.
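The order of magnitude of this figure can be checked with a short calculation. The scanning resolution is not stated in the text; 300 dpi is assumed here only because it yields a buffer size close to the stated 24 MBytes for an A4 original with 3 bytes (8-bit R, G and B) per pixel.

```python
A4_INCHES = (8.27, 11.69)  # A4 is 210 mm x 297 mm


def rgb_buffer_bytes(dpi, bytes_per_pixel=3):
    """Bytes needed to hold a whole A4 page at the given resolution."""
    width = round(A4_INCHES[0] * dpi)
    height = round(A4_INCHES[1] * dpi)
    return width * height * bytes_per_pixel


# 300 dpi is an assumed resolution, not a value given in the text.
size = rgb_buffer_bytes(300)
size_mib = size / 2**20
print(size, round(size_mib, 1))
```

Under this assumption the whole-page buffer comes to a little under 25 MiB, which is in line with the memory requirement described above.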
(2) Problems in Detection of Original Image Direction by Hardware
(2-1) Cost is High.
As the specialized board is utilized and the CPU, the RAM, the ROM, the character extraction GA, a control GA (not shown) and the like are necessary only for the direction determination processing, the cost is high.
(2-2) Version Updating is Difficult.
As the character extraction unit is comprised of a specialized GA, version updating of character extraction algorithm cannot be made without difficulty.
(3) Problem Common to Both Detection Methods
In both methods, it is impossible to perform the OCR processing on an inverted character portion. In recent years, color office documents are widely used as well as printed documents, and the color images often include more inverted character portions in comparison with monochrome originals. Accordingly, in both methods, the accuracy of recognition is low in color images having inverted character portions.