1. Field of the Invention
The present invention relates to a color image processing apparatus and a pattern extraction apparatus, and is specifically applicable when a character area for a title is extracted from a color image.
2. Description of the Related Art
Recently, computers and peripheral devices such as color printers have become available at lower prices, thereby extending the fields in which color images are processed. As a result, there is a demand for technology that extracts only a specific area from a color image, for example, areas of the same color.
This technology is requested in many fields, for example, when a color scene image picked up by a CCD camera is used as an input image to be processed to select fruits, monitor a car, check a person for security, etc. through image recognition.
When a color document image is used as an input image, a document name, a keyword, etc. are automatically extracted from an image to be used for retrieval, for example, when books are classified and managed by an automatic system in a library. In addition, the technology is also used to automatically assign a keyword, a file name, etc. based on groupware in which images are stored and shared as a database. The information is used to automatically retrieve a large number of color document images.
The conventional technology of extracting the same color areas in a color image can be a method of generating a color-analyzed image by clustering for each color the picture elements in a color image. There is also a method of extracting the same color areas in a color image using a color labeling result in an adjacency expanding method.
In addition, the technology of extracting a title from a color image can be a method of extracting a character area using a color-analyzed image.
In this method, the following processes are performed.
An enclosing rectangle of connection areas is obtained from a color-analyzed image in one color.
Enclosing rectangles are limited to specific size and shape.
An adjacent rectangle search range is set for each rectangle, and rectangles are searched for in the range. A plurality of the rectangles within the range is extracted as a group.
Rectangles having good linearity in a group are maintained.
An enclosing rectangle of a group is obtained, and a pattern of color similar to that of an area forming the group is extracted inside the enclosing rectangle.
Listed below are the documents describing the conventional technology of extracting a character area from a color document image.
Senda et al. ‘Method of Extracting Character Pattern from Color Image based on Unichromatism of Character’ published by The Japan Society of Information and Communication Research, PRU94-09, p17-24
Uehane et al. ‘Extracting Character Area from Color Image by Isochromatic Line Proce’ published by The Japan Society of Information and Communication Research, PRU94-09, p9-16
Matsuo et al. ‘Extracting Unicolor Character Area from Color Document Image’ published in the 1997 Convention of The Japan Society of Information and Communication Research D-12-19
Matuo et al. ‘Extracting Character String from Scenic Image based on Shading and Color Information’ published by The Japan Society of Information and Communication Research, PRU92-121, p25-32
However, in the conventional method of extracting areas of the same color by clustering the picture elements of a color image by color, a large number of picture elements of the entire image are clustered. Therefore, the clustering process takes a long computation time.
In addition, since the clustering process is performed on the picture elements of the entire image, areas may not be extracted with high precision. For example, suppose that a first color area is positioned away from a second color area, and that the first color is similar enough to the second color for both to be classified into the same cluster. Depending on the third color generated as the representative of the cluster, neither the first color nor the second color may be completely covered. Thus, an extraction result may be output with an incomplete pattern or an unclear outline.
In the conventional method of extracting areas of the same color based on the area expanding method, the colors of adjacent picture elements may differ by more than a predetermined threshold, depending on how the color difference between adjacent picture elements is defined, even if the colors of the adjacent picture elements appear the same to the naked eye. As a result, a hole may appear in the same area, or the outline of the same color area may not be correctly extracted.
Furthermore, since only the relationship between adjacent picture elements is checked, a character area can be assigned the same label as the background area when the color around the boundary between the character area and the background area gradually changes.
In addition, in the conventional area expanding method, areas of the same color are extracted by equally applying a predetermined threshold to various color document images. Therefore, for example, when similar colors such as gray, intermediate colors, etc. are used for the character and its background, a character and the background can be frequently assigned the same label, thereby reducing the character pattern extraction precision. Otherwise, an extracted label area can be broken into pieces in a character pattern, thereby reducing the character pattern extraction precision.
On the other hand, if an area expanding method is applied to an image other than a full-color image, for example, an image in 256 colors, a large number of small label areas are generated, thereby causing the problem of low area extraction precision.
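The behavior of adjacency-based labeling described above can be illustrated with a minimal sketch. It is not taken from any of the cited methods: the function names, the 4-connectivity, and the per-channel maximum used as the color difference are all assumptions made for illustration.

```python
from collections import deque


def color_distance(a, b):
    """Largest per-channel absolute difference between two RGB tuples (assumed metric)."""
    return max(abs(x - y) for x, y in zip(a, b))


def label_by_adjacency(image, threshold):
    """Give the same label to 4-connected picture elements whose
    adjacent colors differ by at most `threshold`."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for start_y in range(height):
        for start_x in range(width):
            if labels[start_y][start_x]:
                continue  # already labeled by an earlier expansion
            next_label += 1
            labels[start_y][start_x] = next_label
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if labels[ny][nx]:
                        continue
                    # Only the two adjacent picture elements are compared.
                    if color_distance(image[y][x], image[ny][nx]) <= threshold:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


# A gradual change: each step is within the threshold, so the whole row
# receives one label although its two ends differ by 30.
gradient = [[(0, 0, 0), (10, 10, 10), (20, 20, 20), (30, 30, 30)]]
print(label_by_adjacency(gradient, 10))
```

Because only neighboring picture elements are compared, a gradual boundary lets a character and its background share a label, while a fixed threshold that is too small for a given image fragments a single area.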
Furthermore, in the method of extracting a character area using the conventional color-analyzed image, it is necessary to generate a color-analyzed image of the entire image for each color extracted from the image. It takes a long time to generate such color-analyzed images. In addition, since each color-analyzed image is generated for the entire image, a title extracted from the image is subject to the influence of the color of an area other than the title area, thereby reducing the title extraction precision. Furthermore, when an enclosing rectangle of connection areas is obtained, it is necessary to process the entire image for each of the extracted color-analyzed images. Therefore, a plurality of images of the same height and width as the original image (one for each extracted color) are required. Thus, the process takes a long time.
Furthermore, since enclosing rectangles are grouped for each color-analyzed image generated for the entire image, the process takes a long time, and may cause the problem that characters to be extracted can be lost if they are clustered into different color-analyzed images. In addition, since only the rectangles in a search range are extracted when they are grouped, there is the problem that small portions can slip through a group. When a pattern of a color similar to the color of a group is extracted to collect the portions which have slipped through the group, there arises the problem that the noise similar to the color in the group can be collected.
The present invention aims at providing an image processing apparatus capable of extracting unicolor areas with high precision from various color images.
To solve the above described problems, according to the present invention, images are processed based on the number of colors of a color image to be processed.
Thus, images can be optimally processed depending on the number of colors of a target color image, thereby improving the precision in the image process and performing the process at a higher speed.
According to an aspect of the present invention, a different labeling method can be selected based on the number of colors of a target color image.
Therefore, even if the color difference in an area of a color image having a smaller number of colors is large to some extent, it can be assumed that the area is in the same color, thereby preventing a unicolor area from being fragmented into very small sections. In addition, a very small color difference can be detected in a color image having a large number of colors so that a different label can be assigned to each of the areas in different colors, thereby discriminating different color patterns with high precision, and extracting only a pattern in a specified color with high precision.
According to another aspect of the present invention, a label is assigned to a color image other than a full-color image after clustering color palettes, while a label is assigned to a full-color image in an adjacency expanding method.
Thus, since color images other than full-color images have a smaller number of colors, they can be processed in a shorter time even when the color palette clustering process is performed. In addition, even if a unicolor area appears uneven in color, the contained colors are classified into the same cluster, thereby preventing the loss of any color. As a result, the unicolor area can be extracted with high precision. As for a full-color image, a unicolor area can be extracted only by comparing the colors of adjacent picture elements without clustering colors. Therefore, the processing time can be shortened, and a unicolor area can be extracted without an influence of the color of a separate area, thereby improving the extraction precision.
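For an image other than a full-color image, labeling after color palette clustering might be sketched as follows. This is an illustrative assumption rather than the disclosed procedure: the greedy clustering rule, the per-channel distance, and the names `cluster_palette` and `label_indexed_image` are hypothetical.

```python
from collections import deque


def cluster_palette(palette, max_distance):
    """Greedy clustering of palette entries (assumed rule): each color joins
    the first cluster whose seed color is within `max_distance` per channel."""
    seeds = []
    cluster_of = []
    for color in palette:
        for index, seed in enumerate(seeds):
            if max(abs(a - b) for a, b in zip(color, seed)) <= max_distance:
                cluster_of.append(index)
                break
        else:
            seeds.append(color)  # no close seed found: start a new cluster
            cluster_of.append(len(seeds) - 1)
    return cluster_of


def label_indexed_image(indices, cluster_of):
    """Label 4-connected picture elements whose palette entries share a cluster."""
    height, width = len(indices), len(indices[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for start_y in range(height):
        for start_x in range(width):
            if labels[start_y][start_x]:
                continue
            next_label += 1
            cluster = cluster_of[indices[start_y][start_x]]
            labels[start_y][start_x] = next_label
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx]
                            and cluster_of[indices[ny][nx]] == cluster):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


palette = [(0, 0, 0), (8, 8, 8), (250, 250, 250)]
clusters = cluster_palette(palette, 16)           # two dark entries share a cluster
print(label_indexed_image([[0, 1, 2], [1, 0, 2]], clusters))
```

Only the palette entries are clustered, not every picture element, so the clustering cost depends on the number of colors rather than on the image size.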
According to a further aspect of the present invention, a labeling threshold is individually set for each image according to the read information about an image to be labeled.
Thus, even if the ranges of the unicolor of images are different from each other, a threshold correctly reflecting the variations of the unicolor of an image can be set. As a result, the unicolor areas can be extracted with high precision for various color images.
According to a further aspect of the present invention, a labeling threshold for an input image to be processed is set by extracting color difference information from a local area of the input image.
Thus, the actual color difference in the unicolor area of an input image can be extracted from the very image from which a unicolor area is to be extracted, and a threshold unique to the input image can be set. Therefore, even if various color images are input, the unicolor area can be extracted with high precision.
According to a further aspect of the present invention, a color image is sectioned in mesh form. In the mesh area, an area indicating small variance of colors is extracted as a uniform color area of the color image.
Thus, the position of an area indicating the same color can be specified in the color image in which various colors are distributed. The actual color difference of the unicolor area of an input image can be computed by obtaining the variance of the color of the area.
According to a further aspect of the present invention, a labeling threshold is determined based on the standard deviation of the color in a local area for which a color variance value is within a predetermined range.
As a result, the range of the same color can be obtained from an actual image to be processed. Even if similar colors such as gray, intermediate colors, etc. are used for both characters and background, the characters and background in similar colors can be correctly discriminated, and only characters can be extracted with high precision.
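The mesh-based threshold setting described above can be sketched as follows. The cell size, the upper limit on the standard deviation, the scale factor, and the averaging over uniform cells are assumptions chosen for illustration; the text only states that the threshold is based on the standard deviation of the color in local areas whose color variance is within a predetermined range.

```python
from statistics import pstdev


def mesh_cells(image, cell):
    """Yield the picture elements of each complete cell x cell mesh area."""
    height, width = len(image), len(image[0])
    for top in range(0, height - cell + 1, cell):
        for left in range(0, width - cell + 1, cell):
            yield [image[y][x]
                   for y in range(top, top + cell)
                   for x in range(left, left + cell)]


def channel_stdevs(pixels):
    """Population standard deviation of each RGB component."""
    return tuple(pstdev([p[c] for p in pixels]) for c in range(3))


def estimate_labeling_threshold(image, cell=2, max_stdev=10.0, scale=3.0):
    """Return a per-channel threshold from mesh areas of nearly uniform color,
    or None when no such area exists (all parameters are assumed values)."""
    uniform = []
    for pixels in mesh_cells(image, cell):
        stdevs = channel_stdevs(pixels)
        if max(stdevs) <= max_stdev:      # small variance: uniform color area
            uniform.append(stdevs)
    if not uniform:
        return None
    return tuple(scale * sum(s[c] for s in uniform) / len(uniform)
                 for c in range(3))


row = [(100, 100, 100), (102, 100, 100), (0, 0, 0), (255, 255, 255)]
image = [row, row]   # left cell is nearly uniform, right cell is not
print(estimate_labeling_threshold(image))
```

In this sketch the high-contrast cell on the right is ignored, so the threshold reflects only the color variation actually observed inside areas of the same color.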
According to a further aspect of the present invention, the color is changed according to a color signal such that the resolution of the color difference of the first color recognized with the naked eyes matches the resolution of the color difference of the second color recognized with the naked eyes.
Thus, even if the naked eyes recognize the same color although there is a large color difference in the color space, the areas can be collectively extracted. On the other hand, when the naked eyes recognize different colors even if there is a small color difference in the color space, the areas can be individually extracted. Thus, similar color areas can be extracted depending on the color recognition characteristics of the naked eyes.
According to a further aspect of the present invention, the color difference around the color of low color saturation is reduced, and the color difference around the color of high color saturation is expanded.
As described above, around a color of low color saturation where naked eyes have low color resolution, an area recognized as in uniform color by the naked eyes can be extracted as a uniform color area with high precision by a device. On the other hand, around a color of high color saturation where naked eyes have high color resolution, an area recognized as in different colors by the naked eyes can be extracted as different areas by a device. Thus, areas in the similar color can be extracted with higher precision.
According to a further aspect of the present invention, the colors of a color image are clustered, and the same label is assigned to the areas connected by the colors belonging to the same cluster.
Thus, when a labeling process is performed in the adjacency expanding method, the number of colors of a color image is reduced, and the labeling process is performed without a predetermined labeling threshold, thereby performing the process more quickly and with higher extraction precision for a unicolor area.
According to a further aspect of the present invention, a threshold for use in extracting a unicolor area from a color image is set based on the read resolution independently computed for each color component.
Thus, a unicolor area can be extracted while taking into account the cases where the read resolution of a CCD, a scanner, etc. depends on each color component, and where the resolution of the naked eye depends on the difference in color of a color image. As a result, the extraction precision of a unicolor pattern from a color image can be improved.
According to a further aspect of the present invention, the read resolution corresponding to the matching color difference between adjacent picture elements obtained from the input image is individually obtained for each of the three primary colors from the color difference table which stores the maximum value of the color difference between adjacent picture elements using the luminance value and the read resolution as variables. Based on the read resolution of the three primary colors, the read resolution of the input image can be computed.
Thus, since the difference in read resolution for each color component can be taken into account when the read resolution of an input image is computed, the extraction precision of a unicolor pattern from a color image can be successfully improved.
According to a further aspect of the present invention, the maximum value of the color difference between adjacent picture elements is entered in the color difference table corresponding to the luminance values of all colors of an image.
Thus, the maximum value of the color difference between adjacent picture elements can be obtained directly from the color difference table without an arithmetic operation such as interpolation for any luminance value of the color of an image. As a result, a labeling threshold corresponding to the luminance value of the color of an image can be quickly obtained.
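A lookup of this kind can be sketched as follows. The table contents here are placeholder values generated by an arbitrary formula, not measured data, and the nearest-match rule and the averaging of the three per-component resolutions are assumptions; the text does not specify how the three values are combined.

```python
RESOLUTIONS_DPI = (100, 200, 300)   # hypothetical read resolutions


def build_difference_table(resolutions=RESOLUTIONS_DPI, levels=256):
    """table[dpi][luminance] -> maximum color difference between adjacent
    picture elements. PLACEHOLDER values for illustration only."""
    return {dpi: [lum * 30 // dpi for lum in range(levels)]
            for dpi in resolutions}


def matching_resolution(table, luminance, observed_difference):
    """Resolution whose table entry best matches the observed difference.
    Every luminance value has its own entry, so no interpolation is needed."""
    return min(table,
               key=lambda dpi: abs(table[dpi][luminance] - observed_difference))


def image_resolution(table, observations):
    """observations: one (luminance, adjacent difference) pair per primary
    color. Returns the per-component resolutions and their mean (assumed rule)."""
    per_channel = [matching_resolution(table, lum, diff)
                   for lum, diff in observations]
    return per_channel, sum(per_channel) / len(per_channel)


table = build_difference_table()
per_channel, combined = image_resolution(table, [(200, 28), (100, 16), (50, 5)])
print(per_channel, combined)
```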
According to a further aspect of the present invention, the length of the outline of a pattern in an image is computed based on the frequency of changes of a label value when the image is scanned in a predetermined direction.
Thus, the outline length computing process can be quickly performed on a pattern whose outline length is to be computed only by once scanning the range of the enclosing rectangle of the pattern.
According to a further aspect of the present invention, the number of picture elements which change in the scanning direction from a label other than the first label to the first label is counted, and the number of picture elements which change from the first label to a label other than the first label is counted after two continuous picture elements having the first label in the scanning direction. Then, among the picture elements assigned the first label, the number of picture elements whose adjacent picture elements in the scanning direction are both assigned the first label, and at least one of whose adjacent picture elements in the scanning or vertical direction is assigned a label other than the first label, is counted.
Thus, when the edge of a pattern is detected and the outline length is computed, the edge can be detected as the outline of the pattern continuing in the scanning direction. For a pattern having the width of one picture element, the outline can be prevented from being counted twice, thereby correctly computing in one scanning operation the outline length of a pattern in variable shape.
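One possible reading of the three counts described above is sketched below for a horizontal scanning direction. The function name and the treatment of positions outside the scanned range as carrying another label are assumptions. Under this reading, every picture element of the pattern that touches a different label is counted exactly once.

```python
def outline_length(labels, target):
    """Count outline picture elements of the pattern labeled `target`
    in one raster scan, using three counters."""
    height, width = len(labels), len(labels[0])

    def is_target(y, x):
        # Positions outside the scanned range count as "other label".
        return 0 <= y < height and 0 <= x < width and labels[y][x] == target

    entering = leaving = run_interior = 0
    for y in range(height):
        for x in range(width):
            if not is_target(y, x):
                continue
            left, right = is_target(y, x - 1), is_target(y, x + 1)
            if not left:
                # Change from another label to the target label.
                entering += 1
            elif not right:
                # Change back to another label after at least two continuous
                # target picture elements; a one-wide run is not counted twice.
                leaving += 1
            elif not (is_target(y - 1, x) and is_target(y + 1, x)):
                # Both horizontal neighbors are target, but a vertical
                # neighbor carries another label.
                run_interior += 1
    return entering + leaving + run_interior


block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(outline_length(block, 1))  # 8: every element of the block except its center
```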
According to a further aspect of the present invention, it is determined whether or not the area of a unicolor group is a character area based on the character recognition result of patterns belonging to the unicolor group.
Thus, even if a pattern having noise has been extracted as a candidate for a title, the pattern having noise can be removed from the candidates for a title, thereby improving the extraction precision for a title area.
According to a further aspect of the present invention, the patterns in the same group can be classified again based on the range of the thickness of a pattern set on the frequencies of the thicknesses of the patterns in the same group.
Thus, even if patterns of various thicknesses coexist, patterns of the same thickness can be classified into the same group, thereby improving the extraction precision for a title area.
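Reclassification by thickness might look like the following sketch, which takes the thickness of each pattern as given. Using the most frequent thickness as the center of the range and a fixed tolerance are assumptions, as is the name `regroup_by_thickness`.

```python
from collections import Counter


def regroup_by_thickness(thicknesses, tolerance=1):
    """Split pattern indices by whether their thickness lies within
    `tolerance` of the most frequent thickness in the group."""
    most_frequent, _ = Counter(thicknesses).most_common(1)[0]
    low, high = most_frequent - tolerance, most_frequent + tolerance
    kept = [i for i, t in enumerate(thicknesses) if low <= t <= high]
    others = [i for i, t in enumerate(thicknesses) if not low <= t <= high]
    return (low, high), kept, others


# Four thin strokes and two thick patterns in one unicolor group.
print(regroup_by_thickness([3, 3, 4, 3, 9, 10]))
```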
According to a further aspect of the present invention, a first color group and a second color group are integrated based on the shape, size, or positional relation of the enclosing rectangle of the first color group and the second color group.
Thus, if the shape, size, or positional relation of the enclosing rectangles is appropriate for a title area, then the groups can be classified as belonging to the same group. Therefore, even if characters forming a title contain a character in a different color, the title area can be precisely extracted.
According to a further aspect of the present invention, when enclosing rectangles overlapping each other are integrated, a pattern in a specific shape can be removed.
Thus, a pattern not to be extracted can be removed from the patterns to be processed, thereby extracting a unicolor pattern with high precision.
According to a further aspect of the present invention, enclosing rectangles overlapping each other can be integrated after removing the patterns in the L or] shape.
Thus, if a character to be extracted and the background encompassing the character are of the same color, and even if only the corner of the background is extracted as an area having the same color as the character to be extracted, then the corner of the background can be prevented from being integrated into the character to be extracted, thereby precisely extracting the title area.
According to a further aspect of the present invention, enclosing rectangles are grouped by comparing the color information about the patterns in the enclosing rectangles to be grouped with the color information about a group of already grouped enclosing rectangles.
Thus, enclosing rectangles can be grouped in consideration of the entire color of an area to be extracted, and an area which has already been extracted. As a result, even when the colors of the patterns in an enclosing rectangle gradually change, an area having a different color from that of the area to be extracted can be prevented from being classified into the same group.
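The effect of comparing against the color of the whole group rather than against the nearest pattern can be seen in a small sketch. The class `ColorGroup`, the use of the mean color as the group's color information, and the per-channel difference are assumptions for illustration.

```python
def max_channel_difference(a, b):
    """Largest per-channel absolute difference between two colors."""
    return max(abs(x - y) for x, y in zip(a, b))


class ColorGroup:
    """A group of enclosing rectangles, represented by their pattern colors."""

    def __init__(self, first_color):
        self.members = [first_color]

    def mean_color(self):
        count = len(self.members)
        return tuple(sum(color[c] for color in self.members) / count
                     for c in range(3))

    def try_add(self, color, threshold):
        """Add `color` only if it is close to the color of the whole group."""
        if max_channel_difference(color, self.mean_color()) <= threshold:
            self.members.append(color)
            return True
        return False


# Pattern colors that change gradually in steps of 10.
colors = [(v, v, v) for v in (0, 10, 20, 30, 40)]
group = ColorGroup(colors[0])
print([group.try_add(color, 12) for color in colors[1:]])
# [True, False, False, False]
```

A comparison between neighboring patterns alone would accept every step of 10 and chain all five colors together; the comparison against the group color stops the drift after the second pattern.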
According to a further aspect of the present invention, a threshold for use in determining whether or not a color is similar to a specific color is set according to the color information about a pattern classified into a unicolor group.
Thus, a threshold for use in determining a unicolor pattern can be obtained from the change of the color of a unicolor pattern to be extracted. Therefore, even if a unicolor range depends on each pattern, a threshold reflecting the color change of a pattern can be set. Therefore, a unicolor area can be extracted with high precision.
According to a further aspect of the present invention, groups can be integrated according to the color information about the patterns classified as a unicolor group.
Thus, even if the color of the pattern in an enclosing rectangle to be grouped has locally changed, the local color change of the pattern can be absorbed in the color of the entire patterns of the area from which the change has already been extracted, thereby extracting totally unicolor patterns with high precision.