1. Field of the Invention
The present invention relates to a color image processing apparatus and a pattern extraction apparatus, and is specifically applicable when a character area for a title is extracted from a color image.
2. Description of the Related Art
Recently, computers and various peripheral devices such as color printers have been developed and offered at lower prices, thereby extending the fields in which color images are processed. As a result, there is a demand for technology that extracts only a specific area from a color image, for example, areas of the same color.
This technology is needed in many fields, for example, when a color scene image picked up by a CCD camera is used as an input image to be processed to select fruit, monitor a car, check a person for security, etc. through image recognition.
When a color document image is used as an input image, a document name, a keyword, etc. are automatically extracted from the image and used for retrieval, for example, when books are classified and managed by an automatic system in a library. In addition, the technology is also used to automatically assign a keyword, a file name, etc. in groupware in which images are stored and shared as a database. The information is used to automatically retrieve a large number of color document images.
One conventional technology for extracting areas of the same color from a color image generates color-analyzed images by clustering the picture elements of the color image by color. Another extracts areas of the same color using the color labeling result of an adjacency expanding method.
In addition, one conventional technology for extracting a title from a color image extracts a character area using a color-analyzed image.
In this method, the following processes are performed.
An enclosing rectangle of connection areas is obtained from a color-analyzed image in one color.
Enclosing rectangles are limited to specific size and shape.
An adjacent rectangle search range is set for each rectangle, and rectangles are searched for in the range. A plurality of the rectangles within the range is extracted as a group.
Rectangles having good linearity in a group are maintained.
An enclosing rectangle of a group is obtained, and a pattern of color similar to that of an area forming the group is extracted inside the enclosing rectangle.
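The first three processes above can be sketched as follows. This is only an illustrative sketch, not the implementation used in the cited documents: the binary mask stands in for a color-analyzed image in one color, and the function names, the 4-connectivity, the size limits, the `margin` search range, and the transitive grouping are all assumptions made for the example. The linearity check and the final extraction of similar-color patterns are omitted.

```python
from collections import deque

def enclosing_rectangles(mask):
    """Enclosing rectangles (x0, y0, x1, y1) of 4-connected areas in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            rects.append((x0, y0, x1, y1))
    return rects

def filter_by_size(rects, min_side, max_side):
    """Keep rectangles whose width and height both lie in [min_side, max_side] (assumed rule)."""
    kept = []
    for x0, y0, x1, y1 in rects:
        width, height = x1 - x0 + 1, y1 - y0 + 1
        if min_side <= width <= max_side and min_side <= height <= max_side:
            kept.append((x0, y0, x1, y1))
    return kept

def near(a, b, margin):
    """True if rectangle b intersects rectangle a expanded by margin on every side."""
    return (a[0] - margin <= b[2] and b[0] <= a[2] + margin and
            a[1] - margin <= b[3] and b[1] <= a[3] + margin)

def group_rectangles(rects, margin):
    """Collect rectangles reachable through each other's search ranges; keep groups of two or more."""
    groups, assigned = [], set()
    for i in range(len(rects)):
        if i in assigned:
            continue
        assigned.add(i)
        members, queue = [], deque([i])
        while queue:
            j = queue.popleft()
            members.append(rects[j])
            for k in range(len(rects)):
                if k not in assigned and near(rects[j], rects[k], margin):
                    assigned.add(k)
                    queue.append(k)
        if len(members) > 1:
            groups.append(members)
    return groups

def group_enclosing_rectangle(members):
    """Enclosing rectangle of all rectangles in one group."""
    return (min(r[0] for r in members), min(r[1] for r in members),
            max(r[2] for r in members), max(r[3] for r in members))

# Hypothetical mask: two 2x2 blobs close together and one isolated picture element.
MASK = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
KEPT = filter_by_size(enclosing_rectangles(MASK), min_side=2, max_side=10)
GROUPS = group_rectangles(KEPT, margin=2)
```

With this mask, the isolated picture element is removed by the size limit and the two blobs form one group. Because every step scans a full-size image, repeating it once per extracted color is what makes the conventional method slow.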
Listed below are the documents describing the conventional technology of extracting a character area from a color document image.
Senda et al. ‘Method of Extracting Character Pattern from Color Image based on Unichromatism of Character’ published by The Japan Society of Information and Communication Research, PRU94-09, pp. 17-24
Uehane et al. ‘Extracting Character Area from Color Image by Isochromatic Line Process’ published by The Japan Society of Information and Communication Research, PRU94-09, pp. 9-16
Matsuo et al. ‘Extracting Unicolor Character Area from Color Document Image’ published in the 1997 Convention of The Japan Society of Information and Communication Research D-12-19
Matuo et al. ‘Extracting Character String from Scenic Image based on Shading and Color Information’ published by The Japan Society of Information and Communication Research, PRU92-121, pp. 25-32
However, in the conventional method of extracting areas of the same color by clustering the picture elements of a color image by color, the large number of picture elements of the entire image are clustered. Therefore, the clustering process takes a long computation time.
In addition, since the clustering process is performed on the picture elements of the entire image, areas may not be extracted with high precision. For example, suppose that a first color area is positioned away from a second color area and that the first color is similar to the second color, so that the first and second colors are classified into the same cluster. In this case, depending on the third color generated from the cluster, the first and second colors may not be completely covered. Thus, an extraction result may be output with an incomplete pattern or an unclear outline.
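The following worked example illustrates the problem with concrete numbers. It is a sketch under stated assumptions: the "third color" is taken to be the mean of the cluster, coverage is judged by the largest per-channel RGB difference, and the colors and the `TOLERANCE` value are hypothetical.

```python
def mean_color(colors):
    """Mean RGB color of a list of (r, g, b) tuples."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

def max_channel_diff(a, b):
    """Largest absolute per-channel difference between two colors."""
    return max(abs(x - y) for x, y in zip(a, b))

FIRST = (200, 40, 40)    # hypothetical first color
SECOND = (170, 70, 40)   # hypothetical, similar second color
CLUSTER = [FIRST] * 50 + [SECOND] * 50   # both colors fall into one cluster

CENTER = mean_color(CLUSTER)   # the "third color" generated from the cluster
TOLERANCE = 10                 # assumed tolerance around the cluster color

FIRST_COVERED = max_channel_diff(FIRST, CENTER) <= TOLERANCE
SECOND_COVERED = max_channel_diff(SECOND, CENTER) <= TOLERANCE
```

Here the cluster color is (185, 55, 40), which differs from each original color by 15 in two channels, so neither area is covered at the assumed tolerance of 10.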
In the conventional method of extracting areas of the same color based on the area expanding method, the difference between the colors of adjacent picture elements may exceed a predetermined threshold, depending on how the difference in color between adjacent picture elements is defined, even if the colors of the adjacent picture elements appear the same to the naked eye. As a result, a hole may appear in an area of the same color, or the outline of the area may not be correctly extracted.
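How the outcome depends on the definition of the color difference can be seen in a small example. The three definitions below (largest per-channel difference, Euclidean distance, and city-block distance in RGB), the two colors, and the threshold are assumptions chosen for illustration; the source does not state which definition the conventional method uses.

```python
import math

def max_channel_diff(a, b):
    """Largest absolute per-channel difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def euclidean_diff(a, b):
    """Euclidean distance in RGB space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block_diff(a, b):
    """Sum of absolute per-channel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

A = (100, 100, 100)   # two grays that look nearly identical
B = (108, 108, 108)
THRESHOLD = 10        # assumed fixed threshold

SAME_BY_MAX = max_channel_diff(A, B) <= THRESHOLD         # 8 <= 10
SAME_BY_EUCLIDEAN = euclidean_diff(A, B) <= THRESHOLD     # about 13.9 > 10
SAME_BY_CITY_BLOCK = city_block_diff(A, B) <= THRESHOLD   # 24 > 10
```

The same pair of picture elements is merged under one definition and split under the other two, which is how holes can appear inside an area that looks uniform.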
Furthermore, since only the relationship between adjacent picture elements is checked, a character area can be assigned the same label as the background area when the color around the boundary between the character area and the background area gradually changes.
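The effect of checking only adjacent picture elements can be reproduced with a minimal labeling sketch. Grayscale values are used instead of colors for brevity, and the function name, the 4-connectivity, and the threshold of 12 are assumptions for this example rather than details of the conventional method.

```python
from collections import deque

def label_by_adjacency(gray, threshold):
    """Label picture elements, merging 4-neighbors whose values differ by at most threshold."""
    h, w = len(gray), len(gray[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h and labels[ny][nx] == 0
                            and abs(gray[ny][nx] - gray[cy][cx]) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
    return labels

# Character (0) and background (200) separated by a sharp edge.
SHARP = [[0, 0, 0, 200, 200, 200]]
# Same character and background, but the boundary changes in steps of 10.
GRADUAL = [[0, 0, 0] + list(range(10, 200, 10)) + [200, 200, 200]]

SHARP_LABELS = label_by_adjacency(SHARP, threshold=12)
GRADUAL_LABELS = label_by_adjacency(GRADUAL, threshold=12)
```

With the sharp edge, the character and the background receive different labels. With the gradual boundary, every adjacent step is within the threshold, so the whole row receives a single label even though its two ends differ by 200.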
In addition, in the conventional area expanding method, areas of the same color are extracted by applying the same predetermined threshold to all kinds of color document images. Therefore, for example, when similar colors such as gray or intermediate colors are used for a character and its background, the character and the background are frequently assigned the same label, thereby reducing the character pattern extraction precision. Alternatively, an extracted label area can be broken into pieces within a character pattern, which also reduces the character pattern extraction precision.
On the other hand, if an area expanding method is applied to an image in, for example, 256 colors rather than to a full color image, a large number of small label areas are generated, thereby causing the problem of low area extraction precision.
Furthermore, in the method of extracting a character area using the conventional color-analyzed image, it is necessary to generate a color-analyzed image of the entire image for each of the colors extracted from the image. It takes a long time to generate such color-analyzed images. In addition, since each color-analyzed image is generated for the entire image, a title extracted from the image is subject to the influence of the colors of areas other than the title area, thereby reducing the title extraction precision. Furthermore, when an enclosing rectangle of connection areas is obtained, it is necessary to process the entire image for each of the extracted color-analyzed images. Therefore, a plurality of images having the same length and width as the input image (as many as the number of extracted colors) are required. Thus, the process takes a long time.
Furthermore, since enclosing rectangles are grouped for each color-analyzed image generated for the entire image, the process takes a long time, and characters to be extracted can be lost if they are clustered into different color-analyzed images. In addition, since only the rectangles in a search range are extracted when rectangles are grouped, there is the problem that small portions can be left out of a group. When a pattern of a color similar to the color of a group is extracted to collect the portions that have been left out of the group, there arises the problem that noise of a color similar to the color of the group can also be collected.