1. Field of Application
The present invention relates to an image recognition method and an image recognition apparatus for use in an image recognition system, for extracting from a color image the shapes of objects which are to be recognized. In particular, the invention relates to an image recognition apparatus which provides a substantial improvement in edge detection performance when applied to images such as aerial photographs or satellite images which exhibit a relatively low degree of variation in intensity values.
2. Description of Prior Art
In the prior art, various types of image recognition apparatus are known, which are intended for various different fields of application. Typically, the image recognition apparatus may be required to extract from an image, such as a photograph, all objects having a shape which falls within some predetermined category.
One approach to the problem of increasing the accuracy of image recognition of the contents of photographs is to set the camera which takes the photographs in a fixed position and to fix the lighting conditions etc., so that the photographic conditions are always identical. Another approach is to attach markers, etc., to the objects which are to be recognized.
However, in the case of recognizing shapes within satellite images or aerial photographs, such prior art methods of improving accuracy cannot be applied. That is to say, the photographic conditions such as the camera position, camera orientation, weather conditions, etc., will vary each time that a photograph is taken. Furthermore, a single image may contain many categories of image data, such as image data corresponding to buildings, rivers, streets, etc., so that the image contents are complex. As a result, the application of image recognition to satellite images or aerial photographs is extremely difficult.
To extract the shapes of objects which are to be recognized, from the contents of an image, image processing to detect edges etc., can be implemented by using the differences between color values (typically, the intensity, i.e., gray-scale values) of the pixels which constitute a region representing an object which is to be recognized and the color values of the pixels which constitute adjacent regions to these objects. Edge detection processing consists of detecting positions at which there are abrupt changes in the pixel values, and recognizing such positions as corresponding to the outlines of physical objects. Various types of edge detection processing are known. With a typical method, smoothing processing is applied overall to the pixel values, then each of the pixels for which the first derivative of the intensity variation gradient within the image reaches a local maximum and exceeds a predetermined threshold value is determined, with each such pixel being assumed to be located on an edge of an object in the image. Alternatively, a “zero-crossing” method can be applied, e.g., whereby the zero crossings of the second derivative of the gradient are detected to obtain the locations of the edge pixels. With a template technique, predetermined shape templates are compared with the image contents to find the approximate positions of objects that are to be recognized, then edge detection processing may be applied to the results obtained.
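The two pixel-based approaches described above can be illustrated on a single scan line of intensity values. The following is only a minimal sketch: the function names, the use of simple finite differences, and the omission of the preliminary smoothing step are all assumptions made for illustration, and are not taken from any particular prior art apparatus.

```python
def first_derivative(values):
    """Central difference of a 1-D intensity profile (end points set to 0)."""
    n = len(values)
    return [0.0 if i in (0, n - 1) else (values[i + 1] - values[i - 1]) / 2.0
            for i in range(n)]


def edges_by_gradient_peak(values, threshold):
    """Indices where |first derivative| is a local maximum above the threshold."""
    g = [abs(d) for d in first_derivative(values)]
    return [i for i in range(1, len(g) - 1)
            if g[i] > threshold and g[i] >= g[i - 1] and g[i] > g[i + 1]]


def edges_by_zero_crossing(values):
    """Indices where the second difference of the profile changes sign."""
    n = len(values)
    second = [0.0] * n
    for i in range(1, n - 1):
        second[i] = values[i + 1] - 2 * values[i] + values[i - 1]
    hits = []
    for i in range(1, n - 2):
        if second[i] * second[i + 1] < 0:
            # Sign changes between sample i and sample i + 1.
            hits.append(i)
        elif second[i] == 0 and second[i - 1] * second[i + 1] < 0:
            # Exact zero at sample i, with opposite signs on either side.
            hits.append(i)
    return hits


# Hypothetical scan line: a dark region, a ramp, then a bright region.
profile = [10, 10, 10, 12, 40, 68, 70, 70, 70]
peak_edges = edges_by_gradient_peak(profile, threshold=5)
crossing_edges = edges_by_zero_crossing(profile)
```

For this profile both functions report the same single position, at the middle of the ramp. Neither function yields an orientation for the edge, since a single scan line carries no directional information.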
Although prior art image recognition techniques are generally based upon intensity values of the pixels of an image, various methods are possible for expressing the pixel values of color image data. If the HSI (hue, saturation, intensity) color space is used, then any pixel can be specified in terms of the magnitude of its hue, saturation or intensity component. The RGB (red, green, blue) method is widely used for expressing image data; however, transform processing can be applied to convert such data to HSI form, and edge detection processing can then be applied by operating on the intensity values which are thereby obtained. HSI information has the advantage of being readily comprehended by a human operator. In particular, an image can easily be judged by a human operator as having a relatively high or relatively low degree of variation in intensity (i.e., high contrast or low contrast).
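As an illustration of such transform processing, the following sketch converts one 8-bit RGB pixel to hue, saturation and intensity values. The text does not specify which transform is used, so the formulas here are an assumption: they follow one commonly used textbook formulation of HSI, and the function name `rgb_to_hsi` is hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, intensity 0..1).

    Assumed formulation: I is the mean of the components, S = 1 - min/I,
    and H is the angle of the color around the gray axis.
    """
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    saturation = 0.0 if intensity == 0 else 1.0 - min(r, g, b) / intensity

    numerator = 0.5 * ((r - g) + (r - b))
    # Clamp at zero to guard against tiny negative rounding errors.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        hue = 0.0  # Hue is undefined for gray pixels; 0 is used as a placeholder.
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))
        theta = math.degrees(math.acos(ratio))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


red = rgb_to_hsi(255, 0, 0)
green = rgb_to_hsi(0, 255, 0)
blue = rgb_to_hsi(0, 0, 255)
gray = rgb_to_hsi(128, 128, 128)
```

Under this formulation the three primaries are fully saturated and lie 120 degrees apart in hue, while a gray pixel has zero saturation. Edge detection restricted to the third returned component corresponds to the intensity-only processing described above.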
Due to the difficulties which are experienced in the practical application of image recognition processing to satellite images or aerial photographs, it would be desirable to effectively utilize all of the color information that is available within such a photograph, that is to say, to use not only the intensity values of the image but also the hue and saturation information contained in the image. However in general with prior art types of edge detection processing, only parts of the color information, such as the intensity values alone, are utilized.
A method of edge detection processing is described in Japanese patent HEI 6-83962, which uses a zero-crossing method and, employing an HSI color space (referred to therein using the designations L*, C*ab, H*ab for the intensity, saturation and hue values respectively), attempts to utilize not only the intensity values but also hue and saturation information. In FIG. 47, diagrams 200, 201, 202, and 203 show respective examples of the results of image recognition, applied to a color picture of an individual, which are obtained by using that method. Diagram 200 shows the result of edge detection processing that is applied using only the intensity values of each of the pixels of the original picture, diagram 201 shows the result of edge detection processing that is applied using only the hue values, and diagram 202 shows the result obtained by using only the saturation values. Diagram 203 shows the result that is obtained by combining the results shown in diagrams 200, 201 and 202. As can be seen, a substantial amount of noise arises in the image expressed by the saturation values, and this noise is carried over into the combined image shown in diagram 203.
In some cases, image smoothing processing is applied in order to reduce the amount of noise within an image, before performing edge detection processing, i.e., the image is pre-processed by using a smoothing filter to blur the image, and edge detection processing is then applied to the resultant image.
In order to obtain satisfactory results from edge detection processing which is to be applied to an image such as a satellite image or aerial photograph, for example to accurately and reliably extract the shapes of specific objects such as roads, buildings etc., from the image contents, it is necessary to determine not only the degree of “strength” of each edge, but also the direction along which an edge is oriented. In the following, and in the description of embodiments of the invention and in the appended claims, the term “edge” is used in the sense of a line segment which is used as a straight-line approximation to a part of a boundary between two adjacent regions of a color image. The term “strength” of an edge is used herein to signify a degree of color difference between pixels located adjacent to one side of that edge and pixels located adjacent to the opposite side, while the term “edge direction” is used in referring to the angle of orientation of an edge within the image, which is one of a predetermined limited number of angles. If the direction of an edge could be accurately determined (i.e., based upon only a part of the pixels which constitute that edge), then this would greatly simplify the process of determining all of the pixels which are located along that edge. That is to say, if the edge direction could be reliably estimated by using only a part of the pixels located on that edge, then it would be possible to compensate for any discontinuities within the edge which is obtained as a result of the edge detection processing, so that an output image could be generated in which all edges are accurately shown as continuous lines.
However, with the method described in Japanese patent HEI 6-83962, only the zero-crossing method is used, so that it is not possible to determine edge directions, since only each local maximum of variation of a gradient of a color attribute is detected, irrespective of the direction along which that variation is oriented. With other types of edge detection processing such as the object template method, processing of intensity values, hue values and saturation values can be performed separately, to obtain respective edge directions. However, even if the results thus obtained are combined, accurate edge directions cannot be detected. Specifically, the edge directions which result from using intensity values, hue values and saturation values may be entirely different from one another, so that accurate edge detection cannot be achieved by taking the average of these results.
Moreover, in the case of a color image such as a satellite image or aerial photograph which presents special difficulties with respect to image recognition, it would be desirable to be able to flexibly adjust the image recognition processing in accordance with the overall color characteristics of the image that is to be processed. That is to say, it should be possible for example for a human operator to examine such an image prior to executing image recognition processing, to estimate whether different objects in the image differ mainly with respect to hue, or whether the objects are mainly distinguished by differences in gray-scale level, i.e., intensity values. The operator should then be able to adjust the image recognition apparatus to operate in a manner that is best suited to these image characteristics, i.e., to extract the edges of objects based on the entire color information of the image, but for example placing emphasis upon the intensity values of pixels, or upon the chrominance values of the pixels, whichever is appropriate. However, such a type of image recognition apparatus has not been available in the prior art.
Furthermore, in order to apply image recognition processing to an image whose color data are expressed with respect to an RGB color space, it is common practice to first convert the color image data to an HSI (hue, saturation, intensity) color space, i.e., expressing the data of each pixel as a position within such a color space. This enables a human operator to more readily judge the color attributes of the overall image prior to executing the image recognition processing, and enables such processing to be applied to only a specific color attribute of each of the pixels, such as the intensity or the saturation attribute. However, if processing is applied to RGB data which contain some degree of scattering of the color values, and a transform from RGB to HSI color space is executed, then the resultant values of saturation will be unstable (i.e., will tend to vary randomly with respect to the correct values) within those regions of the image in which the intensity values are high, and also within those regions of the image in which the intensity values are low. For example, assuming that each of the red, green and blue values of each pixel is expressed by 8 bits, so that the range of values is from 0 to 255, then in the case of a region of the image in which the intensity values are low, if any of the red, green or blue values of a pixel within that region should increase by 1, this will result in a large change in the corresponding value of saturation that is obtained by the transform processing operation. Instability of the saturation values will be expressed as noise, i.e., spurious edge portions, in the results of edge detection processing which utilizes these values. For that reason it has been difficult in the prior art to utilize the color saturation information contained in a color image, in image recognition processing.
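The instability described above can be reproduced numerically. The following sketch is an illustration only: since the text does not state which transform is intended, it assumes the double-cone HLS model implemented by the Python standard library function `colorsys.rgb_to_hls` as a stand-in for the HSI transform. The helper names and the sample pixel values are hypothetical.

```python
import colorsys


def saturation(r, g, b):
    """Saturation of an 8-bit RGB pixel under the double-cone HLS model."""
    _, _, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return s


def saturation_jump(base, bumped):
    """Absolute change in saturation between two RGB triples."""
    return abs(saturation(*bumped) - saturation(*base))


# In each case the red component alone is increased by 1 (out of 255).
dark = saturation_jump((0, 0, 0), (1, 0, 0))
mid = saturation_jump((128, 128, 128), (129, 128, 128))
bright = saturation_jump((254, 254, 254), (255, 254, 254))
```

Under this assumed model, a change of one quantization step moves the saturation across almost its entire range (from 0 to approximately 1) for the dark pixel and for the bright pixel, but changes it by less than 0.01 for the mid-gray pixel. Small scattering in the RGB values therefore produces large, essentially random saturation differences in dark and bright regions.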
Furthermore, if a substantial degree of smoothing processing is applied to an image which is to be subjected to image recognition, in order to suppress the occurrence of such noise, then this has the effect of blurring the image, causing rounding of the shapes of edges and also merging together any edges which are located close to one another. As a result, the accuracy of extracting edge information will be reduced. Conversely, if only a moderate degree of smoothing processing is applied to the image that is to be subjected to image recognition, or if smoothing processing is not applied to the image, then the accuracy of extraction of shapes from the image will be high, but there will be a high level of noise in the results so that reliable extraction of the shapes of the required objects will be difficult to achieve.
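The blurring effect of strong smoothing on closely spaced edges can be seen on a single scan line. The sketch below assumes a simple moving-average (box) filter with clamped borders. The filter choice, the helper names and the sample profile are assumptions for illustration only.

```python
def box_smooth(values, width):
    """Moving average over an odd-sized window; border indices are clamped."""
    half = width // 2
    n = len(values)
    out = []
    for i in range(n):
        window = [values[min(max(j, 0), n - 1)]
                  for j in range(i - half, i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def largest_step(values):
    """Largest absolute difference between neighboring samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical scan line: a bright stripe two pixels wide on a dark background,
# i.e. a rising edge and a falling edge located close together.
profile = [0] * 5 + [100, 100] + [0] * 5
smoothed = box_smooth(profile, width=5)

step_before = largest_step(profile)
step_after = largest_step(smoothed)
peak_after = max(smoothed)
```

With a window of 5 samples, the two sharp steps of height 100 become gradual ramps whose largest step is 20, and the stripe itself is reduced to a low plateau of height 40. The two edges are therefore much harder to separate and to locate, which is the loss of accuracy described above.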
Moreover in the prior art, there has been no simple and effective method of performing image recognition processing to extract the shapes of objects which are to be recognized, which will eliminate various small objects in the image that are not intended to be recognized (and therefore can be considered to constitute noise) without distorting the shapes of the objects which are to be recognized.