1. Field of Application
The present invention relates to an image recognition method and an image recognition apparatus for use in an image recognition system, for extracting from a color image the shapes of objects which are to be recognized. In particular, the invention relates to an image recognition apparatus which provides a substantial improvement in edge detection performance when applied to images such as aerial photographs or satellite images which exhibit a relatively low degree of variation in intensity values.
2. Description of Prior Art
In the prior art, various types of image recognition apparatus are known, which are intended for various different fields of application. Typically, the image recognition apparatus may be required to extract from an image, such as a photograph, all objects having a shape which falls within some predetermined category.
One approach to the problem of increasing the accuracy of image recognition of the contents of photographs is to set the camera which takes the photographs in a fixed position and to fix the lighting conditions etc., so that the photographic conditions are always identical. Another approach is to attach markers, etc., to the objects which are to be recognized.
However in the case of recognizing shapes within satellite images or aerial photographs, such prior art methods of improving accuracy cannot be applied. That is to say, the photographic conditions such as the camera position, camera orientation, weather conditions, etc., will vary each time that a photograph is taken. Furthermore, a single image may contain many categories of image data, such as image data corresponding to buildings, rivers, streets, etc., so that the image contents are complex. As a result, the application of image recognition to satellite images or aerial photographs is extremely difficult.
To extract the shapes of objects which are to be recognized, from the contents of an image, image processing to detect edges etc., can be implemented by using the differences between color values (typically, the intensity, i.e., gray-scale values) of the pixels which constitute a region representing an object which is to be recognized and the color values of the pixels which constitute adjacent regions to these objects. Edge detection processing consists of detecting positions at which there are abrupt changes in the pixel values, and recognizing such positions as corresponding to the outlines of physical objects. Various types of edge detection processing are known. With a typical method, smoothing processing is applied overall to the pixel values, then each pixel for which the first derivative of the intensity variation gradient within the image reaches a local maximum and exceeds a predetermined threshold value is determined, with each such pixel being assumed to be located on an edge of an object in the image. Alternatively, a "zero-crossing" method can be applied, e.g., whereby the zero crossings of the second derivative of the gradient are detected to obtain the locations of the edge pixels. With a template technique, predetermined shape templates are compared with the image contents to find the approximate positions of objects that are to be recognized, then edge detection processing may be applied to the results obtained.
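The typical first-derivative method described above can be illustrated by a short sketch. The following is illustrative only and is not part of the invention; the threshold value, the restriction to horizontal central differences, and the function name are assumptions made for the example:

```python
import numpy as np

def gradient_edges(gray, threshold=30.0):
    """Mark pixels whose horizontal intensity gradient is a local
    maximum and exceeds a threshold (a 1-D illustration of the idea)."""
    # First derivative of intensity along each row (central differences).
    gx = np.zeros(gray.shape, dtype=float)
    gx[:, 1:-1] = (gray[:, 2:].astype(float) - gray[:, :-2].astype(float)) / 2.0
    mag = np.abs(gx)
    edges = np.zeros(gray.shape, dtype=bool)
    # A pixel is marked as an edge pixel when its gradient magnitude
    # exceeds the threshold and is a local maximum relative to its
    # horizontal neighbours.
    interior = mag[:, 1:-1]
    edges[:, 1:-1] = ((interior > threshold)
                      & (interior >= mag[:, :-2])
                      & (interior >= mag[:, 2:]))
    return edges

# A step edge between a dark region and a bright region:
img = np.array([[10, 10, 10, 200, 200, 200]] * 3, dtype=np.uint8)
print(gradient_edges(img))
```

Only the two pixels straddling the intensity step are marked, which is the behaviour the local-maximum condition is intended to produce.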
Although prior art image recognition techniques are generally based upon intensity values of the pixels of an image, various methods are possible for expressing the pixel values of color image data. If the HSI (hue, saturation, intensity) color space is used, then any pixel can be specified in terms of the magnitude of its hue, saturation or intensity component. The RGB (red, green, blue) method is widely used for expressing image data, however transform processing can be applied to convert such data to HSI form, and edge detection processing can then be applied by operating on the intensity values which are thereby obtained. HSI information has the advantage of being readily comprehended by a human operator. In particular, an image can easily be judged by a human operator as having a relatively high or relatively low degree of variation in intensity (i.e., high contrast or low contrast).
Due to the difficulties which are experienced in the practical application of image recognition processing to satellite images or aerial photographs, it would be desirable to effectively utilize all of the color information that is available within such a photograph, that is to say, to use not only the intensity values of the image but also the hue and saturation information contained in the image. However in general with prior art types of edge detection processing, only parts of the color information, such as the intensity values alone, are utilized.
A method of edge detection processing is described in Japanese patent HEI 6-83962, which uses a zero-crossing method and, employing a HSI color space (referred to therein using the designations L*, C*ab, H*ab for the intensity, saturation and hue values respectively), attempts to utilize not only the intensity values but also hue and saturation information. In FIG. 47, diagrams 200, 201, 202, and 203 show respective examples of the results of image recognition, applied to a color picture of an individual, which are obtained by using that method. Diagram 200 shows the result of edge detection processing that is applied using only the intensity values of each of the pixels of the original picture, diagram 201 shows the result of edge detection processing that is applied using only the hue values, and diagram 202 shows the result obtained by using only the saturation values. Diagram 203 shows the result that is obtained by combining the results shown in diagrams 200, 201 and 202. As can be seen, a substantial amount of noise arises in the image expressed by the saturation values, and this noise is inserted into the combined image shown in diagram 203.
In some cases, image smoothing processing is applied in order to reduce the amount of noise within an image, before performing edge detection processing, i.e., the image is pre-processed by using a smoothing filter to blur the image, and edge detection processing applied to the resultant image.
In order to obtain satisfactory results from edge detection processing which is to be applied to an image such as a satellite image or aerial photograph, for example to accurately and reliably extract the shapes of specific objects such as roads, buildings etc., from the image contents, it is necessary not only to determine the degree of "strength" of each edge, but also the direction along which an edge is oriented. In the following, and in the description of embodiments of the invention and in the appended claims, the term "edge" is used in the sense of a line segment which is used as a straight-line approximation to a part of a boundary between two adjacent regions of a color image. The term "strength" of an edge is used herein to signify a degree of color difference between pixels located adjacent to one side of that edge and pixels located adjacent to the opposite side, while the term "edge direction" is used in referring to the angle of orientation of an edge within the image, which is one of a predetermined limited number of angles. If the direction of an edge could be accurately determined based upon only a part of the pixels which constitute that edge, then this would greatly simplify the process of determining all of the pixels which are located along that edge. That is to say, if the edge direction could be reliably estimated by using only a part of the pixels located on that edge, then it would be possible to compensate for any discontinuities within an edge which is obtained as a result of the edge detection processing, so that an output image could be generated in which all edges are accurately shown as continuous lines.
However with the method described in Japanese patent HEI 6-83962, only the zero-crossing method is used, so that it is not possible to determine edge directions, since only each local maximum of variation of a gradient of a color attribute is detected, irrespective of the direction along which that variation is oriented. With other types of edge detection processing such as the object template method, processing of intensity values, hue values and saturation values can be performed separately, to obtain respective edge directions. However even if the results thus obtained are combined, accurate edge directions cannot be detected. Specifically, the edge directions which result from using intensity values, hue values and saturation values may be entirely different from one another, so that accurate edge detection cannot be achieved by taking the average of these results.
Moreover, in the case of a color image such as a satellite image or aerial photograph which presents special difficulties with respect to image recognition, it would be desirable to be able to flexibly adjust the image recognition processing in accordance with the overall color characteristics of the image that is to be processed. That is to say, it should be possible for example for a human operator to examine such an image prior to executing image recognition processing, to estimate whether different objects in the image differ mainly with respect to hue, or whether the objects are mainly distinguished by differences in gray-scale level, i.e., intensity values. The operator should then be able to adjust the image recognition apparatus to operate in a manner that is best suited to these image characteristics, i.e., to extract the edges of objects based on the entire color information of the image, but for example placing emphasis upon the intensity values of pixels, or upon the chrominance values of the pixels, whichever is appropriate. However such a type of image recognition apparatus has not been available in the prior art.
Furthermore, in order to apply image recognition processing to an image whose color data are expressed with respect to an RGB color space, it is common practice to first convert the color image data to an HSI (hue, saturation, intensity) color space, i.e., expressing the data of each pixel as a position within such a color space. This enables a human operator to more readily judge the color attributes of the overall image prior to executing the image recognition processing, and enables such processing to be applied to only a specific color attribute of each of the pixels, such as the intensity or the saturation attribute. However if processing is applied to RGB data which contain some degree of scattering of the color values, and a transform from RGB to HSI color space is executed, then the resultant values of saturation will be unstable (i.e., will tend to vary randomly with respect to the correct values) within those regions of the image in which the intensity values are high, and also within those regions of the image in which the intensity values are low. For example, assuming that each of the red, green and blue values of each pixel is expressed by 8 bits, so that the range of values is from 0 to 255, then in the case of a region of the image in which the intensity values are low, if any of the red, green or blue values of a pixel within that region should increase by 1, this will result in a large change in the corresponding value of saturation that is obtained by the transform processing operation. Instability of the saturation values will be expressed as noise, i.e., spurious edge portions, in the results of edge detection processing which utilizes these values. For that reason it has been difficult in the prior art to utilize the color saturation information contained in a color image, in image recognition processing.
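The instability described above is easy to reproduce numerically. The sketch below uses one common form of the RGB-to-HSI saturation conversion, S = 1 - 3*min(R,G,B)/(R+G+B); the function name and the sample pixel values are assumptions made for the illustration:

```python
def hsi_saturation(r, g, b):
    """Saturation component of a common RGB->HSI conversion:
    S = 1 - 3*min(R,G,B)/(R+G+B).  Hue and intensity are omitted."""
    total = r + g + b
    if total == 0:
        return 0.0
    return 1.0 - 3.0 * min(r, g, b) / total

# In a dark region, a change of only 1 in one 8-bit channel moves the
# saturation by a large amount ...
dark_a = hsi_saturation(1, 1, 1)   # 0.0
dark_b = hsi_saturation(2, 1, 1)   # 0.25
# ... while the same unit change at a mid intensity barely moves it.
mid_a = hsi_saturation(128, 128, 128)  # 0.0
mid_b = hsi_saturation(129, 128, 128)  # about 0.0026
print(dark_b - dark_a, mid_b - mid_a)
```

A unit change near black shifts the saturation by roughly two orders of magnitude more than the same change at mid intensity, which is exactly the behaviour that appears as spurious edges in saturation-based edge detection.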
Furthermore if a substantial degree of smoothing processing is applied to an image which is to be subjected to image recognition, in order to suppress the occurrence of such noise, then this has the effect of blurring the image, causing rounding of the shapes of edges and also merging together any edges which are located closely adjacent to one another. As a result, the accuracy of extracting edge information will be reduced. Conversely, if only a moderate degree of smoothing processing is applied to the image that is to be subjected to image recognition, or if smoothing processing is not applied to the image, then the accuracy of extraction of shapes from the image will be high, but there will be a high level of noise in the results so that reliable extraction of the shapes of the required objects will be difficult to achieve.
Moreover in the prior art, there has been no simple and effective method of performing image recognition processing to extract the shapes of objects which are to be recognized, which will eliminate various small objects in the image that are not intended to be recognized (and therefore can be considered to constitute noise) without distorting the shapes of the objects which are to be recognized.
It is an objective of the present invention to overcome the disadvantages of the prior art set out above, by providing an image recognition method and image recognition apparatus whereby edge detection for extracting the outlines of objects appearing in a color image can be performed by utilizing all of the color information of the pixels of the color image, to thereby achieve a substantially higher degree of reliability of detecting those pixels which constitute edges of objects that are to be recognized than has been possible in the prior art, and furthermore to provide an image recognition method and apparatus whereby, when such an edge pixel is detected, the direction of the corresponding edge can also be detected.
It is a further objective of the invention to provide an image recognition method and image recognition apparatus whereby processing to extract the shapes of objects which are to be recognized can be performed such as to eliminate the respective shapes of small objects that are not intended to be recognized, without distorting the shapes of the objects which are to be recognized.
To achieve the above objectives, the invention provides an image recognition method and apparatus whereby, as opposed to prior art methods which are based only upon intensity values, i.e., the gray-scale values of the pixels of a color image that is to be subjected to image recognition processing, substantially all of the color information (intensity, hue and saturation information) contained in the color image can be utilized for detecting the edges of objects which are to be recognized. This is basically achieved by successively selecting each pixel to be processed, i.e., as the object pixel, and determining, for each of a plurality of possible edge directions, a vector referred to as an edge vector whose modulus indicates an amount of color difference between two sets of pixels which are located on opposing sides of the object pixel with respect to that edge direction. The moduli of the resultant set of edge vectors are then compared, and the edge vector having the largest modulus is then assumed to correspond to the most likely edge on which the object pixel may be located. That largest value of edge vector modulus is referred to as the "edge strength" of the object pixel, and the direction corresponding to that edge vector is assumed to be the most likely direction of an edge on which the object pixel may be located, i.e., a presumptive edge for that pixel. Subsequently, it is judged that the object pixel is actually located on its presumptive edge if it satisfies the conditions that:
(a) its edge strength exceeds a predetermined minimum threshold value, and
(b) its edge strength is greater than the respective edge strength values of the two pixels which are located immediately adjacent to it, on opposing sides with respect to the direction of that presumptive edge.
The above processing can be achieved in a simple manner by predetermining only a limited number of possible edge directions which can be recognized, e.g., 0 degrees (horizontal), 90 degrees (vertical), 45 degrees diagonal and -45 degrees diagonal. With the preferred embodiments of the invention, a set of arrays of numeric values referred to as edge templates are utilized, with each edge template corresponding to a specific one of the predetermined edge directions, and with the values thereof predetermined such that when the color vectors of an array of pixels centered on the object pixel are subjected to array multiplication by an edge template, the edge vector corresponding to the direction of that edge template will be obtained as the vector sum of the result. The respective moduli of the edge vectors thereby derived for each of the possible edge directions are then compared, to find the largest of these moduli, as the edge strength of the object pixel.
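The edge-template mechanism can be sketched as follows. The particular 3x3 template values, the direction names, and the use of RGB triplets as the color vectors are assumptions made for this illustration; they are not the templates of the preferred embodiments:

```python
import numpy as np

# Four hypothetical 3x3 edge templates, one per predetermined direction.
TEMPLATES = {
    "horizontal": np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]),
    "vertical":   np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
    "diag+45":    np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]]),
    "diag-45":    np.array([[-1, -1, 0], [-1, 0, 1], [0, 1, 1]]),
}

def edge_strength(patch):
    """patch: a 3x3x3 array of color vectors (here RGB) centred on the
    object pixel.  Each template is applied by element-wise array
    multiplication and summation, yielding one 3-component edge vector
    per direction; the largest modulus is the edge strength."""
    best_modulus, best_direction = 0.0, None
    for name, tmpl in TEMPLATES.items():
        # Sum tmpl[i, j] * patch[i, j, :] over the 3x3 window: the
        # result is the edge vector for this direction.
        vec = np.tensordot(tmpl, patch.astype(float), axes=([0, 1], [0, 1]))
        modulus = float(np.linalg.norm(vec))
        if modulus > best_modulus:
            best_modulus, best_direction = modulus, name
    return best_modulus, best_direction

# A patch whose upper rows are red and whose bottom row is green, i.e.
# a horizontally oriented color boundary:
patch = np.zeros((3, 3, 3))
patch[:2, :, 0] = 255   # red above
patch[2:, :, 1] = 255   # green below
strength, direction = edge_strength(patch)
print(direction)  # horizontal
```

Note that the intensity values alone would not distinguish the two sides of this boundary if the red and green regions happened to be equally bright; operating on the full color vectors is what makes the edge detectable.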
In that way, since all of the color information contained in the image can be utilized to perform edge detection, the detection can be more accurately and reliably performed than has been possible in the prior art.
According to another aspect of the invention, data expressing the color attributes of pixels of a color image which is to be subjected to edge detection processing are first subjected to transform processing to express the color attributes of the pixels of the image as respective sets of coordinates of an appropriate color space, in particular, a color space in which intensity and chrominance information are expressed by separate coordinates. This enables the color attribute information to be modified prior to performing edge detection, such as to optimize the results that will be obtained in accordance with the characteristics of the particular color image that is being processed. That is to say, the relative amount of contribution of the intensity values to the magnitudes of the aforementioned color vectors can be increased, for example. If the color attributes are first transformed into an HSI (hue, saturation, intensity) color space, then since such HSI values are generally expressed in polar coordinates, a simple conversion operation is applied to each set of h, s, i values of each pixel to express the color attributes as a color vector of an orthogonal color space in which intensity information and chrominance information are expressed along respectively different coordinate axes, i.e., to express the pixel color attributes as a plurality of linear coordinates of that color space, and the edge detection processing is then executed.
It is known that when image data are transformed from a form such as RGB color values into an HSI color space, instability (i.e., random large-scale variations) may occur in the saturation values which are obtained as a result of the transform. This instability of saturation values is most prevalent in those regions of a color image where the intensity values are exceptionally low, and also in those regions where the intensity values are exceptionally high. This is a characteristic feature of such a transform operation, and causes noise to appear in the results of edge detection that is applied to such HSI-transformed image data and utilizes the saturation information, due to the detection of spurious edge portions as a result of abrupt changes in saturation values between adjacent pixels. However with the present invention, such instability of the saturation values can be reduced, by modifying the saturation values obtained for respective pixels in accordance with the magnitudes of the intensity values which are derived for these pixels. The noise which would otherwise be generated by such instability of saturation values can thereby be suppressed, enabling more reliable recognition of objects in the color image to be achieved.
According to one aspect of the invention, when a transform into coordinates of the HSI space has been executed, such reduction of instability of the saturation values is then achieved by decreasing the saturation values in direct proportion to amounts of decrease in the intensity values. Alternatively, that effect is achieved by decreasing the saturation values in direct proportion to decreases in the intensity values from a median value of intensity towards a minimum value (i.e., black) and also decreasing the saturation values in direct proportion to increases in the intensity values from that median value towards a maximum value (i.e., white).
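Both proportional weighting schemes described above can be sketched in a few lines. The function name, the normalization of saturation and intensity to the range 0..1, and the choice of 0.5 as the median intensity are assumptions made for this illustration:

```python
def stabilize_saturation(s, i, mode="linear"):
    """Reduce saturation instability by scaling the saturation value s
    (0..1) according to the intensity value i (0..1)."""
    if mode == "linear":
        # Decrease s in direct proportion to decreases in intensity.
        return s * i
    # "triangular" weighting: full weight at the median intensity 0.5,
    # decreasing linearly towards both black (i=0) and white (i=1).
    return s * (1.0 - 2.0 * abs(i - 0.5))

print(stabilize_saturation(0.8, 0.1))           # ~0.08: dark region damped
print(stabilize_saturation(0.8, 0.5, "tri"))    # 0.8: mid intensity kept
print(stabilize_saturation(0.8, 0.95, "tri"))   # ~0.08: bright region damped
```

The triangular variant corresponds to the second alternative above, in which instability near white as well as near black is suppressed.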
According to another aspect of the invention, when a transform into coordinates of the HSI space has been executed, such reduction of instability of the saturation values is then achieved by utilizing a predetermined saturation value modification function (which varies in a predetermined manner in accordance with values of intensity) to modify the saturation values. In the case of a transform from the RGB color space to the HSI color space, that saturation value modification function is preferably derived based on calculating, for each of the sets of r, g, b values expressing respective points in the RGB color space, the amount of actual change which occurs in the saturation value s of the corresponding HSI set of transformed h, s, i values in response to a small-scale change in one of that set of r, g, b values. In that way, a saturation value modification function can be derived which is based on the actual relationship between transformed intensity values and instability of the corresponding saturation values, and can thus be used such as to maintain the saturation values throughout a color image at a substantially constant level, i.e., by varying the saturation values in accordance with the intensity values such as to appropriately compensate in those regions of the color space in which instability of the saturation values can occur.
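The derivation described above can be probed numerically: the sensitivity of the transformed saturation value to a unit change in one RGB channel is measured at different intensity levels, and a modification function is shaped to compensate where that sensitivity is large. The saturation formula, sampling scheme and names below are assumptions for the sketch, not the function of the preferred embodiments:

```python
def hsi_s(r, g, b):
    """Saturation of a common RGB->HSI conversion."""
    total = r + g + b
    return 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total

def sensitivity(points, delta=1):
    """Mean absolute change in saturation when one RGB channel of each
    sample point is perturbed by `delta` -- a numeric probe of the
    small-scale-change behaviour described in the text."""
    changes = [abs(hsi_s(r + delta, g, b) - hsi_s(r, g, b))
               for (r, g, b) in points]
    return sum(changes) / len(changes)

# The instability is far larger near black than at mid intensities:
dark = sensitivity([(i, i, i) for i in range(1, 6)])
mid = sensitivity([(i, i, i) for i in range(120, 136)])
print(dark, mid)

# A saturation modification function could then scale s by a weight
# that is small wherever the measured sensitivity is large, e.g.
# weight(i) = min(1.0, mid / sensitivity_at_intensity(i))  -- an
# assumed form, shown only to indicate the shape of the compensation.
```

Measured this way, the compensation is grounded in the actual behaviour of the transform rather than in an arbitrarily chosen curve, which is the point of deriving the modification function from the transform itself.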
Noise in the edge detection results, caused by detection of spurious edge portions, can thereby be very effectively suppressed, enabling accurate edge detection to be achieved.
According to another aspect, the invention provides an image recognition method and apparatus for operating on a region image (i.e., an image formed of a plurality of regions expressing the shapes of various objects, each region formed of a continuously extending set of pixels in which each pixel is identified by a label as being contained in that region) to process the region image such as to reduce the amount of noise caused by the presence of various small regions, which are not required to be recognized. This is achieved by detecting each small region having an area that is less than a predetermined threshold value, and combining each such small region with an immediately adjacent region, with the combining process being executed in accordance with specific rules which serve to prevent distortion of the shapes of objects that are to be recognized. These rules preferably stipulate that each of the small regions is to be combined with an immediately adjacent other region which (out of all of the regions immediately adjacent to that small region) has a maximum length of common boundary line with respect to that small region. In that way, regions are combined without consideration of the pixel values (of an original color image) within the regions and considering only the sizes and shapes of the regions, whereby it becomes possible to eliminate small regions which would constitute "image noise", without reducing the accuracy of extracting the shapes of objects which are to be recognized.
The aforementioned rules for combining regions may further stipulate that the combining processing is to be executed repetitively, to operate successively on each of the regions which are below the aforementioned area size threshold value, starting from the smallest of these regions, then the next-smallest, and so on. It has been found that this provides even greater effectiveness in elimination of image noise, without reducing the accuracy of extracting the shapes of objects which are to be recognized.
Alternatively, the region combining processing may be executed on the basis that the aforementioned rules for combining regions further stipulate that, for each of the small regions which are below the aforementioned area size threshold value, the total area of the regions immediately adjacent to that small region is to be calculated, and the aforementioned combining processing is then to be executed starting with the small region for which that adjacent area total is the largest, then the small region for which the adjacent area total is the next-largest, and so on in succession for all of these small regions.
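The region combining rules set out above (smallest region first, merging into the neighbour with the longest common boundary) can be sketched using an adjacency representation rather than an actual label image. The data layout and the function name are assumptions made for this illustration:

```python
def combine_small_regions(areas, boundaries, min_area):
    """areas: {label: pixel count}.  boundaries: {(a, b): shared
    boundary length} with a < b.  Repeatedly merges each region whose
    area is below min_area into the neighbour sharing the longest
    common boundary, processing the smallest region first."""
    def neighbours(r):
        out = {}
        for (a, b), length in boundaries.items():
            if a == r:
                out[b] = out.get(b, 0) + length
            elif b == r:
                out[a] = out.get(a, 0) + length
        return out

    while True:
        small = [r for r, a in areas.items() if a < min_area and neighbours(r)]
        if not small:
            return areas
        r = min(small, key=lambda x: areas[x])              # smallest first
        nbrs = neighbours(r)
        target = max(nbrs, key=nbrs.get)                    # longest border
        areas[target] += areas.pop(r)                       # absorb region r
        # Reattach r's boundary segments to the absorbing region.
        merged = {}
        for (a, b), length in boundaries.items():
            a2 = target if a == r else a
            b2 = target if b == r else b
            if a2 == b2:
                continue                                    # internal now
            key = (min(a2, b2), max(a2, b2))
            merged[key] = merged.get(key, 0) + length
        boundaries.clear()
        boundaries.update(merged)

# Three regions: region 3 is tiny and borders region 1 (boundary length
# 5) and region 2 (boundary length 2), so it merges into region 1.
areas = {1: 100, 2: 80, 3: 4}
combine_small_regions(areas, {(1, 3): 5, (2, 3): 2, (1, 2): 10}, min_area=10)
print(areas)  # {1: 104, 2: 80}
```

Note that the pixel values of the original color image play no part here; only the region areas and the boundary lengths drive the merging, which is what preserves the shapes of the larger regions.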
A region image, for applying such region combining processing, can for example be generated by first applying edge detection by an edge detection apparatus according to the present invention to an original color image, to obtain data expressing an edge image in which only the edges of objects appear, then defining each part of that edge image which is enclosed within a continuously extending edge as a separate region, and attaching a common identifier label to each of the pixels constituting that region.
More specifically, the present invention provides an image recognition method for processing image data of a color image which is represented as respective sets of color attribute values of an array of pixels, to successively operate on each of the pixels as an object pixel such as to determine whether that pixel is located on an edge within the color image, and thereby derive shape data expressing an edge image which shows only the outlines of objects appearing in the color image, with the method comprising steps of:
if necessary, i.e., if the color attribute values of the pixels are not originally expressed as sets of coordinates of an orthogonal color space such as an RGB (red, green, blue) color space, expressing these sets of color attribute values as respective color vectors, with each color vector defined by a plurality of scalar values which are coordinates of an orthogonal color space;
for each of a plurality of predetermined edge directions, generating a corresponding edge template as an array of respectively predetermined numeric values;
extracting an array of color vectors as respective color vectors of an array of pixels having the object pixel as the center pixel of that array;
successively applying each of the edge templates to the array of color vectors in a predetermined array processing operation, to derive edge vectors respectively corresponding to the edge directions;
comparing the respective moduli of the derived edge vectors to find the maximum modulus value, designating that maximum value as the edge strength of the object pixel and designating the edge direction corresponding to an edge vector having that maximum modulus as being a possible edge direction for the object pixel; and,
judging whether the object pixel is located on an actual edge which is oriented in the possible edge direction, based upon comparing the edge strength of the object pixel with respective values of edge strength derived for pixels which are positioned immediately adjacent to the object pixel and are on mutually opposite sides of the object pixel with respect to the aforementioned possible edge direction.
The invention further provides an image recognition method for operating on shape data expressing an original region image (i.e., an image in which pixels are assigned respective labels indicative of the various image regions in which the pixels are located) to obtain shape data expressing a region image in which specific small regions appearing in the original region image have been eliminated, with the method comprising repetitive execution of a series of steps of:
selectively determining respective regions of the original region image as constituting a set of small regions which are each to be subjected to a region combining operation;
selecting one of the set of small regions as a next small region which is to be subjected to the region combining operation;
for each of respective regions which are disposed immediately adjacent to the next small region, calculating a length of common boundary line with respect to the next small region, and determining one of the immediately adjacent regions which has a maximum value of the length of boundary line; and
combining the next small region with the adjacent region having the maximum length of common boundary line.
Data expressing a region image, to be processed by the method set out above, can be reliably derived by converting an edge image which has been generated by the preceding method of the invention into a region image.
The above features of the invention will be more clearly understood by referring to the following description of preferred embodiments of the invention.