1. Field of the Invention
This invention relates to an image processing apparatus and method. More particularly, the invention relates to an image processing apparatus and method for high-efficiency encoding and recognition of images, as well as to an image processing apparatus and method for detecting the feature points of an image.
2. Description of the Related Art
Many image encoding techniques have been developed on the basis of filtering theory, which is based upon analysis in the frequency domain, and information-source encoding theory. Since the transformation process and inverse-transformation process of these theories have structures that are symmetrical to each other, the inverse transformation is self-evident and one need not be aware of the inverse-transformation process. For this reason, conventional encoding is referred to as “symmetric encoding”.
Symmetric encoding will be described in brief. First, the performance demanded of symmetric encoding is as follows:
(1) Universality: It is required that various types of image information be encodable efficiently.
(2) Compatibility: It is required that compatibility between image media having different resolutions be assured. To this end, there is need of a scheme through which bit strings for various image media having different resolutions are obtained if part of an encoded bit string is extracted.
(3) Fidelity: It is required that a reproduced image be close to the original image. This involves establishing the concept of distance between images for the purpose of evaluating fidelity.
Rapid progress in symmetric encoding has been made in recent years, and sub-band encoding in particular has become the focus of much research for the following reasons:
(1) A practical image processing scheme having a high degree of compatibility and suited to the service layer of image communication can be implemented with ease.
(2) A visual spatial frequency characteristic modeled accurately in the sub-band region can be reflected directly in the control of encoding parameters.
Described next will be the research trends and technical level relating to transformation and quantization techniques, which are the requisite techniques of symmetric image encoding.
With regard to transformation techniques, studies regarding prediction and orthogonal transformation have for the most part been completed, and investigations now proceeding deal mainly with band separating/synthesizing filters for sub-band encoding. The three capabilities required of a band separating/synthesizing filter are as follows:
(1) Perfect reproducibility: In a case where an image is reproduced without performing encoding, it is required that the original signal be reproduced faithfully without causing aliasing distortion (the distortion that accompanies folding).
(2) Linear phase characteristic: It is required that the filter have a linear or near-linear phase characteristic in image encoding.
(3) Orthogonality: There should be no overlapping of filters for band separation.
At the present time, the only orthogonal filter of finite degree which truly satisfies the requirements of a linear phase characteristic and perfect reproducibility is the Haar filter (which is equivalent to a second-order quadrature mirror filter).
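The perfect-reproducibility property of the Haar filter can be illustrated with a minimal sketch. The function names `haar_analysis` and `haar_synthesis` and the sample signal are hypothetical and chosen only for illustration; the sketch assumes a one-dimensional, even-length signal and one level of band separation.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal scaling of the two-tap Haar filters


def haar_analysis(signal):
    """Separate an even-length signal into a low band and a high band."""
    low = [SCALE * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [SCALE * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high


def haar_synthesis(low, high):
    """Recombine the two bands into the original sample sequence."""
    out = []
    for a, d in zip(low, high):
        out.append(SCALE * (a + d))  # even-indexed sample
        out.append(SCALE * (a - d))  # odd-indexed sample
    return out


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = haar_analysis(signal)
restored = haar_synthesis(low, high)
```

Because the two filters are orthogonal, the energy of the two bands together equals the energy of the input, and synthesis without any quantization returns the input up to floating-point rounding.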
With regard to quantization techniques, an encoding characteristic that is near ideal has attained an implementable technical level. In high-rate quantization of a memoryless information source, combining linear (uniform) quantization and entropy encoding of the quantized output makes it possible to achieve quantization at a quantization loss of approximately 0.25 bit/sample in comparison with the rate-distortion limit. In the case of quantization at a lower rate, an excellent quantization performance can be realized by combining scalar quantization, which is equipped with a dead zone, and entropy encoding of the quantized output, wherein the entropy encoding includes zero-run encoding. Furthermore, at a low rate, utilization of vector quantization also is possible. If use is made of lattice vector quantization or a multi-dimensional quantization method in which a code book is furnished with a certain algebraic structure, in the manner of a permutation code, multi-dimensional, high-efficiency quantization closer to the rate-distortion limit can be realized over a wide range from low to high rates.
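The scalar techniques mentioned above can be sketched as follows. The function names, the step size and the sample coefficients are hypothetical. The dead zone is assumed here to be twice the step width, which is one common choice and is not specified in the foregoing description. The figure of approximately 0.25 bit/sample corresponds to the standard high-resolution result (1/2)·log2(πe/6) for entropy-coded uniform scalar quantization, which the sketch simply evaluates.

```python
import math


def high_rate_gap_bits():
    """High-resolution rate loss of entropy-coded uniform scalar quantization."""
    return 0.5 * math.log2(math.pi * math.e / 6.0)


def deadzone_quantize(x, step):
    """Map x to an integer index; every |x| < step falls in the dead zone (index 0)."""
    q = int(abs(x) // step)
    return -q if x < 0 else q


def zero_run_encode(indices):
    """Emit (number of preceding zeros, nonzero index) pairs.

    A trailing run of zeros is emitted as (run length, 0).
    """
    pairs, run = [], 0
    for q in indices:
        if q == 0:
            run += 1
        else:
            pairs.append((run, q))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


coeffs = [0.2, -0.4, 3.7, 0.1, 0.0, -2.2, 0.3, 0.4]
indices = [deadzone_quantize(c, step=1.0) for c in coeffs]
pairs = zero_run_encode(indices)
gap = high_rate_gap_bits()
```

In a complete encoder the pairs would then be passed to an entropy coder; that stage is omitted from this sketch.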
However, if an image occupies a major portion of the information handled, encoding which relies solely upon geometric information of the information handled is insufficient. Accordingly, attempts to carry out compression utilizing the structure of an image, namely the structure of the actual world, have been studied from several standpoints. These new attempts have already shown that there is no assurance that the transformation process and inverse-transformation process will be symmetrical. With such encoding, therefore, it is required that sufficient consideration be given to the inverse-transformation process. Such encoding is referred to as “asymmetric encoding”.
The foundation of the concept of a new type of image encoding (referred to as “asymmetric image encoding”), sometimes called “feature extraction encoding” or “structure extraction encoding”, does not necessarily depend completely upon filtering theory alone, as is the case with conventional image encoding techniques. With asymmetric image encoding, symbolizing and reproduction algorithms are required to have the following properties:
(1) When an input image contains noise, it is required that fluctuation produced in a feature parameter by such noise be very small.
(2) A set of input images mapped onto identical feature parameters owing to the symbolizing algorithm should be as small as possible. If possible, the symbolizing algorithm should be a one-to-one mapping from the input image to feature and structure parameters. The feature and structure parameter space and the symbolizing algorithm should be complete.
(3) The reproduction algorithm should be a stable, i.e., continuous, mapping. More specifically, when noise due to coarse quantization appears in a feature parameter extracted by the symbolizing algorithm, and even when the calculation accuracy is unsatisfactory, it is required that reproduction of an image close to the original image be assured.
(4) It should be possible to calculate both the symbolizing algorithm and the reproduction algorithm stably in numerical fashion and in a finite number of steps.
Asymmetric image encoding presently being studied can be classified as follows:
Application of IFS (Iterated Function Systems)
The self-similarity (fractal property) believed to exist between partial images of an input image is utilized to find a contraction mapping (reduced mapping) proper to each partial image, and encoding is attempted on the basis thereof. At the present time, an affine transformation is employed as the mapping. Since a reproduced image is given as the set of fixed (stable) points of a dynamical system, the image is decoded by recursive processing.
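The recursive decoding can be illustrated with a minimal sketch. The three affine maps below are hand-picked, hypothetical examples (each halves distances, and together they generate a Sierpinski triangle); an actual encoder would instead search for maps proper to each partial image, which is not shown. The sketch demonstrates only that repeated application of contractive maps converges to the same point set regardless of the starting point.

```python
import math

# Hypothetical contractive affine maps w(p) = 0.5 * p + t, one per translation t.
TRANSLATIONS = [(0.0, 0.0), (0.5, 0.0), (0.25, 0.5)]


def apply_ifs(points):
    """Apply every map to every point (one step of the recursion)."""
    return [(0.5 * x + tx, 0.5 * y + ty)
            for (x, y) in points
            for (tx, ty) in TRANSLATIONS]


def decode(start, iterations):
    """Iterate the maps from an arbitrary starting point."""
    points = [start]
    for _ in range(iterations):
        points = apply_ifs(points)
    return points


a = decode((0.0, 0.0), 6)
b = decode((10.0, -7.0), 6)

# Largest distance between corresponding points of the two decodings.
gap = max(math.hypot(xa - xb, ya - yb) for (xa, ya), (xb, yb) in zip(a, b))
```

After six iterations the gap between the two decodings has shrunk by a factor of 2^6 = 64 relative to the distance between the starting points, which is the sense in which the decoded image does not depend on the initial data.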
Application of Edge Information
It has long been known that edges provide important information in terms of visual information processing. A complete description of an image can be achieved by combining edge information and appropriate additional information. In an image encoding/decoding system that employs edge information, the position of an edge is extracted from an input image by an encoder, a certain type of information (e.g., a differential coefficient) at this edge position is extracted and the image is reproduced by appropriate interpolation performed by a decoder.
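A one-dimensional sketch of such an encoder/decoder pair follows. The names `encode_edges` and `decode_edges`, the threshold and the test signal are hypothetical. As a simplifying assumption, the additional information stored at each edge is the pair of sample values on either side of the jump rather than a differential coefficient, and the decoder uses linear interpolation.

```python
def encode_edges(signal, threshold):
    """Keep the samples on both sides of every large jump, plus the two end samples."""
    keep = {0, len(signal) - 1}
    for i in range(len(signal) - 1):
        if abs(signal[i + 1] - signal[i]) > threshold:
            keep.update((i, i + 1))
    return [(i, signal[i]) for i in sorted(keep)]


def decode_edges(samples, length):
    """Reproduce the signal by linear interpolation between the kept samples."""
    out, seg = [], 0
    for n in range(length):
        # Advance to the segment whose right end is at or beyond position n.
        while samples[seg + 1][0] < n:
            seg += 1
        (i0, v0), (i1, v1) = samples[seg], samples[seg + 1]
        t = (n - i0) / (i1 - i0)
        out.append(v0 + t * (v1 - v0))
    return out


signal = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 2.0, 2.0]
samples = encode_edges(signal, threshold=1.0)
decoded = decode_edges(samples, len(signal))
```

For this piecewise-constant signal six of the eight samples are kept and the reproduction is exact; for natural signals the reproduction is only approximate and depends on the interpolation chosen.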
Use of Motion Information
A method of estimating a motion vector as a feature parameter and performing encoding using this vector has been studied in the field of moving-picture encoding. More specifically, a motion vector is predicted, three-dimensional motion and a three-dimensional structure (rule) are predicted on the basis of the predicted vector, and on the basis of these a three-dimensional structural (rule) model is constructed. The image is encoded/decoded using this three-dimensional structural (rule) model.
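The first step, estimation of a motion vector, can be sketched with exhaustive block matching. This is one widely used technique and is assumed here for illustration only; the foregoing description does not state which estimator is used, and the three-dimensional modeling steps are not shown. The function names, block size, search range and synthetic frames are hypothetical.

```python
import random


def sad(cur, prev, y0, x0, vy, vx, size):
    """Sum of absolute differences between a block of the current frame
    and the block of the previous frame displaced back by (vy, vx)."""
    total = 0.0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[y0 + dy][x0 + dx] - prev[y0 + dy - vy][x0 + dx - vx])
    return total


def estimate_motion(cur, prev, y0, x0, size=4, search=3):
    """Return the displacement (vy, vx) with the smallest SAD in the search range."""
    best_cost, best_v = None, None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            cost = sad(cur, prev, y0, x0, vy, vx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_v = cost, (vy, vx)
    return best_v


random.seed(0)
H = W = 16
prev = [[random.random() for _ in range(W)] for _ in range(H)]

# Synthetic current frame: the previous frame moved by true_v (with wrap-around).
true_v = (2, -1)
cur = [[prev[(y - true_v[0]) % H][(x - true_v[1]) % W] for x in range(W)]
       for y in range(H)]

motion = estimate_motion(cur, prev, y0=6, x0=6)
```

The block and search range are placed so that no index leaves the frame; a practical estimator would need explicit border handling.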
At the present time, it is assumed that when IFS is applied in image encoding, the fractal property appears to the same degree over the entirety of one image. However, if a natural image is taken as an example, a plurality of objects will be present in one image. Therefore, the assumption that a single fractal property holds over the entirety of a natural image is a rather crude approximation. Further, the employment of edge information or motion information involves many unsolved technical problems in terms of pattern recognition, learning of knowledge, etc.
As mentioned above, an edge is widely accepted as a feature of an image. Here an edge refers to a portion of the image where there is a large amount of change in image intensity (luminance, density, saturation and hue). Edge position is detected as the position of the maximal value of the first-order differential coefficient or as the position at which the second-order differential coefficient becomes zero. (See F. Ulupinar and G. Medioni: “Refining edges detected by a LOG operator”, Computer Vision, Graphics, and Image Processing, vol. 51, pp. 275-298, 1990.)
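The two detection criteria can be compared on a one-dimensional example. The function names and the test signal (a smooth step centered at sample 50) are hypothetical; finite differences are assumed as the approximation of the differential coefficients, and no smoothing filter is applied.

```python
import math


def first_difference(signal):
    """Central first difference; entry k corresponds to signal index k + 1."""
    return [(signal[i + 1] - signal[i - 1]) / 2.0 for i in range(1, len(signal) - 1)]


def second_difference(signal):
    """Second difference; entry k corresponds to signal index k + 1."""
    return [signal[i + 1] - 2.0 * signal[i] + signal[i - 1]
            for i in range(1, len(signal) - 1)]


def edge_by_gradient_peak(signal):
    """Index at which the magnitude of the first difference is maximal."""
    g = first_difference(signal)
    k = max(range(len(g)), key=lambda i: abs(g[i]))
    return k + 1


def edges_by_zero_crossing(signal):
    """Sub-sample positions at which the second difference changes sign."""
    d2 = second_difference(signal)
    crossings = []
    for i in range(len(d2) - 1):
        a, b = d2[i], d2[i + 1]
        if (a > 0 and b <= 0) or (a < 0 and b >= 0):
            # Linear interpolation between the two samples around the sign change.
            crossings.append(i + 1 + a / (a - b))
    return crossings


signal = [math.tanh((n - 50) / 5.0) for n in range(101)]
peak = edge_by_gradient_peak(signal)
crossings = edges_by_zero_crossing(signal)
```

On this noise-free step both criteria locate the edge at sample 50.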
A method of detecting a singularity by multi-resolution analysis has been proposed. (See S. Mallat and W. L. Hwang: “Singularity detection and processing with wavelets”, IEEE Trans. Information Theory, vol. 38, no. 2, pp. 617-643, March 1992.) This method performs detection based upon the fact that a singularity exhibits a maximal value over all frequency components.
However, since a first-order differential coefficient has the effect of emphasizing noise, a method of detecting an edge from a first-order differential coefficient is susceptible to noise. Similarly, a method of detecting an edge from the zero-cross point of a second-order differential coefficient also is disadvantageous in that the precision of the position of the detected edge is influenced strongly by noise. Further, a method of detecting an edge by multi-resolution analysis is highly resistant to the effects of noise but involves a large amount of calculation.
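The noise-emphasizing effect of differentiation can be checked numerically. For independent noise samples of variance σ², the difference of adjacent samples has variance 2σ², so a first difference doubles the noise power while leaving a slowly varying signal almost unchanged. The sketch below assumes Gaussian white noise and a forward difference; the seed and sample count are arbitrary.

```python
import random
import statistics

random.seed(1)
sigma = 1.0

# Independent Gaussian noise samples and their forward first difference.
noise = [random.gauss(0.0, sigma) for _ in range(20000)]
diff = [noise[i + 1] - noise[i] for i in range(len(noise) - 1)]

# Ratio of noise power after and before differencing (theoretical value: 2).
ratio = statistics.pvariance(diff) / statistics.pvariance(noise)
```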
Furthermore, an edge detected by these methods generally is continuous. If it is attempted to encode the original image based upon such an edge, it becomes necessary to execute some kind of threshold-value processing in order to select a finite number of feature points from a continuously existing edge.