1. Field of the Invention
The present invention relates to picture matching processing for matching a picture containing a recognition target, input through a picture input apparatus such as a camera, with a previously registered picture, thereby identifying a target in the input picture. A picture matching processing system of the present invention is applicable to recognition processing for any two-dimensional or three-dimensional object, such as a person or a consumer product.
2. Description of the Related Art
As applications requiring picture matching processing become widespread, there is an increased demand for a technique of searching for and cutting out a three-dimensional object, such as a person or an article, appearing in a picture captured through a picture input apparatus such as a camera, followed by recognition and matching. Among techniques of recognizing a recognition target in a captured picture, several excellent techniques are known. One of them is a picture matching processing technique using an Eigen-Window method. Another is a picture matching processing technique using an improved Eigen-Window method, in which a feature value of a picture is converted to a discrete cosine transform (DCT) coefficient.
Hereinafter, the picture matching processing technique using an improved Eigen-Window method will be described as a conventional picture matching processing technique. The case will be described in which a person's face picture is recognized and matched.
The picture matching processing using an improved Eigen-Window method consists of a “registration phase” for creating a model used for picture matching, and a “recognition phase” for conducting recognition and matching processing with respect to a recognition target of an input picture.
First, a procedure of the “registration phase” will be described with reference to a flow chart in FIG. 13. In the registration phase, a picture of a two-dimensional or three-dimensional object to be recognized and matched (e.g., a basic posture picture, such as a front picture, of a person's face to be recognized) is generated and registered as a model for matching.
(1) A person's face picture to be a model picture is obtained (Operation 1301). Captured picture data of a front face picture may be inputted from outside in a file format. In the case where there is no appropriate data, a person's front face picture to be registered is captured through a picture input apparatus such as a camera. Herein, as an example, it is assumed that a model picture shown in FIG. 15A is captured.
(2) Feature points are detected from the captured model picture (Operation 1302). The feature points are detected and selected according to some index. For example, there is a method of selecting points at which a texture degree (an index regarding the complexity of texture, i.e., the surface pattern of a picture) is equal to or larger than a threshold value, points at which an edge intensity (an index regarding an edge component) is equal to or larger than a threshold value, or specific points in a picture, such as points at which color information is in a predetermined range. There is also a method of utilizing knowledge regarding the recognition target in a captured picture and selecting important portions thereof (feature portions such as the eyes and the mouth). In FIG. 16A, points assigned to a face picture (i.e., a model picture) schematically represent feature points.
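The texture-degree criterion above can be sketched as follows. This is an illustrative sketch only: the neighbourhood size `win` and the variance `threshold` are assumed values not given in the source, and local grey-level variance merely stands in for the texture degree.

```python
import numpy as np

def detect_feature_points(img, win=7, threshold=100.0):
    """Select feature points whose texture degree (approximated here by
    the grey-level variance in a win x win neighbourhood) is equal to
    or larger than a threshold value."""
    h, w = img.shape
    r = win // 2
    points = []
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = img[y - r:y + r + 1, x - r:x + r + 1]
            if patch.var() >= threshold:  # texture-degree test
                points.append((y, x))
    return points
```

An edge-intensity or color-range criterion could be substituted for the variance test without changing the surrounding procedure.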
(3) A small region surrounding each selected feature point (e.g., a rectangular local region) is selected as a window picture (Operation 1303). One window picture is selected for each feature point. For example, each window picture may be a small square of 15 pixels×15 pixels.
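The selection of a window picture around each feature point can be sketched as follows. One assumption is made that the source does not specify: feature points so close to the picture border that a full window cannot be cut are simply skipped.

```python
import numpy as np

def cut_windows(img, points, size=15):
    """Cut a size x size window picture centred on each feature point.
    Points whose window would cross the picture border are skipped."""
    r = size // 2
    h, w = img.shape
    windows = []
    for y, x in points:
        if r <= y < h - r and r <= x < w - r:
            windows.append(img[y - r:y + r + 1, x - r:x + r + 1])
    return windows
```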
(4) The selected window picture is compressed to a lower-order dimensional space to a degree that it is still effective for recognition, and organized and stored as a model for each model picture (Operation 1304). As a method for compression to a lower-order dimensional space, the Eigen-Window method may be used; herein, however, the improved Eigen-Window method is used. The improved Eigen-Window method calculates DCT coefficients from the window picture data and appropriately selects low-frequency coefficients excluding the DC component, thereby compressing the window picture to a lower-order dimensional space. For example, the window picture that is an original picture is composed of 15 pixels×15 pixels (i.e., the window picture is 225-dimensional); in this case, DCT coefficients are calculated, and 20 low-frequency coefficients, excluding the DC component, that satisfactorily represent the picture features are selected so as to compress the window picture to a 20-dimensional vector. FIG. 16A schematically shows a state where the window picture is projected onto the lower-order dimensional space.
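The compression of a 15 pixel × 15 pixel window picture to a 20-dimensional vector can be sketched as follows. The zigzag-like low-frequency ordering is an assumption: the source states only that 20 low-frequency coefficients excluding the DC component are selected.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def compress_window(window, n_coeffs=20):
    """Compute the 2-D DCT of a square window and keep the n_coeffs
    lowest-frequency coefficients, excluding the DC component."""
    n = window.shape[0]
    d = dct_matrix(n)
    coeffs = d @ window @ d.T  # separable 2-D DCT-II
    # Order coefficients roughly from low to high frequency (by u + v).
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    order = [uv for uv in order if uv != (0, 0)][:n_coeffs]
    return np.array([coeffs[uv] for uv in order])
```

Excluding the DC component makes the 20-dimensional vector insensitive to a uniform brightness offset of the window picture.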
(5) The compression to a lower-order dimensional space in Operation 1304 is applied to all the window pictures, and the data thus obtained is registered and managed as model data for picture matching (Operation 1305).
By the above-mentioned processing in the registration phase, a model for matching of a lower-order dimensional picture is generated from a person's face picture and registered.
Next, the procedure of the “recognition phase” processing will be described with reference to a flow chart in FIG. 14.
(1) A picture is captured in which a person's face to be a recognition target appears (Operation 1401). Captured picture data of a front face picture may be input from outside in a file format. In the case where there is no appropriate data, a front face picture of the person to be recognized is captured through a picture input apparatus such as a camera. The latter case is often used for an entering/leaving management system. Herein, it is assumed that a picture to be a recognition target shown in FIG. 15B is captured.
(2) A person's face picture to be a recognition picture is cut out from the input picture (Operation 1402). In this case, the position of a person's face picture region to be a recognition target may be estimated, or a predetermined rectangular region may be cut out. As a method for estimating the position of a person's face picture region, it is known to estimate a face picture region by detecting a skin region.
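A skin-region estimate of the kind mentioned above can be sketched as follows. The RGB thresholds are illustrative assumptions, not values taken from the source.

```python
import numpy as np

def skin_mask(rgb):
    """Rough skin-colour mask: pixels where the red channel dominates
    green and blue.  The thresholds are illustrative only."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    return ((r > 95) & (g > 40) & (b > 20) &
            (r > g) & (r > b) & (r - np.minimum(g, b) > 15))

def face_region(rgb):
    """Bounding rectangle of the skin mask -- a crude estimate of the
    face picture region to be cut out."""
    ys, xs = np.nonzero(skin_mask(rgb))
    if ys.size == 0:
        return None
    return ys.min(), xs.min(), ys.max(), xs.max()
```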
(3) Feature points are detected from the cut-out recognition target face picture (Operation 1403). The feature points may be selected by using the same index as that in the registration phase. Alternatively, the feature points may be selected by using another appropriate index.
(4) A local region is selected as a window picture, based on the selected feature points (Operation 1404). In the same way as in the registration phase, for example, a window picture of 15 pixels×15 pixels is selected. FIG. 16B schematically shows this state.
(5) The selected window picture is compressed to the same lower-order dimensional space as that in the registration phase (Operation 1405). Herein, in the same way as in the registration phase, a method for compressing a window picture to a lower-order dimensional space, using an improved Eigen-Window method is used. More specifically, a DCT coefficient is calculated from a window picture that is an original picture, and 20 coefficients of a low frequency effectively representing picture features excluding a DC component are selected, whereby the window picture is compressed to a 20-dimensional space.
(6) Recognition target data is projected onto the above-mentioned feature space (that is a lower-order dimensional space) for each window picture (Operation 1406). FIG. 16B schematically shows this state.
(7) Pairs of a registered window picture and a recognition target window picture whose distance in the feature space is small are found, whereby the window pictures are matched with each other (Operation 1407).
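The pairing of Operation 1407 can be sketched as a nearest-neighbour search in the 20-dimensional feature space. The optional distance cut-off `max_dist` is an assumption not specified in the source.

```python
import numpy as np

def match_windows(model_vecs, target_vecs, max_dist=float("inf")):
    """For each recognition target window vector, find the registered
    window vector with the smallest Euclidean distance."""
    pairs = []
    for j, t in enumerate(target_vecs):
        dists = [np.linalg.norm(m - t) for m in model_vecs]
        i = int(np.argmin(dists))
        if dists[i] <= max_dist:
            # (model window index, target window index, distance)
            pairs.append((i, j, dists[i]))
    return pairs
```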
(8) A relative position is obtained for each pair of window pictures, and a vote is cast for the corresponding grid on a voting map (Operation 1408). FIG. 17 schematically shows this state. Herein, the voting map refers to a voting space obtained by partitioning a plane, prepared for each model picture, into a grid shape, and voting refers to processing of adding an evaluation value to a grid on the voting map in accordance with the matching result. The grid position to be voted for is determined by the relative position of the two window pictures in a pair. For example, if both window pictures are at the same position, the relative position becomes 0, and a vote is cast at the center of the voting map. If a face picture of a registered model and a face picture of a recognition target are of the same person, many window pictures, such as those of an eye and a mouth, exactly correspond to each other; therefore, the relative positions of the corresponding window pictures become almost constant, and votes are concentrated on the same grid position on the voting map. On the other hand, if a face picture of a registered model is different from a face picture of a recognition target, the number of window pictures that do not correspond to each other increases, and the relative positions of these window pictures vary. Therefore, votes are dispersed over a wide range on the voting map.
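The voting of Operation 1408 can be sketched as follows. The map size and the cell size used to quantize relative positions are assumed values, not taken from the source.

```python
import numpy as np

def vote(pairs, model_pts, target_pts, map_size=21, cell=4):
    """Accumulate one vote per matched pair on a grid of relative
    positions.  The grid centre corresponds to displacement (0, 0)."""
    grid = np.zeros((map_size, map_size), dtype=int)
    c = map_size // 2
    for i, j, _ in pairs:
        dy = target_pts[j][0] - model_pts[i][0]
        dx = target_pts[j][1] - model_pts[i][1]
        gy, gx = c + dy // cell, c + dx // cell
        if 0 <= gy < map_size and 0 <= gx < map_size:
            grid[gy, gx] += 1
    return grid
```

If the model and the recognition target are the same person, the displacements agree and the votes pile up in one cell; otherwise they scatter across the map.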
(9) A grid having the largest number of votes (hereinafter, referred to as a “peak”) is found, the similarity between a face picture of a registered model and a face picture of a recognition target is calculated based on the number of votes obtained, and picture recognition and matching are conducted based on the calculation results (Operation 1409). Furthermore, it can be detected from the position of the peak where a registered object is positioned in the recognition target picture.
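The peak search of Operation 1409 can be sketched as follows. Defining the similarity as the fraction of matched pairs that voted for the peak cell is one plausible reading of "based on the number of votes obtained", not the source's exact formula.

```python
import numpy as np

def find_peak(grid, n_pairs):
    """Locate the grid cell with the largest number of votes and score
    the match as the fraction of pairs that landed on that cell."""
    idx = np.unravel_index(int(np.argmax(grid)), grid.shape)
    similarity = grid[idx] / n_pairs if n_pairs else 0.0
    return idx, similarity
```

The peak position itself indicates where the registered object is positioned in the recognition target picture.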
According to the above-mentioned picture matching method, it can be recognized whether or not an object in an input picture is identical with an object of a registered model previously prepared.
The picture matching processing technique using an improved Eigen-Window method has many excellent aspects, and the widespread use of a picture matching processing system adopting this technique is expected. However, in spreading such a picture matching processing system using an improved Eigen-Window method, the following challenges should be addressed.
The first challenge is to ensure robustness against variations in the environment for capturing a face picture, such as the light environment and the capturing direction of a person's face picture. More specifically, it is required that a high picture matching precision be maintained even when the environment for capturing the picture used for model picture registration is different from the environment for capturing the picture of a recognition target person. A picture matching processing system is assumed to be used in various places, and the environment for capturing a picture cannot be expected to remain constant. For example, regarding the light environment, natural light (sunlight) varies depending upon the time (morning, noon, and evening) and upon the weather (fine, cloudy, and rainy). Furthermore, even in a room with little influence of outer light, the intensity and direction of artificial light may vary. Furthermore, regarding a capturing environment such as the capturing direction and capturing position of a subject, a person whose picture is to be captured does not always face the camera, and the distance between the subject and the camera is not necessarily constant. It is desirable that a person to be a subject is instructed to face the camera at a predetermined position; however, the applications in which such an instruction is practicable are limited, and the instruction impairs convenience on the user side.
As one conventional technique of ensuring robustness against variations in the environment for capturing a face picture, the following picture matching processing is known: in the registration phase, a picture is captured and registered for each capturing environment that can be assumed for each subject; in the recognition phase, a captured picture of a recognition target person is matched with the model pictures prepared for each variation of the capturing environment for each model. However, according to this method, the number of picture matching processing steps becomes large, which leads to an increase in the processing time and an increase in the volume of model data to be registered.
Furthermore, as another conventional technique of ensuring robustness against variations in the environment for capturing a face picture, picture matching processing is known in which an environment such as light conditions, the capturing direction, and the capturing position is changed while capturing a face picture of a recognition target person in the recognition phase, whereby a number of pictures covering variations in various capturing environments are captured for use in picture matching. For example, the recognition target person is instructed to slowly turn his/her face by 180°, so that the face picture is captured from various directions. However, according to this method, the number of steps of face picture capturing processing in the recognition phase is increased, which results in an increase in the processing time of the recognition phase. Furthermore, it is required to ask the recognition target person for cooperation regarding the position and direction of the camera, which causes a number of problems in terms of user friendliness.
The second challenge is to reduce the picture matching processing time while maintaining a picture matching precision at a predetermined level or more. According to the picture matching processing using the improved Eigen-Window method, as described above, the correspondence between window pictures selected from a model picture and window pictures selected from a recognition target picture is obtained, and picture matching is conducted by evaluating the matching degree. As the number of selected window picture regions increases, the number of processing steps, such as projection onto the projection space and evaluation of the matching degree of the projection results, increases, which might lead to an increase in the processing time. On the other hand, if the number of window picture regions to be selected is simply decreased, the number of processing steps decreases accordingly, resulting in a decrease in the processing time; however, the picture matching precision may be degraded. Thus, in the processing of simply decreasing the number of window regions, reduction of the processing time and maintenance of the picture matching precision have a trade-off relationship. Therefore, in the prior art, either the challenge of decreasing the processing time or the challenge of maintaining the picture matching precision is addressed while the other is sacrificed.
The third challenge is to reduce the volume of model data to be registered as a model. If the number of persons dealt with by the picture matching processing system is increased, and variations in the capturing environment are increased, the volume of data required to be registered and maintained as model data is also increased. If the volume of model data is narrowed by simply ignoring the variations in the capturing environment, the volume of model data to be registered can be reduced; however, robustness against the variations in the capturing environment cannot be ensured, and the system becomes vulnerable to such variations, which might lead to a decrease in picture matching processing precision.