1. Field of the Invention
The present invention relates to a subject tracking apparatus and a control method therefor, an image capturing apparatus, and a display apparatus.
2. Description of the Related Art
Image processing techniques in which a particular subject is detected from images supplied sequentially in a time series manner and the detected subject is then tracked are very useful, and are utilized, for example, for specifying a human face region in dynamic images. These image processing techniques can be used in a number of different fields, such as teleconferences, man-machine interfaces, security, monitoring systems for tracking human faces, and image compression.
Japanese Patent Laid-Open No. 2005-318554 discloses an image capturing apparatus for detecting the position of a face in an image and focusing on the detected face while capturing an image with an optimum exposure for the face. Japanese Patent Laid-Open No. 2001-60269 discloses an object tracking method and an object tracking apparatus in which a particular subject is tracked with the use of a template matching approach. Template matching refers to a technique of registering, as a reference image, a partial image obtained by clipping an image region as a target to be tracked, estimating a region in the image with the highest degree of correlation in terms of shape with the reference image, and tracking a particular subject.
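The template matching described above can be illustrated with a minimal sketch. It is not taken from either cited publication. It assumes a grayscale image stored as a list of rows, and it uses the sum of absolute differences (SAD) as a stand-in for the degree of correlation, so the lowest SAD corresponds to the highest correlation. The names `clip`, `sad`, and `match_template` are hypothetical.

```python
def clip(image, top, left, height, width):
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def sad(region, reference):
    """Sum of absolute differences: 0 means identical, larger means less similar.

    Assumption: SAD is used here in place of the correlation measure,
    which the description does not specify.
    """
    return sum(
        abs(a - b)
        for region_row, reference_row in zip(region, reference)
        for a, b in zip(region_row, reference_row)
    )


def match_template(image, reference):
    """Scan every candidate region and return ((top, left), score) of the best one."""
    ref_h, ref_w = len(reference), len(reference[0])
    img_h, img_w = len(image), len(image[0])
    best_pos, best_score = None, None
    for top in range(img_h - ref_h + 1):
        for left in range(img_w - ref_w + 1):
            score = sad(clip(image, top, left, ref_h, ref_w), reference)
            # Strict comparison keeps the first region found in case of a tie.
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


# Toy 4x5 frame; the bright 2x2 patch stands in for the subject.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
reference = [[9, 8], [7, 9]]
position, score = match_template(frame, reference)
print(position, score)
```

In this toy frame the reference image appears unchanged at row 1, column 2, so that region is returned with a score of 0. A practical implementation would more likely use a normalized correlation measure, which is less sensitive to changes in brightness.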
FIG. 10 is a flowchart of an example of subject tracking with template matching. FIG. 11 illustrates this tracking for a case in which the subject is a person's face.
In FIG. 11, reference numeral 1101 denotes an input image in a frame t=0, reference numeral 1102 denotes a subject detection result for the input image in the frame t=0, and reference numeral 1103 denotes a reference image registered for the input image in the frame t=0. Furthermore, reference numeral 1104 denotes an input image in a frame t=1, reference numeral 1105 denotes a matching result for the input image in the frame t=1, and reference numeral 1106 denotes a reference image updated for the input image in the frame t=1. Furthermore, reference numeral 1107 denotes an input image in a frame t=2, reference numeral 1108 denotes a matching result for the input image in the frame t=2, and reference numeral 1109 denotes a reference image updated for the input image in the frame t=2.
As shown in FIGS. 10 and 11, the input image 1101 in the frame t=0 is first loaded into the image capturing apparatus (S1001). Next, subject detection processing is applied to the input image 1101 to extract a subject region which meets conditions for a shape as a human face, and the subject detection result 1102 is acquired (S1002).
Then, the image capturing apparatus registers the initial reference image 1103 from the subject detection result 1102 (S1003). Then, the image capturing apparatus loads the input image 1104 in the frame t=1 (S1004). Then, the image capturing apparatus carries out matching processing for the input image 1104 with respect to the reference image 1103 registered for the input image 1101 in the frame t=0. In this processing, a region is clipped from the input image 1104, and a correlation value in terms of shape is obtained between the clipped region and the reference image 1103 (S1005).
If the matching processing has not been completed for the matching area over the entire region of the input image (S1006: NO), the image capturing apparatus clips another region of the input image 1104 to carry out the matching processing continuously (S1005). If the matching processing has been completed (S1006: YES), the matching result 1105 is acquired in which a region with the highest degree of correlation is taken as the subject region in the frame t=1 (S1007).
Then, the image capturing apparatus updates the reference image 1106 on the basis of the subject region estimated in the matching result 1105 (S1008). Then, the image capturing apparatus loads the input image 1107 in the frame t=2 (S1004). Then, the image capturing apparatus carries out matching processing for the input image 1107 with respect to the reference image 1106 updated for the input image 1104 in the frame t=1 (S1005).
If the matching processing has not been completed for a predetermined matching area (S1006: NO), the image capturing apparatus carries out the matching processing continuously (S1005). If the matching processing has been completed (S1006: YES), the image capturing apparatus acquires the matching result 1108 in which a region with the highest degree of correlation is taken as the subject region in the frame t=2 (S1007).
Then, the image capturing apparatus updates the reference image 1109 on the basis of the subject region estimated in the matching result 1108 (S1008). As described above, the target subject is tracked by correlating continuously input images with the reference image obtained from the matching result in the previous frame.
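The loop of steps S1001 to S1008 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited apparatus: frames are grayscale lists of rows, the subject detection of S1002 is replaced by a box supplied by the caller, and the sum of absolute differences is used in place of the unspecified correlation measure. The names `track`, `best_match`, `clip`, and `sad` are hypothetical.

```python
def clip(image, top, left, height, width):
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def sad(region, reference):
    """Sum of absolute differences (assumed stand-in for the correlation value)."""
    return sum(
        abs(a - b)
        for region_row, reference_row in zip(region, reference)
        for a, b in zip(region_row, reference_row)
    )


def best_match(image, reference):
    """S1005 to S1007: test every region and return (top, left) of the best one."""
    ref_h, ref_w = len(reference), len(reference[0])
    candidates = [
        (top, left)
        for top in range(len(image) - ref_h + 1)
        for left in range(len(image[0]) - ref_w + 1)
    ]
    # min() returns the first candidate with the lowest score in case of a tie.
    return min(candidates, key=lambda p: sad(clip(image, p[0], p[1], ref_h, ref_w), reference))


def track(frames, initial_box):
    """Track a subject through frames, updating the reference image every frame.

    initial_box = (top, left, height, width) stands in for the subject
    detection result of S1002, which is not implemented here.
    """
    top, left, height, width = initial_box
    reference = clip(frames[0], top, left, height, width)    # S1003: register
    positions = [(top, left)]
    for frame in frames[1:]:                                 # S1004: load next frame
        top, left = best_match(frame, reference)             # S1005 to S1007: match
        reference = clip(frame, top, left, height, width)    # S1008: update reference
        positions.append((top, left))
    return positions


# Three toy frames: a 2x2 subject moves one column to the right per frame
# and changes slightly in appearance.
frames = [
    [[0, 0, 0, 0, 0, 0], [0, 9, 8, 0, 0, 0], [0, 7, 9, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0, 0], [0, 0, 9, 7, 0, 0], [0, 0, 7, 9, 0, 0], [0, 0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0, 0], [0, 0, 0, 8, 7, 0], [0, 0, 0, 6, 9, 0], [0, 0, 0, 0, 0, 0]],
]
positions = track(frames, initial_box=(1, 1, 2, 2))
print(positions)
```

Because the reference image is replaced by the matched region in every frame, whatever that region contains is carried into the next matching step, whether or not it is the target subject.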
However, wrong subject tracking may occur in the conventional tracking method, e.g., in a case in which something other than the subject as a target to be tracked, such as the background, is contained in the reference image for use in the matching processing, or in a case in which the orientation of the subject as a target to be tracked changes.
For example, in a case in which a subject which differs from the subject as a target to be tracked is contained in the reference image, a subject region affected by the differing subject will be obtained by matching processing. Furthermore, in a case in which the orientation of the subject as a target to be tracked changes, the appearance of the subject changes, so that a shifted subject region may be obtained, and the shifted region is in turn more likely to be affected by a differing subject. As a result, a subject region shifted from the subject as a target to be tracked will be extracted by subject tracking, and wrong subject tracking will thus occur. Then, since the shift of the subject region is not corrected, the subject tracking will be continued in the shifted state.