In recent years, face-detection functions have been incorporated into electronic cameras to locate the subject for auto focus (AF), auto exposure (AE), and backlight correction. Because face detection lets the camera focus on the subject automatically, it effectively assists the user in capturing images.
As an image processing device and technique for face detection, a method has been proposed that learns patterns of faces and of non-face objects (hereinafter referred to as non-faces) and distinguishes between a face and a non-face using a distinguishing device that holds the learned parameters (see Patent Reference 1).
FIG. 1 shows the image processing device disclosed in Patent Reference 1, and FIG. 2 shows an example of clipping a partial image. A partial image clipping unit 900 clips a partial image 1000 from an input image 1001. As shown in FIG. 2, partial images 1000 are clipped by scanning windows of several sizes across the image, starting from the top left and proceeding toward the bottom right, shifting the window right or down by an appropriate number of pixels (for example, one pixel) at each step. Here, “clipping” means reading the image data of the corresponding portion.
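The raster scan described above can be sketched as a generator. This is a minimal illustration, assuming a grayscale image stored as a list of rows, square windows, and one step size shared by all window sizes; the name `clip_partial_images` is hypothetical and not taken from Patent Reference 1.

```python
from typing import Iterator, List, Sequence, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


def clip_partial_images(
    image: Image,
    window_sizes: Sequence[int],
    step: int = 1,
) -> Iterator[Tuple[int, int, int, Image]]:
    """Yield (x, y, size, patch) for every square window position.

    Hypothetical sketch: for each window size, windows are scanned from the
    top left toward the bottom right, shifting right by `step` pixels and,
    at the end of each row, down by `step` pixels.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for size in window_sizes:
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                # "Clipping" only reads the pixel data of this portion.
                patch = [row[x:x + size] for row in image[y:y + size]]
                yield x, y, size, patch
```

For a 4 x 4 image with window sizes 2 and 3 and a one-pixel step, this yields 9 + 4 = 13 partial images.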
A feature amount evaluation unit 1 (901) includes a combination of distinguishing devices. Each distinguishing device calculates a feature amount at a specific position from a rectangular feature described below (hereinafter referred to as an adjacent difference filter), using a parameter learned by the boosting method. When the weighted linear sum of the outputs of the distinguishing devices is below a threshold obtained through the learning, the feature amount evaluation unit 1 (901) classifies the partial image as a non-face and terminates processing of that partial image. When the weighted linear sum is equal to or above the threshold, the feature amount evaluation unit 1 (901) classifies the partial image as a face, and a feature amount evaluation unit 2 (902) performs subsequent processing using learned parameters different from those used by the feature amount evaluation unit 1 (901). In this way, evaluation values are calculated by a plurality of feature amount evaluation units, and faces are distinguished from non-faces based on those evaluation values.
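The chain of feature amount evaluation units can be sketched as a boosted cascade. This is a minimal illustration, assuming each distinguishing device is a decision stump on a single feature; the class and function names (`WeakClassifier`, `Stage`, `is_face`) and the stump form are hypothetical choices, not details confirmed by Patent Reference 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Patch = List[List[int]]


@dataclass
class WeakClassifier:
    """One distinguishing device (hypothetical decision-stump form)."""
    feature: Callable[[Patch], float]  # e.g. an adjacent difference filter
    threshold: float                   # learned decision threshold
    polarity: int                      # +1 or -1: which side of the threshold votes "face"
    weight: float                      # weight learned by boosting

    def output(self, patch: Patch) -> int:
        # 1 votes "face", 0 votes "non-face".
        return 1 if self.polarity * (self.feature(patch) - self.threshold) >= 0 else 0


@dataclass
class Stage:
    """One feature amount evaluation unit."""
    classifiers: List[WeakClassifier]
    stage_threshold: float             # learned threshold on the weighted linear sum

    def passes(self, patch: Patch) -> bool:
        score = sum(c.weight * c.output(patch) for c in self.classifiers)
        return score >= self.stage_threshold


def is_face(patch: Patch, stages: Sequence[Stage]) -> bool:
    """Run the stages in order; reject as soon as one weighted sum falls below its threshold."""
    for stage in stages:
        if not stage.passes(patch):
            return False  # non-face: later stages are never evaluated
    return True
```

Because a rejecting stage returns immediately, most non-face windows are discarded after only a few inexpensive stages, and only face candidates reach the later units.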
FIGS. 3(a) to 3(d) illustrate adjacent difference filters, and FIG. 3(e) shows an example of applying the adjacent difference filter of FIG. 3(b) to an image. An adjacent difference filter consists of a white rectangle and a black rectangle adjacent to each other, and outputs the difference between the average pixel value in the white rectangle and the average pixel value in the black rectangle. A large output indicates a large difference in pixel values between adjacent areas, such as around the eyes or the mouth, so the filter outputs a high feature amount at such facial parts. For example, in FIG. 3(e) the feature amount is calculated from the difference between the sum of pixel values in the black rectangle at the forehead and the sum of pixel values in the white rectangle at the eyebrows (when the two rectangles have equal area, this difference of sums is proportional to the difference of averages). Because such a feature amount reflects the difference between adjacent areas, it responds strongly to local features within the image (for example, line components) and yields characteristic values at facial parts such as the eyes, eyebrows, and mouth. The adjacent difference filter is generally referred to as a Haar-like feature.
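Haar-like features are commonly computed with an integral image, which reduces any rectangle sum to four lookups regardless of its size. The sketch below assumes a vertical two-rectangle layout with the black rectangle above the white one, matching the forehead and eyebrow description of FIG. 3(e); the function names are hypothetical, and the other filter shapes of FIG. 3 are not reproduced.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


def integral_image(image: Image) -> List[List[int]]:
    """Return ii with ii[y][x] = sum of image[0:y][0:x] (zero-padded first row/column)."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def adjacent_difference(ii: List[List[int]], x: int, y: int, w: int, h: int) -> float:
    """Hypothetical two-rectangle filter: black w x h rectangle at (x, y),
    white w x h rectangle directly below it. Returns mean(white) - mean(black)."""
    black = rect_sum(ii, x, y, w, h)      # e.g. forehead
    white = rect_sum(ii, x, y + h, w, h)  # e.g. eyebrows
    return (white - black) / (w * h)
```

On a face, a bright forehead above dark eyebrows gives a strongly negative output, while a flat region gives a value near zero.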
However, such face detection cannot find a face in an area where the face (eyes, nose, or mouth) is not visible, nor can it track an object other than a face, such as a pet. There is therefore a method of automatically focusing on an object other than a face by tracking the object based on information about the object registered in advance by the user.
Conventional object tracking methods include: face tracking, which re-detects the face only in the neighborhood of the position where the previous face detection result was obtained; template matching, which searches the neighborhood of the object's position in the previous frame using a correlation operation; active search, which aims to speed up the search; and methods based on a particle filter or condensation, which search while predicting motion from statistical information based on probability distributions.
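The template-matching approach can be sketched as a neighborhood search scored by normalized cross-correlation, one common form of correlation operation. This assumes grayscale frames stored as lists of rows and a square search radius around the previous position; `ncc` and `track_template` are hypothetical names, and the active-search and particle-filter methods are not shown.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale frame as rows of pixel values (assumption)


def ncc(image: Image, template: Image, x: int, y: int) -> float:
    """Normalized cross-correlation between the template and the patch at (x, y)."""
    th, tw = len(template), len(template[0])
    patch = [v for row in image[y:y + th] for v in row[x:x + tw]]
    tmpl = [v for row in template for v in row]
    n = len(tmpl)
    mp, mt = sum(patch) / n, sum(tmpl) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) * sum((t - mt) ** 2 for t in tmpl))
    return num / den if den > 0 else 0.0  # flat patches get a neutral score


def track_template(
    image: Image, template: Image, prev_xy: Tuple[int, int], radius: int
) -> Optional[Tuple[int, int, float]]:
    """Search only within `radius` pixels of the previous position (hypothetical helper).

    Returns (x, y, score) of the best match, or None if no window fits.
    """
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    px, py = prev_xy
    best = None
    for y in range(max(0, py - radius), min(h - th, py + radius) + 1):
        for x in range(max(0, px - radius), min(w - tw, px + radius) + 1):
            score = ncc(image, template, x, y)
            if best is None or score > best[2]:
                best = (x, y, score)
    return best
```

Restricting the search to the neighborhood of the previous position keeps the cost proportional to the search area rather than to the whole frame.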
In these methods, an initial feature amount of the object to be tracked (a histogram of color or luminance, the template image itself, shape or contour information, and so on) is registered in advance by some technique. The object is then tracked by searching the picture for the position whose image has features similar to those indicated by the registered feature amount. In other words, these methods prepare the initial feature amount beforehand and match it against the feature amount extracted at each position in the image.
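Registering an initial feature amount and matching it at each position can be sketched with a luminance histogram. This is a minimal illustration, assuming 8-bit luminance values, a fixed region size, and histogram intersection as the similarity measure (one common choice among several); the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 8-bit luminance values as rows (assumption)


def luminance_histogram(image: Image, x: int, y: int, w: int, h: int, bins: int = 16) -> List[float]:
    """Normalized luminance histogram of the w x h region whose top-left pixel is (x, y)."""
    hist = [0.0] * bins
    for row in image[y:y + h]:
        for v in row[x:x + w]:
            hist[v * bins // 256] += 1
    total = float(w * h)
    return [c / total for c in hist]


def histogram_intersection(a: List[float], b: List[float]) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(p, q) for p, q in zip(a, b))


def find_best_match(
    image: Image, registered: List[float], w: int, h: int, step: int = 1, bins: int = 16
) -> Tuple[int, int, float]:
    """Scan the picture for the region most similar to the registered feature (hypothetical helper)."""
    best = (0, 0, -1.0)
    for y in range(0, len(image) - h + 1, step):
        for x in range(0, len(image[0]) - w + 1, step):
            hist = luminance_histogram(image, x, y, w, h, bins)
            score = histogram_intersection(registered, hist)
            if score > best[2]:
                best = (x, y, score)
    return best
```

Because the registered histogram is fixed, the match score drops as soon as the object's appearance drifts away from its registered state.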
However, in ordinary video capture the subject's face is often not visible for long, and the subject's appearance in the image often changes significantly. Conventional methods therefore tend to lose track of the object when its appearance changes significantly.
To solve this problem, Patent Reference 2, for example, uses a method of sequentially updating the template. With this method, even when the tracked object changes appearance, the template is updated to follow the change, so the object can still be tracked.
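A sequentially updated template can be sketched by blending the best-matching patch into the template after every frame. This is a generic illustration, not the specific method of Patent Reference 2: the sum-of-absolute-differences search, the blending rule, and the parameters `radius` and `alpha` are all assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale frame as rows of pixel values (assumption)


def sad(image: Image, template: Image, x: int, y: int) -> float:
    """Sum of absolute differences between the template and the patch at (x, y)."""
    return sum(
        abs(image[y + j][x + i] - template[j][i])
        for j in range(len(template))
        for i in range(len(template[0]))
    )


def search(image: Image, template: Image, prev_xy: Tuple[int, int], radius: int) -> Tuple[int, int]:
    """Neighborhood search around the previous position; lowest SAD wins."""
    th, tw = len(template), len(template[0])
    px, py = prev_xy
    candidates = [
        (x, y)
        for y in range(max(0, py - radius), min(len(image) - th, py + radius) + 1)
        for x in range(max(0, px - radius), min(len(image[0]) - tw, px + radius) + 1)
    ]
    return min(candidates, key=lambda xy: sad(image, template, xy[0], xy[1]))


def update_template(template: Image, image: Image, x: int, y: int, alpha: float) -> Image:
    """Assumed update rule: T <- (1 - alpha) * T + alpha * P, with P the matched patch."""
    return [
        [(1 - alpha) * template[j][i] + alpha * image[y + j][x + i] for i in range(len(template[0]))]
        for j in range(len(template))
    ]


def track_sequence(
    frames: Sequence[Image], template: Image, start_xy: Tuple[int, int],
    radius: int = 2, alpha: float = 0.5,
) -> Tuple[List[Tuple[int, int]], Image]:
    """Track through the frames, updating the template after each match (hypothetical helper)."""
    positions, xy = [], start_xy
    for frame in frames:
        xy = search(frame, template, xy, radius)
        template = update_template(template, frame, xy[0], xy[1], alpha)
        positions.append(xy)
    return positions, template
```

A larger `alpha` follows appearance changes faster, but it also lets background pixels leak into the template more quickly when a match is slightly off, which can make the tracker drift.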
Patent Reference 1: US Patent Application Publication No. 2002/0102024.
Patent Reference 2: Japanese Unexamined Patent Application Publication No. 2002-157599.