1. Field of the Invention
The present invention relates to a main face choosing device, a method for controlling the choosing device, and an image capturing apparatus.
2. Description of the Related Art
There have heretofore been produced image capturing apparatuses (e.g., digital still cameras and digital video cameras) with an auto-focus (AF) function with which a subject is automatically brought into focus and an auto-exposure (AE) function with which exposure is automatically controlled. In order to more precisely perform AF, AE, and so on (hereinafter collectively referred to as “AF and so on”) suited to the face of a human figure, some apparatuses have the function of detecting the face of a human figure in a captured image. Further, in cases where the faces of a plurality of human figures have been detected in a captured image, some of the apparatuses have the function of choosing the face of the human figure determined to be a main subject (hereinafter referred to as the “main face”) from among the faces of the human figures.
In the above image capturing apparatuses, the choice of the main face has been made based on the states of the faces of the human figures in the captured image. The wording “the states of the faces of the human figures” used herein refers to parameters representing, for example, the positions of the faces in the captured image and the sizes of the faces.
However, where only the states of the human figures at a particular time have been taken into account, main face changeover occurs frequently due to a slight change in their states. Because of this, techniques have been proposed in which when choosing main faces in captured images, extremely frequent main face changeover is suppressed while giving much consideration to the states of the faces of human figures in captured images (see Japanese Patent Laid-Open Nos. 2008-005438 and 2008-205650).
The specific logic of the above techniques will be described below. FIG. 4 shows an example of information on the face of a human figure in a captured image. When a plurality of faces have been detected, face information is sought for each face.
Such face information includes the distance from the coordinates of the center of the captured image (center_x, center_y) to the coordinates of the center of the detected face (Face 1) (x1, y1). The information also includes a face size (size 1) representing the length of each side of the face assumed to be of a square shape and a reliability value representing a probability that the detected face will be the face of a human figure. These items of face information are obtained from the captured image by using a known face detection technique.
For example, on the premise that the majority of a face is flesh-colored, the face size can be set such that the ratio of the flesh-colored area within a square of predetermined size equals a predetermined value. Further, on the premise that the pupils of the eyes are black, the reliability is determined based on whether there are two eyes, the distance between the two eyes, whether there is a nose between the two eyes, and so on. In this case, it is assumed that the reliability is represented on a scale from 1 to 10, where 1 indicates the highest probability that the face is that of a human figure.
Firstly, a first weight is calculated through the use of the reliability of the detected face and a reliability-weight characteristic graph as shown in FIG. 5. In FIG. 5, the x-axis indicates the input, that is, the reliability of the detected face, and the y-axis indicates the output, that is, the first weight. The first weight is set to 1.0 when the reliability is between 1 and 3 inclusive; the point indicating a reliability of 3 and a first weight of 1.0 and the point indicating a reliability of 5 and a first weight of 0 are connected to each other with a straight line; and the first weight is set to 0 when the reliability is 5 or higher.
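The reliability-weight characteristic described for FIG. 5 can be sketched as a piecewise-linear function. The following is only an illustrative sketch in Python; the function name first_weight is hypothetical and does not appear in the cited techniques.

```python
def first_weight(reliability: float) -> float:
    """Map a face-detection reliability (1 = most face-like, 10 = least)
    to the first weight, following the breakpoints stated for FIG. 5."""
    if reliability <= 3:
        return 1.0  # reliability 1 to 3 inclusive: full weight
    if reliability >= 5:
        return 0.0  # reliability 5 or higher: no weight
    # Straight line joining (3, 1.0) and (5, 0.0).
    return (5.0 - reliability) / 2.0
```

Under this sketch, a reliability of 4 lies halfway along the sloped segment and therefore yields a first weight of 0.5.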
Next, a second weight is calculated through the use of the size of the detected face and a face size-weight characteristic graph in FIG. 6A. In FIG. 6A, the x-axis indicates the input, that is, the size of the detected face, and the y-axis indicates the output, that is, the second weight. For example, the point indicating a face size of 0 pixels and the second weight of 0 and the point indicating a face size of 20 pixels and the second weight of 0.2 are connected to each other with a straight line. And further, the point indicating a face size of 20 pixels and the second weight of 0.2 and the point indicating a face size of 30 pixels and the second weight of 1 are connected to each other with a straight line, and the second weight is set to 1.0 when a face size is 30 pixels or more.
In FIG. 6A, when the size W1 of a face F1 is not more than 20 pixels, the second weight is a maximum of 0.2. However, when the size W2 of a face F2 is in the range of 20 to 30 pixels, the second weight varies from 0.2 to 1.0. That is, when the face size exceeds 20 pixels representing a face size worthy of being determined as a main face, the value of the second weight changes abruptly; therefore, the face F2 worthy of being the main face is given a higher weight value.
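The face size-weight characteristic described for FIG. 6A can likewise be sketched as a piecewise-linear function with a gentle slope up to 20 pixels and a steep slope from 20 to 30 pixels. This is a minimal sketch using only the breakpoints given above; the name second_weight is hypothetical.

```python
def second_weight(face_size_px: float) -> float:
    """Map a face size in pixels to the second weight,
    following the breakpoints stated for FIG. 6A."""
    if face_size_px <= 0:
        return 0.0
    if face_size_px <= 20:
        # Straight line joining (0, 0) and (20, 0.2).
        return 0.2 * face_size_px / 20.0
    if face_size_px < 30:
        # Steeper straight line joining (20, 0.2) and (30, 1.0).
        return 0.2 + 0.8 * (face_size_px - 20.0) / 10.0
    return 1.0  # 30 pixels or more
```

The slope is 0.01 per pixel below 20 pixels and 0.08 per pixel between 20 and 30 pixels, which is the abrupt change that favors the larger face F2.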
Moreover, as shown in FIG. 6B, distance information dist representing the distances between the center O of a captured image and the centers of the detected faces F1 and F2 is extracted from their coordinate values. Then third weights are calculated by using the extracted information dist and a distance-weight characteristic graph in FIG. 6B. For example, it is assumed that the size of the captured image in which face detection is to be done is 320×240 pixels, and the third weight is set to 1.0 when the distance from the center O is 10 pixels or less. Further, the point indicating a distance of 10 pixels and a third weight of 1.0 and the point indicating a distance of 80 pixels and a third weight of 0 are connected to each other with a straight line, and the third weight is set to 0 when the distance is 80 pixels or more.
In that case, when the distance from the center O to the center of the face F1 is about 10 pixels or less, the weight is 1.0. However, when the distance from the center O to the center of the face F2 exceeds 10 pixels, the weight is below 1.0; when that distance is 80 pixels or more, the weight is 0.
That is, the face F1, which is close to the image's center and worthy of being determined as a main face, is given a large weight value.
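The distance-weight characteristic described for FIG. 6B can be sketched in the same way. The sketch below assumes the 320×240 pixel image mentioned above, takes the image center O as the midpoint of that image, and measures dist as a Euclidean distance; the name third_weight and its parameters are hypothetical.

```python
import math

def third_weight(face_center, image_size=(320, 240)):
    """Map the distance between the image center O and a face center
    to the third weight, following the breakpoints stated for FIG. 6B."""
    center_x = image_size[0] / 2.0
    center_y = image_size[1] / 2.0
    # Euclidean distance from the image center (assumption of this sketch).
    dist = math.hypot(face_center[0] - center_x, face_center[1] - center_y)
    if dist <= 10:
        return 1.0  # 10 pixels or less from the center
    if dist >= 80:
        return 0.0  # 80 pixels or more from the center
    # Straight line joining (10, 1.0) and (80, 0.0).
    return (80.0 - dist) / 70.0
```

For instance, a face whose center lies 45 pixels to the right of the center O falls halfway along the sloped segment and receives a third weight of 0.5.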
As a result of multiplying the first to third weights, the face with the largest weight value can be determined to be the most likely main face in the frame.
However, in cases where there is not a large difference in the composition of human figures between frames, it is expected that when the positions of the human figures and the reliability of the human figures' faces have changed slightly, a changeover to a face determined to be the most likely main face is performed for each frame. In such a case, since such main face changeovers happen frequently, images become unfavorably unsightly. Because of this, it is considered that once a main face has been chosen, there is a need not to readily effect a main face changeover.
Specifically, the coordinates of the face chosen as the main face at the time of the last face detection are read out, and then the distance between the coordinates read out and the coordinates of each newly detected face is determined by using, for example, the Pythagorean theorem. Thereafter, the face nearest the last chosen main face is assigned a fourth weight of 1.4, and the other faces are assigned a fourth weight of 1.0.
Then the final weight of each face is calculated by multiplying the fourth weight and the product of the first to third weights, following which the face with the greatest final weight is determined (chosen) to be a main face at that time. Therefore, even in the case where faces other than the main face are higher than the main face in the weighting product based on their reliability, size, and position, the possibility that they are chosen as a newly determined main face is small when they are away from the face chosen as the main face at the time of the last face detection. Thus limitations are placed on main face changeover and, hence, frequent main face changeovers can be suppressed.
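The selection step described above can be sketched as follows. This is an illustrative sketch only: the function choose_main_face, the dictionary keys "xy" and "base", and the sample coordinates are hypothetical, and "base" is assumed to already hold the product of the first to third weights for each face.

```python
import math

def choose_main_face(faces, last_main_xy, fourth_for_nearest=1.4):
    """Return (index of the chosen face, list of final weights).

    faces: list of dicts with "xy" (face center) and "base"
           (product of the first to third weights).
    last_main_xy: center of the main face chosen at the last detection.
    """
    if not faces:
        return None, []

    def distance_to_last(i):
        # Distance via the Pythagorean theorem.
        dx = faces[i]["xy"][0] - last_main_xy[0]
        dy = faces[i]["xy"][1] - last_main_xy[1]
        return math.hypot(dx, dy)

    nearest = min(range(len(faces)), key=distance_to_last)
    # Fourth weight: 1.4 for the face nearest the last main face, else 1.0.
    finals = [
        face["base"] * (fourth_for_nearest if i == nearest else 1.0)
        for i, face in enumerate(faces)
    ]
    chosen = max(range(len(faces)), key=lambda i: finals[i])
    return chosen, finals

# Face 1 has the larger base weight, but face 0 is nearest the last main face.
faces = [{"xy": (100, 100), "base": 0.5}, {"xy": (200, 120), "base": 0.6}]
chosen, finals = choose_main_face(faces, last_main_xy=(102, 98))
```

In this example the face nearest the previous main face keeps its status even though the other face has the larger product of the first to third weights, which is the changeover-suppressing behavior described above.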
However, even when the above method is used, the following problems may arise. FIGS. 7A and 7B illustrate main face determination operations performed when a face A is detected after the detection of a main face B. In these operations, a final weight value is determined by multiplying (or adding) three weight values found from the position and size of each face and the distance between the last main face and each face, and then the next main face is determined based on the product (or sum).
As shown in FIG. 7A, when the face A of a passerby has been included in the captured image at a position between the image capturing apparatus and the main face B as a main subject for example, the face A image becomes much larger than the face B image. Therefore, even if the fourth weight described above is taken into consideration, the face A will be assigned a larger final weight value than the face B, and then a main face changeover from the face B to the face A will be immediately effected at such a time. In that case, the weight value assigned to the face A is as follows:
60% × 100% × 1.0 (the fourth weight) / 100% = 60%
Likewise, the weight value assigned to the face B is as follows:
100% × 40% × 1.4 (the fourth weight) / 100% = 56%
Moreover, as shown in FIG. 7B, in the case where a main face B was recognized before the detection of a face A which is the main subject the user wanted, the final weight value assigned to the face A does not exceed that assigned to the face B at times due to small differences in their sizes and their positions in the frame. In this case, the main face B has sometimes remained chosen as the main subject; therefore, the face A the user wanted has not been chosen as a main face at times. Incidentally, the weight value assigned to the face A is as follows:
100% × 40% × 1.0 (the fourth weight) / 100% = 40%
Likewise, the weight value assigned to the face B is as follows:
90% × 40% × 1.4 (the fourth weight) / 100% = 50.4% ≈ 50%
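The arithmetic for the two cases of FIGS. 7A and 7B can be checked with a short sketch. The helper names final_weight_percent and winner are hypothetical, and the two percentage factors are passed in the order in which they appear in the expressions above, without assuming which one corresponds to position and which to size.

```python
def final_weight_percent(factor_1, factor_2, fourth_weight):
    """Final weight in percent: two percentage factors times the fourth weight."""
    return factor_1 * factor_2 * fourth_weight / 100.0

def winner(scores):
    """Name of the face with the greatest final weight."""
    return max(scores, key=scores.get)

# FIG. 7A: a passerby's face A appears in front of the main face B.
fig_7a = {
    "A": final_weight_percent(60, 100, 1.0),
    "B": final_weight_percent(100, 40, 1.4),
}

# FIG. 7B: the face A the user wanted is detected after the main face B.
fig_7b = {
    "A": final_weight_percent(100, 40, 1.0),
    "B": final_weight_percent(90, 40, 1.4),
}

print(winner(fig_7a), winner(fig_7b))
```

With these values the face A wins in the case of FIG. 7A (an unwanted changeover), while the face B remains the main face in the case of FIG. 7B (a wanted changeover that does not occur).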
As such, the user can be satisfied when the faces of human figures present at the centers of successively captured images are each chosen to be a main face. However, the user might not desire frequent main face changeovers to faces whose weight value rises abruptly, such as faces which momentarily pass across the field of view. Also, the user might not desire that a main face changeover fail to occur merely because the position of a main face that is not in the center of the captured images does not change between frames. Such being the case, it is thought that the related art can be still further improved.