1. Field of the Invention
The present invention relates to an image processing apparatus and an image processing method suitable for use in retrieving a predetermined object from a moving image.
2. Description of the Related Art
Conventionally, there is a technique that reproduces the portion of a moving image corresponding to an interval (period) in which a desired object is captured. In such a technique, the predetermined object is retrieved from the moving image, and a frame image containing the object is set as a representative image, so that the corresponding portion of the moving image can be reproduced.
For example, Japanese Patent Application Laid-Open No. 2001-167110 discusses a technique which detects a shot (scene) change and then detects a face from the headmost frame of each shot. The technique sets the frame in which the face is detected as the representative image. The technique also identifies attributes of the face, such as the orientation, size, and number of the faces, as well as the gender, race, and name of the person, and these attributes can be designated as conditions for selecting the representative image.
Further, Japanese Patent No. 3312105 discusses a technique which detects faces in a frame and then calculates an evaluation value using the size and the number of the detected faces and the distance from the center of the frame to each detected face. The frame with the greatest or the least evaluation value is then set as the representative image.
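The kind of evaluation value described above can be illustrated with a minimal sketch. The exact formula used in Japanese Patent No. 3312105 is not reproduced here; the weighting below (face size multiplied by a centrality term, summed over all faces so that the number of faces also contributes) is an assumption made purely for illustration, and the names evaluation_value and pick_representative are hypothetical.

```python
import math

def evaluation_value(faces, frame_w, frame_h):
    """Score one frame from its detected faces.

    faces: list of (center_x, center_y, size) tuples in pixels.
    The weighting is an illustrative assumption, not the cited formula:
    larger faces and faces closer to the frame center score higher, and
    summing over faces lets the face count contribute as well.
    """
    cx0, cy0 = frame_w / 2, frame_h / 2
    half_diag = math.hypot(cx0, cy0)  # largest possible distance from center
    score = 0.0
    for cx, cy, size in faces:
        # centrality is 1.0 at the frame center and 0.0 at a frame corner
        centrality = 1.0 - math.hypot(cx - cx0, cy - cy0) / half_diag
        score += size * centrality
    return score

def pick_representative(frames, frame_w, frame_h):
    """Return the index of the frame with the greatest evaluation value."""
    return max(range(len(frames)),
               key=lambda i: evaluation_value(frames[i], frame_w, frame_h))

# Three hypothetical frames: an off-center face, a centered face, no face.
frames = [[(100, 240, 80)], [(320, 240, 80)], []]
best_index = pick_representative(frames, 640, 480)
```

In this sketch the second frame is chosen because its face lies at the center of the frame. Nothing in such a score indicates whether the face belongs to an object that the user intended to capture.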
However, according to the above-described conventional techniques, no distinction is made as to whether the retrieved object is an object that the user intended to capture (hereinafter, such an object will be referred to as a key object as necessary). As a result, it is difficult to search for the interval (time period) in which the key object is captured.
More specifically, when a list of representative images composed of the frame images containing the object is provided to the user, the list includes images of both the key object and other objects. It is thus necessary for the user to first distinguish the key object from the other objects. Further, when a digest including only the intervals in which an object is present is generated, the digest tends to include both the intervals in which the key object is present and the intervals in which the other objects are present.
The above-described technique discussed in Japanese Patent Application Laid-Open No. 2001-167110 can acquire the representative image matching the face attributes (i.e., size, number, gender, race, and name). However, such attributes are not related to whether the user purposely or accidentally captured the scene. Therefore, the face of an object that is not the key object may be selected as the representative image.
Further, in the technique discussed in Japanese Patent No. 3312105, the representative image containing the key object can be acquired if the object is one person captured at the center of the frame, for example, when taking a close-up of the object. However, when a plurality of objects including the key object and the other objects are captured at the same time, the faces of both the key object and the other objects are contained in the representative image.
FIGS. 9A and 9B illustrate examples of motions of the objects within a frame display (FIG. 9A) and the actual motions of the objects (FIG. 9B).
Referring to FIGS. 9A and 9B, objects A and B are captured. Of these, the object A is purposely captured by the user, whereas the object B only happens to be in the frame.
In the example illustrated in FIG. 9A, the object A and the object B are at approximately the same distance from the video camera, and the faces of the object A and the object B are approximately the same size. Further, the object A is moving towards the left side in the drawing, and the object B is stationary or moving towards the right side in the drawing. Therefore, as time elapses, the object A is about to go out of the frame, so that the user pans the video camera to the left. As a result, the face of the object B passes through the center of a frame display 601 and moves towards the right side. Therefore, according to the technique discussed in Japanese Patent No. 3312105, the evaluation value of the face of the object B, which is not the key object, becomes high, so that a frame showing the object that is not the key object is extracted as the representative image.
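The situation of FIG. 9A can be approximated numerically with a small sketch. All values below (frame size, face size, starting positions, pan speed) are invented for illustration, and the score (face size multiplied by closeness to the frame center) is an assumed stand-in for a center-weighted evaluation value, not the formula of Japanese Patent No. 3312105.

```python
import math

FRAME_W, FRAME_H = 640, 480
FACE_SIZE = 80  # assumed: both faces appear about the same size

def centrality(x, y):
    """1.0 at the frame center, falling to 0.0 at a frame corner."""
    cx, cy = FRAME_W / 2, FRAME_H / 2
    return 1.0 - math.hypot(x - cx, y - cy) / math.hypot(cx, cy)

def simulate(steps=9, pan_per_step=20):
    """Return (t, score_A, score_B) for each time step of a leftward pan."""
    rows = []
    y = FRAME_H / 2
    for t in range(steps):
        # Key object A moves left and the user pans to follow it, so A
        # stays near the left edge of the frame display (assumed x = 100).
        a_x = 100
        # Object B is stationary in the scene, so the leftward pan makes
        # it drift to the right in the frame display, through the center.
        b_x = 240 + pan_per_step * t
        rows.append((t,
                     FACE_SIZE * centrality(a_x, y),
                     FACE_SIZE * centrality(b_x, y)))
    return rows

rows = simulate()
best_a = max(score_a for _, score_a, _ in rows)
best_t, _, best_b = max(rows, key=lambda r: r[2])
```

Under these assumptions the face of the object B reaches its highest score at the step where it crosses the frame center, and that score exceeds every score obtained by the face of the key object A, which matches the problem described above.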