A conventional method extracts, from a moving image, the image frames that represent it (called “representative images” or “key frames”). It calculates the differences between neighboring image frames, determines division points on the basis of the degree of change (difference), and selects a predetermined frame (e.g., the first, last, or middle frame) of each division as a key frame. Such key frames are generally used to edit, manage, search, and categorize moving images.
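The conventional method can be sketched in Python as follows. This is a minimal illustration, not any specific product's implementation. It assumes that each frame is a flat list of pixel intensities, that the change between neighboring frames is measured as the mean absolute pixel difference, and that a division point is placed wherever that difference exceeds a fixed threshold. The function names, the `threshold` parameter, and the difference measure are all hypothetical choices made for illustration.

```python
# Minimal sketch (assumptions): a frame is a flat list of pixel intensities,
# change is the mean absolute pixel difference, and `threshold` is a
# hypothetical fixed cut-off for declaring a division point.

def frame_difference(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def division_points(frames, threshold):
    """Indices i whose change from frame i-1 exceeds `threshold`.

    Each returned index is the first frame of a new division.
    """
    return [
        i
        for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


def key_frames(frames, threshold, pick="first"):
    """Return one key-frame index per division: its first, last, or middle frame."""
    if not frames:
        return []
    bounds = [0] + division_points(frames, threshold) + [len(frames)]
    keys = []
    for start, end in zip(bounds, bounds[1:]):  # one division = frames[start:end]
        if pick == "first":
            keys.append(start)
        elif pick == "last":
            keys.append(end - 1)
        elif pick == "middle":
            keys.append((start + end - 1) // 2)
        else:
            raise ValueError(f"unknown pick: {pick!r}")
    return keys


if __name__ == "__main__":
    # Three flat segments with intensities 0, 100 and 200 (8 frames of 4 pixels).
    frames = [[0] * 4] * 3 + [[100] * 4] * 3 + [[200] * 4] * 2
    print(division_points(frames, threshold=50))
    print(key_frames(frames, threshold=50))
    print(key_frames(frames, threshold=50, pick="middle"))
```

For the eight example frames and `threshold=50`, the division points fall at indices 3 and 6. Selecting the first frame of each division therefore gives key frames `[0, 3, 6]`, and selecting the middle frame gives `[1, 4, 6]`.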
Ideally, key frames should be set at appropriately distributed positions throughout the entire moving image data. The aforementioned method works effectively on moving image data that has already been recorded. However, when it is applied to image data that is being sensed by an image sensing apparatus, key frames often concentrate in a specific portion. For example, when the user pans a camera slowly, the change in the image is small (i.e., the inter-frame differences are small). Hence, no appropriate division point can be found from the degree of change, and it is difficult to obtain an appropriate key frame from such a scene.
On the other hand, when the user pans a camera quickly, the inter-frame differences during panning depend on how monotonous the scene being sensed is. If the scene is monotonous, the inter-frame differences remain small, and no key frame can be obtained. If the scene is not monotonous, the inter-frame differences become large, and key frames concentrate locally.
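The slow-pan and quick-pan cases can be reproduced with a toy 1-D simulation. Purely for illustration, it assumes that each frame is a fixed-width window sliding across a single row of pixels at a constant pan speed. It uses a mean-absolute-difference measure with a hypothetical threshold of 4. A flat row stands in for a monotonous scene, and a brightness ramp stands in for a non-monotonous one. Real scenes and real difference measures will behave less cleanly.

```python
# Toy model (assumption): the scene is one row of 8-bit pixel values and each
# frame is a `width`-pixel window that moves `speed` pixels per frame.

def pan_frames(scene, width, speed, count):
    """Frames captured while panning across `scene` at a constant speed."""
    return [scene[t * speed : t * speed + width] for t in range(count)]


def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def inter_frame_diffs(frames):
    """Difference between each frame and the one before it."""
    return [mean_abs_diff(frames[i - 1], frames[i]) for i in range(1, len(frames))]


def count_divisions(frames, threshold):
    """Number of division points a fixed-threshold detector would place."""
    return sum(1 for d in inter_frame_diffs(frames) if d > threshold)


FLAT = [128] * 256       # monotonous scene: every pixel identical
RAMP = list(range(256))  # non-monotonous scene: brightness gradient
THRESHOLD = 4            # hypothetical detector threshold

if __name__ == "__main__":
    cases = {
        "slow pan, ramp": pan_frames(RAMP, 64, 1, 20),
        "quick pan, flat": pan_frames(FLAT, 64, 8, 20),
        "quick pan, ramp": pan_frames(RAMP, 64, 8, 20),
    }
    for name, frames in cases.items():
        print(name, count_divisions(frames, THRESHOLD))
```

In this model, consecutive ramp frames differ by exactly the pan speed, and flat frames do not differ at all. The slow pan and the monotonous quick pan therefore yield 0 division points. The non-monotonous quick pan yields 19, one between every pair of its 20 frames, so its key frames crowd into the pan.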
As described above, key frames are preferably set at appropriately distributed positions throughout the entire moving image data. To achieve this, the moving image data must be divided at appropriate positions at which the key frames are set.