The present invention relates to an image processing device, an image processing method and a recording medium for extracting a frame image corresponding to a best shot from a moving image and outputting the frame image.
In recent years, moving images have come to be shot frequently even in ordinary households. A shot moving image may include a best shot, that is, a scene that is hard to capture with a device for taking still images (such as a shot that properly shows a motion of a person in the moving image), for instance, the scene of a child blowing out candles on his/her birthday. On the other hand, a moving image may also include scenes with little movement of persons, low importance, poor composition, poor image quality, or other unfavorable properties.
It is therefore extremely troublesome to find a best shot in a moving image and extract a frame image corresponding to the best shot from the moving image.
Meanwhile, school photography services are in use in which a photographer is sent to a school to take still images of school events such as a sports day. In such a service, the shot still images are uploaded to a network so that parents can order photographic prints of the still images via the network. If, in addition to taking still images, moving images are shot with fixed cameras or the like and frame images corresponding to best shots are extracted from the moving images, photographic prints of the frame images can be ordered in the same way as prints of the still images.
In this case, there is a demand for extracting frame images corresponding to best shots from moving images while taking into account information on the still images, for example, by preferentially extracting from the moving images frame images that show scenes not shown in the still images.
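As a purely illustrative sketch of this kind of preferential extraction, the following Python example lowers the ranking of candidate frames whose shooting time is close to that of a still image. The text above does not specify any mechanism; the use of shooting-time proximity as a stand-in for "showing the same scene", the function names `nearest_gap` and `rank_frames`, the per-frame base scores, and the `window` parameter are all assumptions introduced only for this example.

```python
from bisect import bisect_left


def nearest_gap(t, sorted_times):
    """Return the absolute gap in seconds between t and the closest time in sorted_times."""
    if not sorted_times:
        return float("inf")
    i = bisect_left(sorted_times, t)
    # Only the entries just before and just after the insertion point can be closest.
    neighbors = sorted_times[max(i - 1, 0): i + 1]
    return min(abs(t - s) for s in neighbors)


def rank_frames(frame_scores, still_times, window=5.0):
    """Rank candidate frame times, preferring frames far in time from any still image.

    frame_scores: dict mapping frame time (s) to a base best-shot score (hypothetical).
    still_times:  shooting times (s) of the still images on the same time axis.
    window:       assumed span (s) within which a frame is treated as covered by a still.
    """
    stills = sorted(still_times)
    ranked = []
    for t, score in frame_scores.items():
        gap = nearest_gap(t, stills)
        # Weight rises linearly from 0 (same instant as a still) to 1 (at least window away).
        weight = min(gap / window, 1.0)
        ranked.append((score * weight, t))
    ranked.sort(reverse=True)
    return [t for _, t in ranked]
```

With such a weighting, a frame that scores slightly lower on its own but lies far from every still image can be ranked ahead of a higher-scoring frame shot at almost the same moment as a still image.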
Separately, when, for instance, only two brothers are shot as the subjects of a family moving image, it is only necessary to track the two brothers in the shot moving image. In a moving image of a sports day, however, several tens of persons appear one after another. In this case, simply tracking all the persons in the moving image lengthens the processing time, and tracking unrelated persons lowers the accuracy with which frame images are extracted. Accordingly, there is a demand for properly extracting a frame image corresponding to a best shot even from a moving image showing many persons.
Furthermore, when a moving image is shot by a fixed camera or the like, the shooting time tends to be long, which is disadvantageous in terms of processing time. Accordingly, there is a demand for extracting only the necessary frame images from a moving image as efficiently and quickly as possible.
JP 2008-294513 A, JP 2010-93405 A and JP 2012-44646 A are cited as prior art documents related to the present invention.
JP 2008-294513 A relates to a video playback device for playing back a highlight scene of moving image content. This document describes that time information on still image content captured while the moving image content was being shot is used in detecting a highlight scene of the moving image content.
JP 2010-93405 A relates to an information processor and the like for playing back a highlight scene of moving image content. This document describes that the importance of a highlight scene of moving image content is controlled based on the number of still image shooting devices that shot the portion of the highlight scene for which highlight scene detection is desired.
JP 2012-44646 A relates to an image processing device and the like for laying out a plurality of images. This document describes extracting frame images from a moving image in accordance with the relationship between the moving image and still images, and determining a layout of the still images and the frame images.