One example of a conventional video generation device of this type is described in JP-2005-157463-A. The configuration of the video generation device described in this publication is shown in FIG. 1. This conventional video generation device edits contents data 904 received through terrestrial digital broadcasting to generate summary data thereof. Contents data 904 is a collection of frames, each of which includes, in addition to video data and audio data, additional information indicating the persons, places and the like which appear in the frame. Upon capturing contents data 904, contents data capture means 900 outputs the additional information of each frame to appearance ratio calculation means 901, and outputs the contents of each frame to summary data creation means 903. Appearance ratio calculation means 901 recognizes the persons and places which appear in each frame based on the additional information, and calculates, for each appearing person and place, an appearance ratio at which that person or place appears in an arbitrary frame. Main character and main stage decision means 902 selects each appearing person and place whose appearance ratio is equal to or higher than a threshold as a potential hero and a potential center position, respectively. Summary data creation means 903 refers to the additional information of each frame sent thereto from contents data capture means 900, selects the frames in which both the potential hero and the potential center position selected by main character and main stage decision means 902 appear, and rearranges the selected frames in time series, thereby creating summary data which is a story-like summary of contents data 904. However, there is no clear description as to how many of the frames in which each potential hero and each potential center position appears should be selected to create the summary data.
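The processing flow of this conventional device may be illustrated by the following minimal sketch. It is not taken from JP-2005-157463-A; the names `Frame`, `appearance_ratios`, `select_candidates` and `create_summary` are hypothetical. The sketch assumes that the appearance ratio of a person or place is the fraction of all frames in which it appears, that a single threshold is applied to both persons and places, and that a frame qualifies when at least one potential hero and at least one potential center position appear in it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One frame of contents data, reduced to its additional information."""
    index: int                       # position of the frame in time series
    persons: frozenset = frozenset() # persons appearing in the frame
    places: frozenset = frozenset()  # places appearing in the frame


def appearance_ratios(frames, attr):
    """Assumed definition: fraction of all frames in which each name appears."""
    counts = Counter()
    for frame in frames:
        counts.update(getattr(frame, attr))
    total = len(frames)
    return {name: n / total for name, n in counts.items()} if total else {}


def select_candidates(ratios, threshold):
    """Keep names whose appearance ratio is equal to or higher than the threshold."""
    return {name for name, ratio in ratios.items() if ratio >= threshold}


def create_summary(frames, threshold=0.5):
    """Select frames showing a potential hero and a potential center position,
    then rearrange them in time series. The default threshold is illustrative."""
    heroes = select_candidates(appearance_ratios(frames, "persons"), threshold)
    stages = select_candidates(appearance_ratios(frames, "places"), threshold)
    selected = [f for f in frames if f.persons & heroes and f.places & stages]
    return sorted(selected, key=lambda f: f.index)
```

Under these assumptions, every qualifying frame is kept, so the length of the resulting summary data depends only on the threshold and cannot be controlled directly, which corresponds to the lack of a description of how many frames should be selected.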