In recent years, the amount of digital image information available to users in various fields has been growing steadily. With the development of the Internet society, computer equipment, communication environments and interfaces have become faster and more widespread, and large quantities of image data of various kinds have accumulated everywhere. This gives greater importance to image summarizing technology, which makes it possible to navigate this flood of information and to watch, in a short period of time, only the part that a user wants to watch or a highlighted part. Image summarizing technology of this kind can be classified into the following two types.
The first technology automatically analyzes an image to extract low-level feature quantities obtained from various media, such as color or texture, camera work, a person's face, captions, the nature or magnitude of sound, and TF-IDF of written data, and specifies important scenes from combinations of these features or from their changes over time. This technology is frequently applied, in particular, to images whose context has a relatively comprehensible structure, such as news images.
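As a rough illustration of one such low-level feature, the following sketch scores scenes by the TF-IDF weight of their caption text and selects the highest-scoring scenes as candidates for important scenes. It is a minimal example under stated assumptions and not the method of any particular prior-art system: the function names `tfidf_scene_scores` and `pick_highlights`, the whitespace tokenization, the unsmoothed logarithmic IDF, and the use of a plain sum as the scene score are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_scene_scores(scene_texts):
    """Return one score per scene: the sum of TF-IDF weights of its terms.

    Assumption: each scene is represented by its caption text, and
    terms are obtained by lowercasing and splitting on whitespace.
    """
    tokenized = [text.lower().split() for text in scene_texts]
    n_scenes = len(tokenized)

    # Document frequency: in how many scenes does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Term frequency is normalized by scene length; IDF is log(N / df).
        score = sum(
            (count / len(tokens)) * math.log(n_scenes / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    return scores


def pick_highlights(scene_texts, top_k=1):
    """Return the indices of the top_k highest-scoring scenes, in playback order."""
    scores = tfidf_scene_scores(scene_texts)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    captions = [
        "weather report today",
        "weather report today",
        "earthquake strikes coastal city",
    ]
    print(pick_highlights(captions, top_k=1))
```

A score of this kind reflects only how unusual the words of a scene are relative to the other scenes. It carries no information about what the scene actually depicts.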
The second technology designs an index associated with the contents so that the contents can be handled easily, relying to a large extent on manual input of the data necessary for summarizing the contents, and then produces and uses that index. Once input of the data is completed, the index, which expresses the flow of the context and each characteristic scene, can be utilized, making it possible to summarize the contents flexibly and in accordance with the purpose.
However, since the first technology is fundamentally based on low-level features, it is difficult to specify concrete semantic contents, such as what each scene expresses. The second technology, in turn, is limited in its range of application and requires some means of alleviating the cost and burden of manual labor.