With the advance of information technology, together with the strong development of storage devices and of network transmission and data compression technologies, the use of multimedia data such as photos, video clips, and music has increased rapidly. To handle the daily growing volume and diversity of multimedia data, techniques for searching the contents of multimedia data have long been an important topic, and searching techniques for video clips are especially important.
One of the most important factors in searching techniques for video clips is how to generate search indexes (annotations) and how to use the annotations of the video clips to enhance search efficacy and efficiency. Video annotation methods are roughly divided into two categories: one annotates video data using low-level features of the video clip itself, and the other annotates video data from the human viewpoint. Although annotating a video clip from the human viewpoint can make searching more human-oriented, it requires a large amount of manpower and time because of the immense volume of video data. There is therefore a need to develop a technique for automatically annotating multimedia data.
One conventional technique uses the relationships between low-level features of an image and high-level semantic concepts to annotate the image. However, the contents of a video clip are so complex that large gaps and misinterpretations arise between low-level image features and high-level semantic concepts. Another conventional technique combines association rules with fuzzy theory to annotate a video clip, thereby distinguishing whether the video clip is a report by an anchorperson or a report from an outdoor scene, or finds special events and frequently occurring patterns through the continuous relevance among video clips. These conventional techniques are used to extract an abstract of a video clip or to annotate special events in certain video clips. However, all of them require expert or domain-specific knowledge and can be applied only to video clips of a specific type, and thus lack generic applicability.