1. Field of the Invention
An embodiment of the present invention relates to segmentation of video sequences and, more particularly, to an apparatus, medium, and method for segmenting video sequences by topic at high speed by detecting main characters.
2. Description of the Related Art
Developments in digital signal processing techniques such as video and audio compression have allowed users to retrieve and browse desired multimedia content at desired points in time. Fundamental techniques required to browse and retrieve non-linear multimedia content include shot segmentation and shot clustering. These two techniques are the most important for structurally and hierarchically analyzing multimedia content.
A “shot” in a video program is a sequence of frames that can be obtained from a video camera without interruption, and may function as a basic unit for analyzing or organizing the video program. The shot may mean a single frame or a plurality of frames. However, for simplicity of explanation, the term shot will be exemplified by the single frame, noting that embodiments of the invention are not limited to the same. In addition, a “scene” in the video program is a semantic element of a video construction or development of a story, and includes a collection of shots related to one another by the same semantic context. The concept of the shot or the scene may be similarly applied to an audio program as well as the video program.
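By way of illustration only, the relationship between shots and scenes described above may be sketched as a simple data model. The class names, the field names, and the use of inclusive frame indices are assumptions made for this sketch and do not appear in the description; the sketch is not a definitive implementation of any embodiment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Shot:
    """An uninterrupted run of frames, stored as an inclusive frame range.

    The frame-index representation is an assumption of this sketch.
    """
    start_frame: int
    end_frame: int

    def __post_init__(self) -> None:
        if self.start_frame < 0 or self.end_frame < self.start_frame:
            raise ValueError("invalid frame range")

    @property
    def frame_count(self) -> int:
        # A shot may consist of a single frame (start_frame == end_frame).
        return self.end_frame - self.start_frame + 1


@dataclass
class Scene:
    """A collection of shots related by the same semantic context."""
    label: str  # hypothetical free-text description of the shared context
    shots: List[Shot] = field(default_factory=list)

    @property
    def frame_count(self) -> int:
        return sum(shot.frame_count for shot in self.shots)


# Example: one scene made of three shots, one of which is a single frame.
single_frame_shot = Shot(120, 120)
scene = Scene("interview", [Shot(0, 89), single_frame_shot, Shot(121, 300)])
```

In this sketch a shot carries no semantic information of its own; the semantic context is attached at the scene level, which mirrors the distinction drawn above between a shot as a basic organizational unit and a scene as a semantic element.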
A multimedia indexing technique allows users to easily browse or retrieve a desired part of the video program. A conventional multimedia indexing technique may include extracting organizational information of video content in units of shots or scenes, extracting main characteristic elements such as key-frames capable of representing a corresponding segment for each organizational unit, indexing the organizational information for multimedia content, and describing semantic information, such as an occurrence of an event, advent of visual or auditory objects, and conditions and backgrounds of objects, along a temporal axis.
However, such conventional multimedia content indexing techniques make the result of a summarization difficult to identify because excessive segments are generated when segmentation is performed on the basis of scene changes. In addition, conventional techniques fail to accurately detect start points of the segments because the multimedia content is not segmented on the basis of similarity of content, but rather is summarized using a single piece of information such as similarity of colors. Further, it is difficult to summarize the multimedia content when a broadcast type or genre is changed because only a characteristic of a particular genre is used. Moreover, due to the excessive processing load generated during the summarization of the multimedia content, it is difficult to apply conventional techniques to embedded systems such as mobile phones, personal digital assistants (PDAs), and digital cameras, which have low-performance processors.