1. Field of the Invention
The present invention relates to a data summarization method and apparatus, and more particularly, to a method and apparatus for generating a suitable abstract by analyzing the morphemes and grammatical structure of a caption.
2. Description of the Related Art
With the development of data compression and data transmission technologies, an increasing amount of multimedia data is being generated and transmitted. Because of the large amount of multimedia data accessible on the Internet, it is very difficult to retrieve desired multimedia data. Also, many users want to receive only the important information in a short time via an abstract made by summarizing the multimedia data. In response to these user demands, various methods of generating an abstract of multimedia data have been provided. Among these, there is a method of generating an abstract by extracting nouns from closed caption text. However, an abstract generated by extracting nouns is too long and is not refined enough to be provided to users. Also, since context is lost when only nouns are extracted, the meaning of the abstract cannot be precisely conveyed. For example, when only nouns are extracted from closed caption text such as “It was confirmed the artificial fish reef that was installed in order to protect marine resources cannot do its job”, awkward abstract content such as “fish reef, resources, and job” is produced.
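The loss of context in the noun-only approach can be illustrated with a minimal sketch. The part-of-speech tags below are assigned by hand for illustration and are an assumption, not the output of any particular morphological analyzer; the function name extract_nouns and the treatment of "in order to" as a single function token are likewise hypothetical simplifications.

```python
# Hand-assigned part-of-speech tags for the example caption (an assumption
# for illustration; a real system would obtain them from a morphological
# analyzer). "in order to" is simplified to a single function token.
TAGGED_CAPTION = [
    ("It", "PRON"), ("was", "AUX"), ("confirmed", "VERB"),
    ("the", "DET"), ("artificial", "ADJ"), ("fish", "NOUN"), ("reef", "NOUN"),
    ("that", "PRON"), ("was", "AUX"), ("installed", "VERB"),
    ("in order to", "PART"), ("protect", "VERB"),
    ("marine", "ADJ"), ("resources", "NOUN"),
    ("cannot", "AUX"), ("do", "VERB"), ("its", "PRON"), ("job", "NOUN"),
]


def extract_nouns(tagged_tokens):
    """Keep only nouns, joining adjacent nouns into one abstract word."""
    phrases = []
    current = []
    for word, tag in tagged_tokens:
        if tag == "NOUN":
            current.append(word)
        elif current:
            # A non-noun token ends the current run of nouns.
            phrases.append(" ".join(current))
            current = []
    if current:
        # Flush a run of nouns that reaches the end of the caption.
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    print(", ".join(extract_nouns(TAGGED_CAPTION)))
    # prints: fish reef, resources, job
```

Every verb, modifier, and negation is discarded, so the result no longer indicates that the reef fails to do its job.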
Also, in a conventional method of generating an abstract by recognizing a caption added to video data, the caption has to be recognized and processed directly from the video data, which increases the amount of data that must be processed to generate the abstract. In addition, since a caption included in video data is generally condensed so that it conveys content together with the video, the text of the caption may not precisely reflect the content.
Accordingly, there is a need for a data summarization method and apparatus capable of generating a natural abstract by extracting abstract words based on the morphemes, grammatical structure, and word meanings of the text included in a closed caption, and by rearranging the extracted abstract words into a form suitable for recognition.