1. Technical Field
The embodiments herein generally relate to content summarization, and more particularly to a system and method for summarizing content using weighted formal concept analysis (wFCA) based on user intent.
2. Description of the Related Art
Documents obtained via an electronic medium (e.g., the Internet, on-line services, or any other services) are often provided in such volume that it is important to summarize them. It is desirable to be able to quickly obtain a brief summary of a document rather than reading it in its entirety. Typically, such a document may span from multiple paragraphs to several pages in length.
Summarization or abstraction is even more essential in the framework of emerging “push” technologies, where a user has hardly any control over what documents arrive at the desktop for his/her attention. Summarization is a key feature in content extraction, and there is currently no solution available that provides a summary comparable to one written by a human. Conventionally, summarization of content is performed manually by users, which is time consuming and expensive. Further, it is slow and does not scale to a large number of documents.
Summarization involves representing the whole content in a limited set of words without losing the main crux of the content. Traditional summarization of content (in general, a document) is based on lexical chaining, in which the longest chain is assumed to best represent the content, and the first sentence of a summary is taken from the first sentence of the longest chain. The second-longest chain is assumed to be the next best, and the second sentence of the summary is then taken from the first sentence of the second-longest chain. However, this lexical chaining approach not only tends to miss important content related to the intent of the user but also fails to elaborate on it in a manner that can be easily understood. Accordingly, there remains a need for a system to automatically analyze one or more documents and generate an accurate summary based on user intent.
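By way of illustration only, the traditional lexical chaining approach described above may be sketched as follows. This is a minimal sketch under simplifying assumptions and not a definitive implementation: a chain is approximated as the list of sentences that repeat the same non-stopword term (practical systems typically also link synonyms and related terms through a thesaurus), and the names `STOPWORDS`, `split_sentences`, `build_chains`, and `summarize` are hypothetical and chosen only for this example. Skipping a chain whose first sentence has already been selected is likewise an assumption.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small stopword list for the example.
STOPWORDS = {"a", "an", "the", "on", "in", "of", "and", "to", "is", "it"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_chains(sentences):
    """Map each non-stopword term to the indices of the sentences containing it.

    Assumption: a chain is formed by exact repetition of a term only.
    """
    chains = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        # set() so that a term counts once per sentence.
        for word in set(re.findall(r"[a-z]+", sentence.lower())):
            if word not in STOPWORDS:
                chains[word].append(idx)
    return chains


def summarize(text, num_sentences=2):
    """Take the first sentence of the longest chain, then of the next longest."""
    sentences = split_sentences(text)
    chains = build_chains(sentences)
    # Longest chain first; ties go to the chain that starts earliest,
    # then alphabetically by term so the result is deterministic.
    ranked = sorted(chains.items(), key=lambda kv: (-len(kv[1]), kv[1][0], kv[0]))
    picked = []
    for _term, positions in ranked:
        first = positions[0]
        if first not in picked:  # assumption: never repeat a sentence
            picked.append(first)
        if len(picked) == num_sentences:
            break
    return [sentences[i] for i in picked]


if __name__ == "__main__":
    sample = ("The cat sat on the mat. A dog barked loudly. "
              "The cat chased the dog. The cat slept.")
    for line in summarize(sample, 2):
        print(line)
```

In this sketch the selection depends only on chain length, so nothing in the procedure takes the intent of the user into account, which is the shortcoming noted above.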