In information search and retrieval, a large number of results are often returned in response to a query. Even if all the documents are relevant to the query, the user still faces the problem of finding the most relevant or needed information among a large number of candidates. Sometimes a user may need only a brief overview of the queried topic, while at other times the user may want detailed information about it. Similarly, a user may sometimes need only information about the general aspects of the queried topic, while at other times the user may want information about certain specific aspects of it. Sifting through a long list of candidate documents can be both time-consuming and frustrating.
For example, if the query is about cars, a search engine may return a huge number of documents that contain information about cars. Some documents may provide general information about what cars are and/or how they work. Others may explain how to buy or sell a car. Some documents may cover specific car makes and models, while others may discuss specific properties of cars. Some documents may offer a brief introduction to cars, while others may describe many aspects of cars in detail. Even for the same car-related topic, such as general information about car engines, some documents may provide a brief overview, while others may go into great detail and length.
Conventional search engines, whether used for the Internet or for an enterprise, rank returned documents mainly based on keywords or hypertext links. Even though the returned documents may be relevant to the query, such search engines have no way to detect or distinguish differences in the characteristics of the content of the returned documents, such as the differences among the car-related documents described above.