1. Field of the Invention
This invention pertains to evaluating the relative importance of entities mentioned in data sources such as books.
2. Description of the Related Art
Modern information storage and retrieval systems contain a great deal of information. For example, many books, magazines, and other documents have been converted into computer-readable formats and stored in such systems. With the advent of search engines and related tools, it has become possible to structure and extract information from the documents. For example, an automated tool can analyze the text of a book to identify entities such as locations, dates, and people mentioned within it. The results of these analyses are presented to users to assist them in comprehending the information contained in the book.
It is generally acknowledged that certain types of information are easier to comprehend when presented in certain formats. For example, geographic locations are often easier to understand when identified by markers on a map. Similarly, dates are often more understandable when presented on a linear graph such as a timeline. Other types of information, such as facts about a person, are often best understood when presented as text or in tabular form.
There are frequently limits on the amount of information that can be simultaneously displayed in a given format. A map might become incomprehensible if it includes so many markers that the map itself is obscured. Likewise, a timeline showing too many dates appears cluttered. It is usually desirable, therefore, to limit the amount of information presented at once in order to maintain its comprehensibility and improve its utility.
One way to limit the information is via rankings. This technique is employed, for example, by search engines that return results matching a search query. Although there might be millions of results matching the query, the search engine ranks the results and allows the searcher to traverse them in ranked order. An effective ranking increases the likelihood that the searcher will quickly find the sought information. Similarly, an effective ranking increases the usefulness of information presented during general browsing (i.e., when a user is not searching for specific information).
A major difficulty is determining how to rank the information, especially in a query-independent manner. Some documents do not provide explicit guidance on how to rank the information they contain. A book, for example, might mention 500 different geographic locations, but contain no explicit guidance on how to rank these locations. As a result, it is hard to determine which information to present on a given display. Accordingly, there is a need for a way to determine the relative importance of information contained within books and other documents.