1. Technical Field
The present invention is generally directed to an improved computing system. More specifically, the present invention is directed to a method and apparatus for determining the relative relevance between portions of large electronic documents.
2. Description of Related Art
In the present information age, literature has become increasingly easy to access. As literature is moved from a physical format to an electronic format, more people are able to gain access to the information contained in this literature through the use of computers, networks, the Internet, and the like.
Being able to compare literature, e.g., books, articles, magazines, etc., and determine the relevance of one piece of literature to another, has been a valuable tool for identifying other pieces of literature that may be of interest to a reader. Traditionally, this has been done manually, for example through a manual cataloging scheme. Typically, these manual cataloging schemes use general topics, author names, title words, and the like, to determine which pieces of literature are most like one another and to place them in the same or a similar category.
Manual comparisons are extremely time consuming when the number of documents, e.g., books, being compared is large, and they are usually subject to personal biases. When a cataloging system is utilized, manual comparisons further require that the person comparing the documents have a detailed understanding of the cataloging system so that the appropriate categories for the documents are selected.
In recent years, as literature has been moved from physical books, magazines, and the like, to electronic documents, techniques have been devised to perform comparisons of electronic documents based on small standardized portions of the electronic documents. For example, electronic documents typically include an abstract, and the comparison between documents is made based on these abstracts.
Abstract-based comparisons are unreliable because the entire electronic document, e.g., an electronic book, contains far more information than is contained in its abstract. Thus, a book may have portions that are relevant to many other types of books, yet a comparison of abstracts may not accurately reflect this fact. Furthermore, two electronic documents may have the same abstract, yet contain entirely different contents.
Thus, it would be desirable to have an automated system that performs a comprehensive comparison of an electronic document with other electronic documents to generate comparison results indicating the relative relevance of the documents to one another. Moreover, it would be beneficial to provide such a comprehensive comparison of on-line electronic documents as part of a search engine for finding additional electronic documents, and to provide a ranking of the relative relevance of the additional electronic documents.