With the advent of the Internet and the World Wide Web, a vast amount of digital information has become available over such networks. Information search and retrieval systems are utilized with respect to such networks to locate documents in response to queries entered by a user, and the documents so located often contain largely redundant information. If the information sought is not a part of the data that is commonly shared, the user may be forced to examine a multitude of documents and wade through common material in search of an uncommon fact. Further, if the information sought is available in multiple documents, the user may not be able to select the optimal suite for presenting the material.
In an effort to address such problems, the portions of the information that are shared by various members of the document set can first be determined. Such information can be utilized to present a document navigation aid that removes the redundant information, so that the user may visit a topic once and then select the presentation of that topic based on document properties. Typical information redundancy systems can eliminate such redundant information from the document(s). Information redundancy systems can objectively measure duplication, locate duplicate content, eliminate extraneous content, and harmonize text variations within the document sets. Such information redundancy approaches can generally locate documents stored in electronic media in response to the query entered by the user and provide multiple entry paths.
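By way of illustration only, the measurement of duplication across a document set can be sketched as follows. The sketch assumes a word-shingle Jaccard similarity, which is one common technique and is not asserted to be the measure used by any particular information redundancy system. The function names, the shingle size k, and the threshold value are hypothetical and chosen for the example.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short paragraphs become a single shingle; empty text yields no shingles.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def redundant_pairs(documents, threshold=0.5, k=3):
    """Find cross-document paragraph pairs whose similarity meets the threshold.

    documents maps a document id to a list of paragraph strings (an assumed
    input shape). Each result is ((doc_id, index), (doc_id, index), score).
    """
    units = [
        ((doc_id, i), shingles(paragraph, k))
        for doc_id, paragraphs in documents.items()
        for i, paragraph in enumerate(paragraphs)
    ]
    pairs = []
    for (id_a, sh_a), (id_b, sh_b) in combinations(units, 2):
        if id_a[0] == id_b[0]:
            continue  # only compare paragraphs from different documents
        score = jaccard(sh_a, sh_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs
```

Under these assumptions, each returned pair identifies shared material that a navigation aid could present once, while paragraphs absent from every pair are candidates for information unique to a single document.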
The majority of prior art approaches have adopted a visualization method that aids a user in navigating document sets. Such prior art approaches can provide an overview of the total information, the core information areas that are often repeated, and the areas of specialized information unique to a document. Such prior art approaches, however, are typically applicable only to identical pairs of paragraphs that appear to discuss the same topic across the document set, which leads to inconsistent identification of redundancy and is characterized by difficulties with respect to accuracy and evaluation.
Based on the foregoing, it is believed that a need exists for an improved method and system for constructing a document redundancy graph with respect to a document set. A need also exists for an improved method for eliminating redundant information and collapsing nodes to render the navigation of information more manageable, as described in greater detail herein.