This invention relates to the field of data mining, specifically the exposition of the relationships of database items in large databases.
Large databases are becoming commonplace. The INSPEC database on Dialog had almost 4 million records in one file as of November 1990, where each record represents a scientific article, book, or paper. The Institute for Scientific Information maintains a database called SCISEARCH, one file of which had almost 11 million records as of July 1991. The American Business Directory database had information on over 10 million companies in May 1995. The CLAIMS/U.S. Patents database provides access to over 2 million patents. Countless other private databases exist, storing information such as employee and student records, addresses, customer profiles, and household buying habits.
Even information not stored as a conventional database can have database-like qualities. For example, individual Web pages contain some information; the aggregation of many Web pages contains a large amount of information. Relationships between Web pages are not explicitly stored as in a database, but can be inferred from referencing among the pages. Other examples include financial transactions, not stored as a conventional database but nonetheless representing a large volume of information, with each item of information potentially related to many other items.
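As an illustration only, the inference of relationships from referencing among pages can be sketched as follows. This is a minimal sketch, not a description of any particular system: the function names `build_link_index` and `co_referencing_pages`, the page identifiers, and the sample data are all hypothetical and do not come from any existing database or tool.

```python
def build_link_index(references):
    """Invert a mapping of page -> pages it references.

    `references` is a hypothetical input: a dict whose keys are page
    identifiers and whose values are lists of the pages each one cites.
    Returns a dict mapping each page to the set of pages that reference it.
    """
    referenced_by = {page: set() for page in references}
    for source, targets in references.items():
        for target in targets:
            if target == source:
                continue  # ignore self-references
            referenced_by.setdefault(target, set()).add(source)
    return referenced_by


def co_referencing_pages(referenced_by, page_a, page_b):
    """Return the pages that reference both page_a and page_b.

    Two pages cited together by the same sources are related even though
    no record states that relationship explicitly.
    """
    return referenced_by.get(page_a, set()) & referenced_by.get(page_b, set())


# Hypothetical sample data: four pages and the pages each one references.
sample_references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["B", "C"],
}

link_index = build_link_index(sample_references)
shared_sources = co_referencing_pages(link_index, "B", "C")
print(sorted(shared_sources))
```

Here pages B and C are never declared to be related, yet both are referenced by pages A and D, so a relationship between them can be inferred from the references alone.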
Searching and retrieval systems operating with large databases generally allow retrieval of individual items, or retrieval of sets of items related in some way, such as items containing certain keywords or topical markers. The large size of the database makes it more likely that a user can successfully find and retrieve the desired items.
The large size of the database, however, also makes it less likely that the user can comprehend the relationships among the many items in the database. The user can find individual items, and can find groups of related items. The user cannot, however, access the structure of the relationships among the items.
The structure of the relationships among items can convey much useful information. For example, a lawyer can use Shepard's to find a linear chain of related cases, but cannot see beyond that chain to deduce how the cases relate to other such chains. Other lines of reasoning and rules of law in different areas can grow from a line of cases, or a line of cases can itself grow out of several preceding themes in the law. While the relationships among cases are usually explicit through case citations, the structure of the relationships cannot be understood using existing search and retrieval tools.
As another example, scientific papers represent the state of research, and often have explicit relationships to other papers through references. Bibliographies and citation lists can help illuminate relationships in a specific area, but are not sufficient to illuminate the ways fields of research grow together, build on each other, or spawn new fields over time.
For databases containing only a few items, a user can read items, analyze relationships, and draw diagrams to deduce the relationships. Databases with more than a few items have much more information embedded in the relationships, but the relationships are too numerous and too complex for a user to analyze or comprehend using existing search and retrieval tools. Consequently, there is a need for a process that allows a user to comprehend the structure of relationships among items in databases having many items.