This invention relates generally to the fields of knowledge visualization, peer-to-peer networking, and data mining and information extraction. More particularly, this invention relates to a computing system, and associated methods, for generating and storing, in a distributed manner, complex relational data capable of being represented as a network from an input set of data (“records” herein). Aspects of the system also include methods for providing a user with persistent access to the relational data in a distributed computing environment.
Knowledge visualization is a field of endeavor devoted to developing tools and techniques for representing information graphically in order to help a human gain deeper insight into patterns or relationships that exist within the information. Knowledge visualization is applicable in a variety of scientific and business disciplines. The information, which may consist of relational data, may be represented graphically in the form of a “network” or “sub-network”.
Attention is directed to FIG. 1 for an example of knowledge visualization of relational data in the form of a network. FIG. 1 is a representation of relational data in the form of a collection of sub-networks 10. In the example of FIG. 1, the input records consist of publications. The vertices 12, 14 in the sub-networks 10 comprise keywords (annotations) associated with the input records, such as “enzyme inhibitor” 14A and “inflammation” 14B. The different shapes of the vertices 12, 14 correspond to the different categories to which those keywords belong in the context of a given taxonomy. In this particular example, squares and triangles correspond to diseases and general biomedical entities, respectively. The links (lines) 16 joining the triangular vertices to the squares indicate that those particular keywords are contained in at least one common publication.
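The linking rule of FIG. 1 (two keywords are joined when they are contained in at least one common publication) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the records are supplied as a mapping from a hypothetical publication identifier to its keywords; the function name `build_cooccurrence_network` and the sample keyword “arthritis” are likewise hypothetical and are not part of the described system.

```python
from itertools import combinations


def build_cooccurrence_network(records):
    """Build a keyword co-occurrence network from annotated records.

    Hypothetical input format: ``records`` maps a record (publication)
    identifier to an iterable of keyword strings. Two keywords are linked
    when they are contained in at least one common record.
    """
    vertices = set()
    edges = set()
    for keywords in records.values():
        unique = sorted(set(keywords))  # sorting gives each link a canonical (a, b) order
        vertices.update(unique)
        for a, b in combinations(unique, 2):
            edges.add((a, b))  # a set, so repeated co-occurrences add no duplicate links
    return vertices, edges


# Illustrative records only; "arthritis" is a made-up annotation.
records = {
    "pub-1": ["enzyme inhibitor", "inflammation"],
    "pub-2": ["inflammation", "arthritis"],
}
vertices, edges = build_cooccurrence_network(records)
print(sorted(edges))  # [('arthritis', 'inflammation'), ('enzyme inhibitor', 'inflammation')]
```

Storing links in a set means that a pair of keywords appearing together in many publications still yields a single link, which matches the “at least one common publication” criterion.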
Knowledge visualization of relational data such as that shown in FIG. 1 is an analytical tool for gaining greater insight into patterns or relationships present in the input records. For example, a researcher wishing to ascertain which keywords co-occur in the analyzed set of publications would consult the set of links in the graph. An external, dedicated graph visualization module used in conjunction with this system might also support features allowing interactive access to micro-level information associated with the individual records that establish specific relations in the network; these relations are rendered visually as links in the corresponding graph. Other examples might involve inhomogeneous networks in which vertices correspond either to keywords (or authors) or to publications. A link between a keyword and a publication would indicate that the publication mentions the keyword (likewise, a link between an author and a publication would indicate that the publication was published by that author), while keyword-to-keyword (author-to-author) links would still indicate co-occurrence (co-authorship). The specific meaning of the vertices and links in the graph is not important to this invention and can vary widely.
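The inhomogeneous network described above can be sketched in a few lines, together with the “consult the set of links” query a researcher would use to find co-occurring keywords. This is an illustrative sketch only: the tuple-based vertex encoding, the functions `build_mixed_network` and `neighbors`, and the sample records are assumptions, not the representation used by the described system. Replacing keywords with author names would give the author/publication variant, with author-to-author links expressing co-authorship.

```python
from itertools import combinations


def build_mixed_network(records):
    """Build an inhomogeneous keyword/publication network.

    Hypothetical representation: a vertex is a ("pub", id) or ("kw", keyword)
    tuple, and each undirected link is a frozenset of two vertices.
    """
    links = set()
    for pub_id, keywords in records.items():
        kws = sorted(set(keywords))
        for kw in kws:
            # publication-keyword link: the publication mentions the keyword
            links.add(frozenset({("pub", pub_id), ("kw", kw)}))
        for a, b in combinations(kws, 2):
            # keyword-keyword link: the keywords co-occur in this publication
            links.add(frozenset({("kw", a), ("kw", b)}))
    return links


def neighbors(links, vertex, kind=None):
    """Return vertices linked to `vertex`, optionally filtered by vertex kind."""
    return {
        v
        for link in links
        if vertex in link
        for v in link
        if v != vertex and (kind is None or v[0] == kind)
    }


links = build_mixed_network({"p1": ["a", "b"], "p2": ["b", "c"]})
print(sorted(neighbors(links, ("kw", "b"), kind="kw")))  # [('kw', 'a'), ('kw', 'c')]
```

Tagging each vertex with its kind keeps keyword and publication vertices distinct even if an identifier happens to coincide with a keyword string.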
FIG. 2 is another example of relational data in the form of a collection of sub-networks 10. In the example of FIG. 2, a new input record 18 consists of a publication, and the vertices 20 in the sub-network 10 comprise authors of publications. Lines 16 joining two vertices indicate that the two authors share at least one common publication. The input record 18 is mapped to the sub-network by constructing additional lines as needed to show the relatedness among the authors of the input record 18, some of whom may already be part of the sub-networks 10. The greater the number of links a given author has to other authors, the larger that author's vertex. Hence, the largest vertices may be considered to represent leading or prominent authors.
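The mapping of a new input record onto a co-authorship network, and the sizing of vertices by their number of links, can be sketched as follows. The sketch assumes a simple in-memory adjacency-set representation; the class `CoauthorshipNetwork`, its methods, the linear sizing rule in `vertex_size` (with its `base` and `scale` parameters), and the author names are all hypothetical illustrations rather than the described system's implementation.

```python
from collections.abc import Iterable
from itertools import combinations


class CoauthorshipNetwork:
    """Hypothetical sketch of an undirected co-authorship network (adjacency sets)."""

    def __init__(self) -> None:
        self.adj: dict[str, set[str]] = {}

    def add_record(self, authors: Iterable[str]) -> int:
        """Map a new publication onto the network; return how many links were new."""
        unique = sorted(set(authors))
        for author in unique:
            self.adj.setdefault(author, set())  # a sole author still becomes a vertex
        new_links = 0
        for a, b in combinations(unique, 2):
            if b not in self.adj[a]:  # construct a line only if it is not already present
                self.adj[a].add(b)
                self.adj[b].add(a)
                new_links += 1
        return new_links

    def degree(self, author: str) -> int:
        """Number of distinct co-authors linked to `author` (0 if absent)."""
        return len(self.adj.get(author, ()))

    def most_connected(self, k: int = 1) -> list[str]:
        """Authors with the most links; ties broken alphabetically."""
        return sorted(self.adj, key=lambda a: (-self.degree(a), a))[:k]

    def vertex_size(self, author: str, base: float = 1.0, scale: float = 0.5) -> float:
        """Assumed sizing rule: vertex size grows linearly with the number of links."""
        return base + scale * self.degree(author)


net = CoauthorshipNetwork()
net.add_record(["Ann", "Bob"])
net.add_record(["Ann", "Cid"])
added = net.add_record(["Ann", "Bob", "Dee"])  # the Ann-Bob link already exists
print(added, net.most_connected(1))  # 2 ['Ann']
```

Any monotonically increasing function of the link count would serve equally well for vertex sizing; the linear rule is chosen here only for simplicity.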
Knowledge visualization is described further in the following references, the contents of which are incorporated by reference herein: Chen C., “The centrality of pivotal points in the evolution of scientific networks”, Int'l Conf. on Intelligent User Interfaces (IUI 2005), San Diego, Calif., Jan. 9-12, 2005; Chen C., “Searching for intellectual turning points: Progressive Knowledge Domain Visualization”, Proceedings of the National Academy of Sciences of the United States of America (2004); Chen C., Kuljis J., “The rising landscape: a visual exploration of superstring revolutions in physics”, Journal of the American Society for Information Science and Technology, 54, 5, 435-446 (2003); Chen C., Paul R. J., “Visualizing a knowledge domain's intellectual structure”, IEEE Computer, 34(3), 65-71 (2001); Haas L. M., “DiscoveryLink: a system for integrated access to life sciences data sources”, IBM Systems Journal, 40, 2, 489-511 (2001); Chen C., “Visualising Semantic Spaces and Author Co-Citation Networks in Digital Libraries”, Information Processing & Management, 35(3), 401-420 (1999); Chen C., Carr L., “Trailblazing the Literature of Hypertext: Author Co-Citation Analysis”, Proceedings of the 10th ACM Conference on Hypertext and Hypermedia (1999); ISSI 2005, 10th International Conference of the International Society for Scientometrics and Informetrics, Jul. 24-28, 2005, Stockholm, Sweden.
The patent literature includes several references devoted to graphical visualization techniques, including U.S. Pat. Nos. 5,313,571; 6,211,887; 5,611,035; 5,638,501; 5,949,432; 5,754,186 and 6,867,788.
The following recently issued U.S. patents are of potential interest to aspects of the present inventive system, as being directed to methods and systems for presenting information, to client/server systems in distributed computing environments, or to distributed databases: U.S. Pat. Nos. 6,912,536; 6,912,607; 6,912,229; 6,910,053; 6,909,695; 6,912,588; 6,912,535 and 6,912,196.
Several U.S. patent publications are of interest to various aspects of the present disclosure, including 2005/0120137; 2005/0010618; 2004/0078466; 2004/0088297; 2003/0120672; 2003/0140051; 2003/0220960; 2003/0225884; 2003/0088544; 2003/0050924; 2002/0184451; 2001/0034795 and 2001/0051955. Of these references, U.S. patent publications 2004/0078466 and 2002/0184451 are in some ways the most relevant to the present system.