1. Field of Invention
The present invention relates generally to the field of distributed data indexing. More specifically, the present invention is related to an incentive mechanism for autonomously and selectively indexing distributed data and for opportunistically routing queries to such data.
2. Discussion of Prior Art
There is increasing interest in integrating data and computing resources across large-scale grids. A fundamental requirement for integration is the efficient discovery of data sources and computing resources across a distributed system. In current federation and consolidation approaches, users specify sources from which they wish to draw data by explicitly combining references to these sources in a query. Such an approach is not scalable because a query must be formed with an understanding of the contents of data sources within a relevant grid. Additionally, consolidation and federation approaches are unable to adequately provide for dynamic environments; each time a data source enters a grid, experiences a failure, or leaves the grid, overlying applications are affected. Thus, it is necessary to maintain an index of computing resources across a distributed system in order to more efficiently access data associated with these computing resources.
State-of-the-art approaches to distributed indexing fall into two classes: peer-to-peer schemes and Lightweight Directory Access Protocol (LDAP) directories. Current peer-to-peer research focuses on distributed hash tables (DHTs) as proposed by Stoica et al. in “Chord: A scalable peer-to-peer lookup service for internet applications” and Ratnasamy et al. in “A scalable content addressable network”. Each proposes to hash data objects to a common address space and to form an overlay structure in which each peer tracks a selected number of other peers in the system. Because data is distributed uniformly across peers, a DHT locates a particular data item in an average time that is logarithmic in the number of peers in the network. The approaches proposed by Stoica and Ratnasamy are limited in that they are primarily applicable only to equality predicates.
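The equality-predicate limitation can be illustrated with a minimal sketch of hashing keys onto a shared identifier ring. This is not the Chord or CAN protocol itself: there is no overlay routing or finger table, and the names ToyDHT, ring_id, and RING_BITS, as well as the 16-bit ring size, are hypothetical choices made only for this illustration.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed, deliberately small identifier space


def ring_id(key: str) -> int:
    """Hash a string onto the ring [0, 2**RING_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class ToyDHT:
    """Single-process stand-in for a DHT: each key is stored on the
    first peer whose ring identifier is >= the key's identifier."""

    def __init__(self, peer_names):
        self.peers = sorted((ring_id(name), name) for name in peer_names)
        self.ids = [pid for pid, _ in self.peers]
        self.store = {name: {} for name in peer_names}

    def successor(self, key: str) -> str:
        # Wrap around to the first peer when the key hashes past the last one.
        i = bisect_left(self.ids, ring_id(key)) % len(self.ids)
        return self.peers[i][1]

    def put(self, key: str, value) -> None:
        self.store[self.successor(key)][key] = value

    def get(self, key: str):
        # Equality lookup: hash once, ask exactly one peer.
        return self.store[self.successor(key)].get(key)


dht = ToyDHT([f"peer-{i}" for i in range(8)])
for age in range(30, 40):
    dht.put(f"age={age}", f"patient-{age}")

# Peers that would have to be contacted to answer "age between 30 and 39".
range_owners = {dht.successor(f"age={age}") for age in range(30, 40)}
```

An equality lookup such as dht.get("age=35") hashes the key once and consults a single peer. A range predicate has no such shortcut in this sketch: because hashing discards the ordering of the keys, adjacent values are generally placed on unrelated peers, so every value in the range must be looked up separately.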
Additionally, DHTs assume a cooperative model in which peers are willing to locally store data from other peers and to index data that they themselves do not necessarily need. A cooperative model is less applicable to grids involving autonomous entities, as is empirically illustrated in non-patent literature by Adar and Huberman in “Free Riding on Gnutella”, Ripeanu et al. in “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, and Saroiu et al. in “A Measurement Study of Peer-to-Peer File Sharing Systems”. Empirical data from deployed systems such as Gnutella and Kazaa show a relatively large number of “free-riders”: peers who consume more resources than they contribute, as well as peers who contribute nothing at all.
A further limitation is the randomizing nature of DHT approaches, which are designed to best accommodate uniform query access patterns. Autonomous grids, however, are prone to access locality. For example, a hospital cancer database in San Jose may predominantly issue search requests for cancer patients in the San Francisco Bay area. If such a database is indexed with a DHT, it must maintain pointers to a random set of patient records, many of which may be irrelevant to the common local search pattern. Furthermore, there exists no mechanism to prioritize particular types of search requests; for example, a hospital may desire preferential treatment for queries made by doctors over queries made by residents and interns.
Hierarchical LDAP directory structure approaches are also limited by the assumption of a cooperative model and by the lack of a prioritization mechanism. Conceptually, range predicates can be supported and the randomized nature of indexed data can be avoided if an LDAP index structure is chosen with care. However, a key limitation of LDAP is that an appropriate index structure must be configured statically and therefore may not match a given query workload. For instance, a database administrator may have configured an LDAP hierarchy of patient records organized first by geographic region, then by disease, then by ethnicity, and so on. Such a hierarchy offers no support for query predicates over a different set of attributes, for example, age and symptom.
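The effect of a statically configured hierarchy can be sketched with a nested dictionary standing in for the directory tree. This does not use LDAP or any LDAP library; the function names build_tree and search, the HIERARCHY tuple, and the sample records are all hypothetical and exist only to show how the number of visited leaves depends on whether the predicates match the configured attribute order.

```python
# Attribute order fixed at configuration time, as in the example above.
HIERARCHY = ("region", "disease", "ethnicity")


def build_tree(records):
    """Nest records by the attributes in HIERARCHY; leaves are lists."""
    tree = {}
    for rec in records:
        node = tree
        for attr in HIERARCHY[:-1]:
            node = node.setdefault(rec[attr], {})
        node.setdefault(rec[HIERARCHY[-1]], []).append(rec)
    return tree


def search(tree, predicates):
    """Return (matching records, number of leaves visited)."""
    matches = []
    visited = 0

    def walk(node, depth):
        nonlocal visited
        if depth == len(HIERARCHY):
            visited += 1  # node is a leaf list of records
            matches.extend(
                r for r in node
                if all(r[k] == v for k, v in predicates.items())
            )
            return
        attr = HIERARCHY[depth]
        if attr in predicates:
            # The hierarchy helps: follow a single branch.
            child = node.get(predicates[attr])
            if child is not None:
                walk(child, depth + 1)
        else:
            # The hierarchy does not help: every branch must be explored.
            for child in node.values():
                walk(child, depth + 1)

    walk(tree, 0)
    return matches, visited


records = [
    {"region": "bay-area", "disease": "cancer", "ethnicity": "a", "age": 40, "symptom": "fatigue"},
    {"region": "bay-area", "disease": "flu", "ethnicity": "b", "age": 30, "symptom": "fever"},
    {"region": "la", "disease": "cancer", "ethnicity": "a", "age": 40, "symptom": "fatigue"},
    {"region": "la", "disease": "flu", "ethnicity": "c", "age": 55, "symptom": "cough"},
]
tree = build_tree(records)

by_prefix = search(tree, {"region": "bay-area", "disease": "cancer"})
by_other = search(tree, {"age": 40, "symptom": "fatigue"})
```

With these four sample records, the query on region and disease follows one branch and visits a single leaf, while the query on age and symptom gains nothing from the hierarchy and visits all four leaves, which amounts to a full scan.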
Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.