Technical Field
The invention relates to data stores, and in particular to large-scale distributed data stores.
Description of the Prior Art
In recent years, the need for reliable and efficient storage of large amounts of data has increased dramatically. Indeed, the need for extensive storage has outpaced even the remarkable advances in storage technology (e.g. the increasing capacity and decreasing cost of hard disks) and in processing power. As a result, storing truly large amounts of data on a single server is in many cases impractical, and a distributed approach is desirable. Furthermore, even where a single machine can handle the storage, a distributed data store may offer superior reliability and more efficient load handling.
At a recent conference of database users, Adam Bosworth envisioned a massive distributed data store through which “anyone can get to any data anywhere anytime”. (Database Requirements in the Age of Scalable Services; O'Reilly MySQL Users Conference; Santa Clara, Calif.; (Apr. 18-21, 2005)). The data store would be completely scalable, use inexpensive, commodity components, and be capable of handling billions of transactions a day. Most importantly, it would allow even novice computer users to conveniently serve structured data to a worldwide community of users, leading to an explosion of accessible information.
Such a data store would be to a querying client what the World Wide Web is to a browser. As such, it would share many characteristics with the World Wide Web. In particular, the data store would incorporate:
Partitioning. Data storage and retrieval and the associated workloads would be distributed across many nodes within a storage network.
Caching. Data would be stored locally on a temporary basis to minimize the effects of surges in requests for particular items within the data store.
Stateless Nodes. Responding to queries would require a node to maintain only a minimum amount of state information.
Coarse-Grained Interactions. Clients, servers, and other network nodes would interact relatively infrequently, sending, receiving, and operating upon chunks of data at a time.
To date, substantial effort has been directed at developing very large databases (VLDBs) that exhibit some of these characteristics. Notable examples include the Mariposa wide-area distributed database (Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu; Mariposa: a wide area distributed database system, VLDB Journal; 5(1):48-63; (1996)) and the PIER massively distributed query engine (Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, Ion Stoica; Querying the Internet with PIER; Proceedings of the 29th VLDB Conference; Berlin, Germany; (2003)).
Many VLDB systems, including the PIER query engine, are based upon content-addressable networks (CANs). A CAN is based upon a multi-dimensional Cartesian coordinate space that spans the entirety of the data to be stored. Each computer that stores data is treated as a node within a graph. When a new node is added to the graph, it is dynamically assigned a portion of the subspace belonging to the node to which it connects. The newly added node is thereafter responsible for storing and returning data items within its assigned subspace.
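The zone-splitting step described above can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the CAN specification: the class names `Zone` and `Can`, the choice to halve a zone along its widest dimension, and the global scan used to find the owning node (a real CAN would route the join request through the graph) are all assumptions made for brevity.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Axis-aligned box [lo, hi) in the coordinate space (hypothetical helper)."""
    lo: list
    hi: list

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def split(self):
        # Assumption: halve the zone along the dimension with the greatest extent.
        dim = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = (self.lo[dim] + self.hi[dim]) / 2
        lower, upper = Zone(self.lo[:], self.hi[:]), Zone(self.lo[:], self.hi[:])
        lower.hi[dim] = mid
        upper.lo[dim] = mid
        return lower, upper


class Can:
    """Toy CAN (hypothetical): maps node names to the zones they are responsible for."""

    def __init__(self, dims):
        # The first node owns the whole unit hypercube.
        self.zones = {"n0": Zone([0.0] * dims, [1.0] * dims)}

    def owner(self, p):
        # Simplification: a global scan instead of routing through the graph.
        return next(n for n, z in self.zones.items() if z.contains(p))

    def join(self, new_node, p):
        host = self.owner(p)
        lower, upper = self.zones[host].split()
        # The newcomer takes the half containing its chosen point; the host keeps the rest.
        mine, theirs = (lower, upper) if lower.contains(p) else (upper, lower)
        self.zones[new_node], self.zones[host] = mine, theirs
        return host


can = Can(dims=2)
can.join("n1", (0.7, 0.2))  # n0's unit square is split at x = 0.5
can.join("n2", (0.7, 0.8))  # n1's zone is split at y = 0.5
print(can.zones["n2"])  # Zone(lo=[0.5, 0.5], hi=[1.0, 1.0])
```

After the two joins, every point of the unit square is still owned by exactly one node, and each node's zone is a subset of the zone it was carved from.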
Inserting data into a distributed system and searching for data within it, i.e. submitting queries to it, present well-known challenges. Responding to such requests may be greatly simplified if the algorithms used allow each node to consider only local information. In the case of a CAN, each node stores information indicating the subspace assigned to each of its nearest neighbors within the graph, and a greedy forwarding algorithm is used to propagate insert and query requests. Specifically, a node passes each request to the neighboring node whose assigned subspace is closest to the location of the inserted or requested item within the Cartesian space. Each node thus acts upon purely local information and does not need to coordinate with neighboring nodes to make routing decisions.
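The greedy forwarding rule can be illustrated with a minimal sketch, assuming zones are axis-aligned boxes in the unit square and that "closest" means the Euclidean distance from the target point to the nearest point of a neighbor's zone. The function names `distance`, `contains`, and `route`, the 3-by-3 grid of nodes, and the stall check are illustrative assumptions, not part of any particular CAN implementation.

```python
import math


def distance(point, zone):
    """Euclidean distance from a point to the nearest point of a zone given as (lo, hi)."""
    lo, hi = zone
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(point, lo, hi)))


def contains(point, zone):
    lo, hi = zone
    return all(l <= x < h for x, l, h in zip(point, lo, hi))


def route(start, target, zones, neighbors):
    """Greedily forward a request toward the node whose zone contains `target`."""
    node, path = start, [start]
    while not contains(target, zones[node]):
        # Purely local decision: only this node's neighbor table is consulted.
        nxt = min(neighbors[node], key=lambda n: distance(target, zones[n]))
        if distance(target, zones[nxt]) >= distance(target, zones[node]):
            raise RuntimeError("greedy forwarding made no progress")
        node = nxt
        path.append(node)
    return path


# Hypothetical layout: nine nodes, each owning one cell of a 3-by-3 grid.
n = 3
zones = {(i, j): ((i / n, j / n), ((i + 1) / n, (j + 1) / n)) for i in range(n) for j in range(n)}
neighbors = {
    (i, j): [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)) if (i + di, j + dj) in zones]
    for i in range(n)
    for j in range(n)
}
print(route((0, 0), (0.9, 0.5), zones, neighbors))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Each hop strictly reduces the distance to the target, and no node ever needs to know the zones of nodes beyond its immediate neighbors.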
Nonetheless, difficulties may arise in distributed data stores when responding to search requests. Specifically, if a query associated with a search is propagated in an outwardly fanning manner to a number of nodes within the data store, and the data returned from those nodes are aggregated along the return path to the user, a bottleneck may ensue at the nodes nearest the user, through which all of the returned data must pass. This effect is particularly acute for queries that entail the return of large amounts of data, and such queries become increasingly common as a data store grows and holds more data.
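To see why fan-out aggregation concentrates load, consider a hypothetical model: the query spreads down a complete tree with fan-out f and depth d, every leaf returns r records, and intermediate nodes merge results without reducing them. A node at depth k then relays r·f^(d-k) records, so the node nearest the user relays everything. The function names and parameters below are illustrative assumptions.

```python
def records_through_node(depth_of_node, tree_depth, fanout, records_per_leaf):
    """Records relayed by one node at the given depth (hypothetical model)."""
    # Every leaf in this node's subtree forwards its records through it.
    leaves_below = fanout ** (tree_depth - depth_of_node)
    return leaves_below * records_per_leaf


def load_by_level(tree_depth, fanout, records_per_leaf):
    """Per-node load at each depth, from the node nearest the user (depth 0) down to the leaves."""
    return [
        records_through_node(k, tree_depth, fanout, records_per_leaf)
        for k in range(tree_depth + 1)
    ]


print(load_by_level(tree_depth=3, fanout=10, records_per_leaf=1000))
# [1000000, 100000, 10000, 1000]
```

In this example each level closer to the user handles ten times the data of the level below it, which is why queries returning large result sets are the ones most likely to overwhelm the return path.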
It would therefore be desirable to provide a system for responding to queries of a distributed data store that minimizes the likelihood of a bottleneck when responding to a query entails the return of large amounts of data. Furthermore, it would be desirable for the system to operate in a manner consistent with the goal of stateless nodes. Finally, the system should not require a user to anticipate a potential bottleneck but should instead operate automatically.