As the use of computers and the Internet becomes more widespread, especially among the new generation, new data is generated at a faster rate and, at the same time, the need for efficient search engines to access data stored in various databases becomes more evident. People use search tools such as GOOGLE and YAHOO to learn about the latest news, find items, look up addresses, reserve tickets, find answers to many of their questions, and the like. One of the areas in which databases play a major role is e-commerce. Every day, millions of people participate in some kind of electronic business in which they not only use databases but also contribute to the data existing in those sources.
With the increase in the volume of data, more sophisticated data management techniques have to replace rudimentary methods. For example, as a database application expands to serve millions of global customers, scale-out architectures may need to replace the hosting of large databases on a single mainframe-class machine.
Several approaches to scale-out are well known in the art. An information repository may be horizontally partitioned by dividing it into several segments, each storing data related to a specific category of information (e.g., customer data, inventory data, employee data, and so on). Data may also be stored in so-called rule-based servers. In a rule-based server, the server has to verify whether a service request meets certain application-specific criteria before forwarding the request to service routines (e.g., making sure a student is registered before allowing the student access to a university online library). In distributed data sources, data stored on several servers may be accessed by distributed applications consisting of one or more local or remote clients connected to one or more servers on several machines linked via a network. In this distributed application scheme, the address of the request may be embedded in the data, so that the data identifies the server that may fulfill the request.
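The rule-based check described above can be illustrated with a minimal sketch. All names here (`REGISTERED_STUDENTS`, `is_registered`, `library_service`, `handle_request`) and the sample data are hypothetical and chosen only to mirror the university library example; this is not a description of any particular server product.

```python
from typing import Callable, Dict, Set

# Hypothetical set of registered student IDs (illustrative data only).
REGISTERED_STUDENTS: Set[str] = {"s1001", "s1002"}

Request = Dict[str, str]


def is_registered(request: Request) -> bool:
    """Application-specific rule: the requesting student must be registered."""
    return request.get("student_id") in REGISTERED_STUDENTS


def library_service(request: Request) -> str:
    """Stand-in for the service routine that sits behind the rule check."""
    return f"granted {request['student_id']} access to {request['resource']}"


def handle_request(
    request: Request,
    rule: Callable[[Request], bool],
    service: Callable[[Request], str],
) -> str:
    """Forward the request to the service routine only if the rule is met."""
    if not rule(request):
        return "rejected: request does not meet the access rule"
    return service(request)
```

In this sketch, a request from a registered student is passed on to `library_service`, while any other request is rejected before it reaches the service routine.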
In all of these approaches to scale-out (e.g., horizontally partitioned data, rule-based databases, and distributed applications), a method called Data Dependent Routing (DDR) may be used to partition data and to access data across multiple sources. DDR may require sufficient intelligence in the client application to route the database request to the appropriate server. With DDR, each federated server may be independent, with no view of the other servers except for the shared database schema. The client application in a DDR scheme contains mappings that describe how the data is partitioned and which server may hold the requested data.
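A minimal sketch of client-side routing in the spirit of DDR follows. It assumes, purely for illustration, that data is partitioned by ranges of a numeric customer ID. The partition map, the server names, and the `route` function are hypothetical and are not taken from any specific system.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical partition map held by the client application. Each entry gives
# the lowest customer ID served by a server; entries are sorted by that bound.
PARTITION_MAP: List[Tuple[int, str]] = [
    (0, "server-a"),
    (1_000_000, "server-b"),
    (2_000_000, "server-c"),
]

# Lower bounds extracted once so that lookups can use binary search.
_LOWER_BOUNDS: List[int] = [low for low, _ in PARTITION_MAP]


def route(customer_id: int) -> str:
    """Return the name of the server whose ID range contains customer_id."""
    if customer_id < 0:
        raise ValueError("customer_id must be non-negative")
    # bisect_right gives the count of lower bounds <= customer_id, so the
    # owning partition is the entry just before that position.
    index = bisect_right(_LOWER_BOUNDS, customer_id) - 1
    return PARTITION_MAP[index][1]
```

Under these assumptions, the client resolves the target server from its own map and the servers need no knowledge of one another, which matches the independence of federated servers noted above.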