The invention relates to computer systems, and more particularly to a method and mechanism for more efficiently processing requests for data in a computer system.
Many computer systems utilize servers, such as “database servers”, to store and maintain information. In a client-server computer system model (or a multi-tiered computer architecture), users that wish to access or modify information at the server are often located at a “client”. To facilitate the explanation of the invention, the terms “database server” and “database client” may be used in this document in place of “server” and “client”; however, the invention is not limited in its applicability to database systems, and indeed, can be utilized in many other types of computer systems.
In client-server systems, commands are submitted to the database server to store, modify, or retrieve data. In response to the commands, data manipulation or query activities are performed at the database server, with data results returned to the database client for access. In networked environments, the database server often performs data manipulation or query commands submitted by remotely located clients. The client may establish a direct connection to the database server over the network, or may establish a connection through one or more intervening system components, such as an application server or transaction processing monitor. In either case, the database server processes the user commands and generates appropriate data outputs to be returned to the client. For example, a common database function is to perform data queries using a query language such as SQL. The database server receives each query and generates a query result that satisfies the criteria defined by that query. The query result is subsequently transferred to the database client from which the query originated.
Inefficiencies may occur during the processing and transmission of data between the database server and client. For example, assume the database server produces a result set composed of a quantity of data that can be sent to a database client. The user may initially place an explicit request to transmit a first portion of that result set from the database server to the client, causing a first set of overhead, such as “network roundtrip overhead”, to be expended. At a later time, the user may request a second portion of the result set to be transmitted to the client, causing another set of overhead to be expended. This process may proceed until all of the result set is sent, so that multiple sets of roundtrip overhead are expended between the database server and the client. The expense of sending the data in response to multiple requests also includes the wait time that is expended while the user waits for each request to be sent to the database server and for the corresponding data to be sent back to the client. In addition, if the transmitted data are broken into pieces smaller than the optimum data transfer size for the system, additional overhead is expended. Thus, the more pieces the data set is broken into before transmission from the server to the client, the greater the overhead that is likely to be expended.
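The relationship between piece size and roundtrip count described above can be illustrated with a minimal sketch. It is not taken from the invention itself; the function name `fetch_all` and the parameter `rows_per_request` are hypothetical, and the "network" is simulated by slicing an in-memory list.

```python
def fetch_all(result_set, rows_per_request):
    """Simulate a client pulling a result set in fixed-size pieces.

    Hypothetical helper: returns the rows received and the number of
    request/response roundtrips that were needed.
    """
    if rows_per_request <= 0:
        raise ValueError("rows_per_request must be positive")
    received = []
    roundtrips = 0
    position = 0
    while position < len(result_set):
        # Each loop iteration stands for one explicit client request
        # and one server response, i.e. one network roundtrip.
        roundtrips += 1
        chunk = result_set[position:position + rows_per_request]
        received.extend(chunk)
        position += len(chunk)
    return received, roundtrips


rows = list(range(1000))
_, small_pieces = fetch_all(rows, 10)    # many small pieces
_, large_pieces = fetch_all(rows, 250)   # few large pieces
```

Under these assumptions the same 1000 rows cost 100 roundtrips when sent 10 at a time and 4 roundtrips when sent 250 at a time, which is the effect the paragraph attributes to breaking the data set into more pieces.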
Another type of inefficiency that may occur is the retransmission of data in the returned result set. If the data to be sent to the client contains redundancies, then excess overhead, such as increased transmission time and data storage at the client, is expended by the system to transmit and store that redundant data. When the amount of redundant data is sufficiently large, the excess overhead can have a serious effect upon system performance. Such a circumstance may occur, for example, if the client is querying large database tables with sparse data or performing queries with joins involving wide tables.
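To make the cost of redundant data concrete, the following sketch shows one possible way to avoid resending column values that repeat from one row to the next. This section does not specify how redundancy is managed, so the scheme here (send only the columns that differ from the previous row) and the names `encode_rows` and `decode_rows` are assumptions chosen purely for illustration. All rows are assumed to have the same number of columns.

```python
def encode_rows(rows):
    """Hypothetical encoder: for each row, keep only the columns whose
    values differ from the previous row, as (indexes, values) pairs."""
    encoded = []
    previous = None
    for row in rows:
        if previous is None:
            changed = list(range(len(row)))  # first row is sent in full
        else:
            changed = [i for i, value in enumerate(row) if value != previous[i]]
        encoded.append((changed, [row[i] for i in changed]))
        previous = row
    return encoded


def decode_rows(encoded):
    """Hypothetical decoder: rebuild full rows at the receiving side."""
    rows = []
    current = []
    for changed, values in encoded:
        current = list(current)  # start from a copy of the previous row
        for i, value in zip(changed, values):
            if i < len(current):
                current[i] = value
            else:
                current.append(value)  # only happens for the first row
        rows.append(tuple(current))
    return rows


joined = [
    ("Sales", "NY", "Ann"),
    ("Sales", "NY", "Bob"),
    ("Sales", "SF", "Cy"),
]
wire = encode_rows(joined)
values_sent = sum(len(values) for _, values in wire)
```

In this example the joined rows hold 9 column values, but only 6 are placed on the wire, and `decode_rows(wire)` reproduces the original rows. Join results over wide tables tend to repeat many column values across adjacent rows, which is where such a scheme would be expected to help most.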
These same types of inefficiencies may exist for data transmissions between two servers. This may occur, for example, during a remote-mapped query. A remote-mapped query includes a query in which data that is accessed to respond to the query exists at a remote location. To process the query, a first server may need to query data that is located at a second server. In effect, the first server becomes a “client” to the second server. As a result, data transmissions will occur between the second server and the first server. The same issues stated above with respect to excessive network roundtrips and data redundancy may also occur for these types of data transmissions between two servers.
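The server-as-client arrangement can be sketched as follows. The classes `FirstServer` and `SecondServer`, their methods, and the `rows_per_request` parameter are hypothetical names introduced for illustration only; the second server's data is an in-memory list, and each call to `fetch` stands in for one roundtrip between the two servers.

```python
class SecondServer:
    """Hypothetical remote node that actually holds the data."""

    def __init__(self, rows):
        self._rows = rows
        self.requests_served = 0

    def fetch(self, start, count):
        # One call models one roundtrip from the first server.
        self.requests_served += 1
        return self._rows[start:start + count]


class FirstServer:
    """Hypothetical node that receives the query and acts as a
    client to the second server in order to answer it."""

    def __init__(self, remote, rows_per_request):
        if rows_per_request <= 0:
            raise ValueError("rows_per_request must be positive")
        self._remote = remote
        self._rows_per_request = rows_per_request

    def query(self, predicate):
        matches = []
        start = 0
        while True:
            chunk = self._remote.fetch(start, self._rows_per_request)
            if not chunk:
                break  # an empty response signals the end of the data
            matches.extend(row for row in chunk if predicate(row))
            start += len(chunk)
        return matches


remote = SecondServer(list(range(10)))
local = FirstServer(remote, rows_per_request=4)
even_rows = local.query(lambda row: row % 2 == 0)
```

With 10 remote rows pulled 4 at a time, the first server issues 4 requests to the second server (three that return rows and a final one that returns nothing), so the roundtrip concern described for client-server transfers applies in the same way between the two servers.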
Embodiments of the present invention are directed to a method and mechanism for reducing the expense of data transmissions between two computing nodes. According to an embodiment, data prefetching can be utilized to predictively retrieve information between a first server node and a second server node. Data redundancy management can be used to reduce the expense of transmitting and storing redundant data between the first server node and the second server node. In an embodiment, data prefetching and/or redundancy management are used to increase efficiency for processing distributed database queries, such as those involving remote-mapped queries. In yet another embodiment, data prefetching and/or redundancy management are used to increase efficiency for processing distributed join operations. Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims.