Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database, increases, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS's), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query. In addition, a reduced reliance on runtime interpretation of queries in favor of increased usage of directly-executable program code has improved query engine performance.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
It has been found that one area of inefficiency in many database management systems is due to the flow of data from a database server to a client, e.g., to retrieve tabular data (i.e., row and column data) requested by a client. As is well known in the art, a number of database management systems are distributed in nature, whereby a server, implemented using one or more computers, is coupled to a multitude of remote clients via a network. In many environments, for example, clients may be coupled to a server over a local area network, wide area network, or even a public network such as the Internet.
As with all distributed computing systems, client-server database management systems are often constrained by the bandwidth between the server and clients, as the transmission of data over a network is often several orders of magnitude slower than the transmission of data internally within a computer. As such, it is often desirable to minimize the latency associated with transmitting data over a network between a client and a server in a client-server database management system.
One manner of enhancing client-server database performance is referred to as “prefetch” support. With prefetch support, a memory buffer, referred to as a prefetch cache, is maintained on a client computer, and data is transmitted from a server to a client in the form of blocks that contain, in addition to data requested by a client, additional data that will likely be requested later by the client. In some environments, a first block of data may even be prefetched before any data is actually requested by the client, e.g., during initial execution of a database query, and prior to the actual retrieval and analysis of the data by the client.
As an example of the operation of a prefetch cache, consider where a client requests all of the rows in a particular table. By virtue of prefetching, once the client requests the first row, the database may return a block of rows to the prefetch cache, such that, once a second row is requested by the client, that row is already available in the prefetch cache.
Even in prefetch-enabled systems, the flow of tabular data from a server to a client is often inefficient, which can needlessly waste valuable bandwidth, and thus hinder database performance. Such inefficiencies typically take the form of transmitted data that is never actually used by the client. Often, the inefficiencies are as a result of careless coding by application developers, database access code generators that lack knowledge of what an application using them will really do, and database access optimizations that are applied too early in the execution process and never re-evaluated.
As one example, a common statement utilized in many database queries is a “SELECT *” statement. A SELECT statement is a common structured query language (SQL) statement utilized in SQL-compatible relational databases to request the retrieval of one or more columns of data from one or more rows in a table. As is well known in the art, relational databases organize data into tables consisting of rows and columns, with each row being analogous to a “record” in a non-relational database, and each column being analogous to a field in a record.
The “SELECT *” notation is often used by application coders to request all columns from all rows be returned from a particular table. In many instances, the “SELECT *” statement is used even though comparatively few of the columns in each row are actually used by later statements. However, because the statement requests all the columns, the software that interacts with the database is typically required to fetch all of the columns for each requested row from the database. As a result, a significant amount of data that is never used by the client may be transferred from the server to the client in response to such a query.
The software that handles the interaction between a database and the clients that utilize the database often takes many forms. For example, one structured programming interface that permits clients, and typically client applications residing on client computers to access a particular database, is a distributed database driver such as a Java Database Connectivity (JDBC) database driver. With a JDBC driver architecture, typically separate client and server portions of the driver are resident on a client and server computer, and are used to handle both the transmission of requests from the client to the server, as well as the transfer of results back to the client from the server.
Many database drivers support functionality such as prefetching, connection pooling, statement pooling and other enhancements to improve performance; however, conventional database drivers are typically required to obtain all data requested by a client application. Put another way, statements generated by clients are required to be processed by returning all of the data requested thereby. Thus, any enhancements in performance available via conventional database drivers can be offset by inefficiencies in application coding that requests unnecessary data.
As another example of an inefficiency in a database management system due to the transfer of unnecessary data, in many databases, unless a query specifically restricts the maximum number of rows to return in response to the query, typically every block will be filled with as many rows of data that can fit in the block, even if only a few rows will ultimately be used by the client application. In some situations, for example, data may be returned based upon a predetermined sort order, whereby the client application may only need information from the first few rows. However, by filling a block completely with a full compliment of rows, a significant amount of unnecessary data may nonetheless be transferred from the server to the client.
Therefore, a significant need exists in the art for a manner of optimizing the transfer of data between a server and a client in a client-server database management system, in particular to minimize the amount of unnecessary data transferred therebetween.