1. Field of the Invention
The present invention generally relates to processing data and more particularly to enhancing the performance of a client through the utilization of statistical information regarding the distribution of requested data to be processed by the client.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., a client or client application) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
Generally, the DBMS includes a query optimizer component configured to determine the manner in which queries will be processed. The primary task of the optimizer is to determine the most efficient way to execute each particular query against a database. To this end, the optimizer typically determines an access plan for use in executing the query against the database. In general, the access plan contains low-level information indicating precisely what steps the system is to take to execute the query. Commonly, the access plan calls for the use of one or more indexes carefully designed to speed execution of the query. Database indexes provide a relatively quick method of locating data of interest without a full sequential search through the table, which would entail accessing each row.
In general, indexes provide statistical information regarding distribution of the data within a particular database field, such as a particular column of a relational database. Examples of the type of statistical information provided by an index include the number of distinct values stored in a column and the number of occurrences of each distinct value within the column. The optimizer may use this statistical information to decide whether to use indexes and/or which indexes to use. For example, the index may indicate that a relatively small number of distinct values occur in a column of a table having a relatively large number of rows. In that case, the optimizer may access the index to determine which rows have a requested value. The use of indexes then typically results in a considerable reduction in the total number of I/O requests that must be processed in order to locate the requested data. On the other hand, if the number of distinct values is large relative to the total number of rows, a full sequential search for the requested data may be more efficient.
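Illustratively, the kind of statistical information described above, and one way an optimizer might act on it, can be sketched as follows. This is a minimal sketch only: the function names `column_statistics` and `prefer_index`, the sample column, and the `max_ratio` threshold of 0.1 are hypothetical and are not taken from any particular DBMS, whose optimizers weigh many additional factors.

```python
from collections import Counter


def column_statistics(values):
    """Return the number of distinct values and the occurrences of each."""
    occurrences = Counter(values)
    return len(occurrences), occurrences


def prefer_index(distinct_count, row_count, max_ratio=0.1):
    """Hypothetical heuristic: favor the index when distinct values are few
    relative to the number of rows, as in the example described above;
    otherwise fall back to a full sequential search."""
    if row_count == 0:
        return False
    return distinct_count / row_count <= max_ratio


# Assumed sample data: a 100-row column holding only three distinct values.
column = ["CA", "NY", "CA", "TX", "NY", "CA", "CA", "TX", "NY", "CA"] * 10

distinct_count, occurrences = column_statistics(column)
use_index = prefer_index(distinct_count, len(column))
```

Here the column holds few distinct values relative to its row count, so the sketch favors the index; a column whose values are nearly all distinct would fall back to a sequential search.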
Commonly, a client requesting the data must perform a number of formatting operations after receiving the requested data. For example, the client may be required to convert a field from one format (e.g., a string of characters) to another format (e.g., an integer) for use in a particular operation performed by the client. In some cases, the statistical information contained in indexes may be used to enhance the performance of such formatting operations. For example, if a column contains a relatively small number of distinct values (as in the case described above), the client may store locally (e.g., in a cache) each of the distinct values in the converted format. Rather than perform the formatting operations each time one of the values is retrieved from the column, the client may simply retrieve the formatted value from the cache, thus reducing the processing overhead of formatting (creating a new object to hold the formatted value, converting the data, etc.).
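By way of illustration, the client-side caching of formatted values described above might look like the following. This is a sketch under stated assumptions: the class name `FormattedValueCache`, its `get` method, and the sample rows are hypothetical, and the string-to-integer conversion merely stands in for whatever formatting operation the client performs.

```python
class FormattedValueCache:
    """Hypothetical client-side cache mapping raw column values to their
    converted form, so each distinct value is converted only once."""

    def __init__(self, convert):
        self._convert = convert   # formatting operation, e.g. int
        self._cache = {}          # raw value -> formatted value
        self.conversions = 0      # how many conversions were actually run

    def get(self, raw):
        # Convert on first sight of a value; reuse the stored result afterwards.
        if raw not in self._cache:
            self._cache[raw] = self._convert(raw)
            self.conversions += 1
        return self._cache[raw]


# Assumed sample data: six retrieved rows holding three distinct string values.
cache = FormattedValueCache(int)
rows = ["10", "20", "10", "10", "20", "30"]
converted = [cache.get(raw) for raw in rows]
```

With six rows but only three distinct values, the conversion runs three times rather than six. The saving grows with the ratio of rows to distinct values, which is the quantity the index statistics describe.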
However, conventional use of indexes has been limited to optimizing access of data from the database and does not extend to the requesting client. Accordingly, there is a need for an improved method of enhancing performance of a requesting client through the use of statistical information, such as that commonly contained in database indexes.