1. Field of the Invention
This invention relates to computerized databases and, more particularly, to a system and method of sorting data in a relational database.
2. Description of the Related Art
A database is used to store vast quantities of data for future retrieval upon request by a user. A user can be either an application program or an end user interacting with the database system through an input device. Related groups of data are commonly referred to as files of data, or tables, as commonly used in relational databases. The rows of data in a table are referred to as logical records, and the columns of data are referred to as fields. In a relational database system, the user perceives the data only as tables, and not in any other organizational form, e.g. a hierarchical structure of data.
These database systems typically include a computer program, commonly referred to as a database manager, for storing, editing, updating, inserting, deleting, and retrieving data in response to various commands entered through a user interface. A database manager handles all requests from users to the database to perform the various functions as listed above.
Specifically, with respect to retrieval of data, numerous computer languages were devised for formulating search commands or "queries" to which the database manager was responsive in providing the requested data. These queries were basically search instructions encoded so as to cause a computer and associated database manager to carry out the desired search.
Several problems have been associated with these languages for formulating database queries. First, many of the query languages differed from conventional programming languages. The database user with programming experience was thus required to learn an entirely new set of commands just to get meaningful data out of the database. Users without such experience, such as the many terminal operators who have no computer experience of any kind, were thus forced to learn a form of computer programming just to interact with the database. Moreover, such query languages required knowledge of highly complex rules of syntax and semantics, thereby further limiting the number of users who could successfully retrieve data to a highly and expensively trained few. This, in turn, adversely affected the utility of such computer systems and seriously inhibited their widespread use.
The structured query language (SQL) is both an interactive query language that end users employ to retrieve information through the database manager, and a database programming language that can be embedded in an application program to access the data in the database through the database manager. SQL offers a simple way to specify the type of information needed.
A representative example of such a query language is the Structured Query Language or "SQL" detailed in the Draft Proposal, ANS Database Language SQL, Standard X3.135-1986, American National Standards Institute, Inc., 1430 Broadway, New York, N.Y. 10018. A detailed discussion of SQL is also set forth in "IBM Database 2 SQL Reference," Document Number SC26-4346-3, IBM Corporation. Both documents are incorporated herein by reference.
A major advantage of a relational database system is that the user does not explicitly control how each operation is performed. Instead, the database system determines, independently of the user, how to carry out the retrieval, which can improve retrieval performance. All the user has to do is specify the type of information to be retrieved from the database. For example, if a user wants information from two different tables of data, a relational database system will determine how to retrieve the information from both tables in the most efficient way.
In database systems prior to relational database systems, the programmer controlled how the information was retrieved. In a relational database system, the system decides how to retrieve the data.
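The distinction can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a relational database manager; the tables "employee" and "department", their columns, and their contents are hypothetical and are not taken from any system discussed here. The query states only which data is wanted from the two tables, and the choice of join strategy is left to the system.

```python
import sqlite3


def build_example_database():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE employee (name TEXT, dept_id INTEGER)")
    conn.executemany(
        "INSERT INTO department VALUES (?, ?)",
        [(1, "Sales"), (2, "Research")],
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Smith", 2), ("Jones", 1), ("Adams", 2)],
    )
    return conn


def employees_with_department(conn):
    """Return (employee name, department name) pairs ordered by employee name.

    The SQL text names the desired result only; it does not say which table
    to scan first or how the rows are to be matched.
    """
    query = """
        SELECT e.name, d.name
        FROM employee AS e
        JOIN department AS d ON e.dept_id = d.id
        ORDER BY e.name
    """
    return conn.execute(query).fetchall()
```

Calling employees_with_department(build_example_database()) returns the joined rows in name order without the caller having specified an access path.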
In many database processing systems, the sort function is an important part of the system. In some instances, half of the computing power of a database processing system can be consumed by the sort function alone. In a relational database, a user issues a sort command if the user wants the output ordered in a certain way. There is also an implicit sort, which the database system performs when it decides that a sort is needed to retrieve information efficiently for a user. For example, a relational database system may sort two tables so that they can be joined together more efficiently.
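One well-known way in which sorting helps a join is the sort-merge join, sketched below in Python as an illustration only; it is not presented as the method of any particular database product. The function name and the representation of rows as (key, value) pairs are assumptions made for the example. Both inputs are sorted on the join key, after which matching rows are found in a single forward pass over each input.

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) rows on key by sorting both inputs first."""
    # The implicit sort: neither caller-supplied input needs to be ordered.
    left_sorted = sorted(left, key=lambda row: row[0])
    right_sorted = sorted(right, key=lambda row: row[0])

    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        left_key = left_sorted[i][0]
        right_key = right_sorted[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][0] == left_key:
                j_end += 1
            # Pair every left row carrying this key with that run.
            while i < len(left_sorted) and left_sorted[i][0] == left_key:
                for k in range(j, j_end):
                    result.append((left_key, left_sorted[i][1], right_sorted[k][1]))
                i += 1
            j = j_end
    return result
```

Once the inputs are ordered, the merge pass never has to revisit earlier rows, which is why a system may choose to pay for the sort up front.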
A sort function is an expensive function in a processing system because it requires a great deal of processing time to perform the compare instructions that the sort requires. In addition, if a large amount of data is to be sorted, the database system will write an intermediate portion of the data being sorted to a disk or hard file. This portion of data is later read back and further sorted.
If all the data that is to be sorted can be stored at one time in the memory of the CPU, no external sort using an I/O operation is needed. If all of the data cannot be stored at one time in the memory of the CPU, a portion of that data is sorted, the result is stored to disk, and a portion of the sorted data on the hard file is read back into memory to be further sorted with another portion of unsorted data.
Writing to and reading back from a file can occur multiple times during a sort operation. The time that it takes to perform an I/O operation can be very costly in a sort operation.
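The write-and-read-back pattern can be sketched as a simple external merge sort in Python. This is an illustration under stated assumptions rather than the procedure of any particular system: it uses the common two-phase variant (write sorted runs, then merge them) instead of interleaving sorted and unsorted portions, the function name and the memory_limit parameter are hypothetical, and the input is passed as an in-memory list of integers only to keep the example short.

```python
import heapq
import os
import tempfile


def external_sort(values, memory_limit):
    """Sort integers while sorting at most memory_limit of them at a time.

    memory_limit must be a positive integer.
    """
    run_paths = []
    try:
        # Phase 1: sort one memory-sized chunk at a time and write each
        # sorted run to its own temporary file (the costly write I/O).
        for start in range(0, len(values), memory_limit):
            chunk = sorted(values[start:start + memory_limit])
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run_file:
                run_file.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)

        # Phase 2: read the runs back (the costly read I/O) and merge them.
        run_files = [open(path) for path in run_paths]
        try:
            streams = [(int(line) for line in run_file) for run_file in run_files]
            return list(heapq.merge(*streams))
        finally:
            for run_file in run_files:
                run_file.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for path in run_paths:
            os.remove(path)
```

Every value is written once and read once in this sketch; with more data or less memory, a real system may need several merge passes, multiplying the I/O cost described above.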
Sort time performance is a critical competitive measure of relational database products for several reasons. First, the sort operation is one of the most frequently used operations in a relational database system. Second, improvements in query performance, such as for those queries that utilize a sort operation, are readily perceived by the end user as a reduction in the time that the user has to wait for the result of the query. Therefore, sort performance in a relational database is an important competitive benchmark in database products.
For more background on relational databases and the SQL language, the following reference is herein incorporated by reference: Date, C. J., An Introduction to Database Systems, The Systems Programming Series, Volume 1, Fourth Edition, Addison-Wesley Publishing Company, 1986.