1. Field of Invention
The present invention relates in general to the digital data processing field and, in particular, to database management systems. More particularly, the present invention relates to a mechanism for dynamically allocating space for a fixed length part of a variable length field (e.g., a VARCHAR field) in a database table.
2. Background Art
In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises at least one central processing unit (CPU) and supporting hardware, such as communications buses and memory, necessary to store, retrieve and transfer information. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which comprise a computer program and direct the operation of the other system components.
The overall speed of a computer system is typically improved by increasing parallelism, and specifically, by employing multiple CPUs (also referred to as processors). The modest cost of individual processors packaged on integrated circuit chips has made multiprocessor systems practical, although such multiple processors add more layers of complexity to a system.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, using software having enhanced function, along with faster hardware.
The overall value or worth of a computer system depends largely upon how well the computer system stores, manipulates, and analyzes data. One mechanism for managing data is called a database management system (DBMS), which is a computer program that is used to access the information stored in a database. Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications.
At a most basic level, a database stores data as a series of logical tables. Each table is made up of rows and columns. Each table has a unique name within the database and each column has a unique name within the particular table.
As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information stored in a database and the number of users wishing to access the database increases, the amount of computing resources required to manage such a database increases as well.
Database management systems therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, of databases for information. Different statements called queries allow the user or an application program to obtain data from the database. As one might imagine, queries range from being very simple to very complex.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given amount of time. Furthermore, the use of multiple processors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as SQL (Structured Query Language), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in a manner to minimize the amount of resources required to execute each query.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
A table is a logical structure maintained by the database management system. Tables are made up of columns and rows. There is no inherent order of the rows within a table. At the intersection of every column and row is a specific data item called a value. A column, or a field, is a set of values of the same data type. A row is a sequence of values such that the nth value is a value of the nth column of the table. An index is a set of pointers to a table that has an entry for each record of the table. This entry is dependent on the value of that record in one or more columns of the table.
Variable character data types (often referred to as a “VARCHAR” fields) allow, in many instances, up to 32K characters of a character string to be stored. One common activity which database users frequently perform on such character strings is to search the text in the string for a particular matching sub-string using the LIKE predicate. An example SQL statement might resemble:
SELECT LastName FROM Customers
WHERE Lastname LIKE ‘Mar %’
This query would return all the last names of all the records in the table that start with the letters “Mar”.
When a database receives a query, the database interprets the query and determines what internal steps are necessary to satisfy the query. These internal steps may include identification of the table or tables specified in the query, the row or rows selected in the query, and other information such as whether to use an existing index, whether to build a temporary index, whether to use a temporary file to execute a sort, and/or the order in which the tables are to be joined together to satisfy the query. When taken together, these internal steps are typically referred to as an access plan (AP), although they are sometimes referred to as an execution plan. The access plan is typically created by a software component that is often called a query optimizer.
Allocating the appropriate amount of space for VARCHAR fields in a database table is a tricky task. Typically, a static/variable space allocation approach is used for VARCHAR fields. In such a static/variable space allocation approach, the system allocates for each record in a VARCHAR field a fixed, or static, amount of space (i.e., a “fixed length part”), plus a pointer for the rest (i.e., a “variable length part”). Thus, each record in a VARCHAR field has two parts—the first being the fixed length part and the second being the remaining variable length part. In addition, each record in a VARCHAR field has a pointer that allows the DBMS to put the two parts together. So, in a table with a single VARCHAR column with five records, each record has the fixed length part, a pointer to a variable length part, and a variable length part (which could be a different length for each of the records). The amount of space allocated for the fixed length part is fixed by a system setting. The static nature of the fixed length part often makes the static/variable space allocation approach inefficient. The static amount of space allocated by the system for the fixed length part is typically made relatively small to avoid wasting database storage space. However, by making the static amount of space allocated by the system for the fixed length part relatively small, for a given VARCHAR field the system is more likely to be burdened with the process of following pointers, which slows the speed at which the data may be accessed.
Therefore, a need exists for an enhanced mechanism for dynamically allocating space for a fixed length part of a variable length field in a database table.