This invention relates to databases. In particular, this invention relates to an improved database storage structure.
A database is a collection of interrelated data, typically stored on a computer, that serves multiple applications. Data in a database is logically represented as a collection of one or more tables, each composed of a series of rows and columns. Each row in a table, called a record, represents a collection of related data. Each column in a table, called a field, represents a particular type of data. Thus, each row is composed of a series of data values, one data value for each field.
A database is organized from two perspectives. A logical perspective describes how the database is organized from a user's viewpoint. A physical perspective describes how data is actually recorded in computer storage. The prior art describes various techniques designed to alter the physical organization of a database while maintaining the same logical perspective, in order to reduce computer storage requirements. One technique, data compression, is well-known in the prior art. Peter Alsberg's paper, "Space and Time Savings Through Large Database Compression and Dynamic Restructuring," Proceedings of the IEEE, Vol. 63, No. 8, August 1975, pp. 1114-1122, describes the use of binary number codes to represent data consisting of character strings. Alsberg noted that when specific data values occur repeatedly, it is feasible to use a variable-length compression code for that field. Using shorter codes for the frequently occurring data values and longer codes for the infrequently occurring data values achieves greater compression. The paper by Dennis Severance, "A Practitioner's Guide to Database Compression," Information Systems, Vol. 8, No. 1, 1983, pp. 51-62, describes a similar binary encoding compression scheme. Severance outlines a method where the data values are ordered by probability of occurrence and then assigned a variable-bit-length binary code using Huffman coding, a well-known optimum code for this purpose.
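By way of illustration only, the variable-length coding idea attributed above to Alsberg and Severance can be sketched in a few lines of Python. This is a minimal sketch of textbook Huffman coding applied to the values of one field, not a reproduction of either paper's method. The function names (huffman_codes, encode, decode) and the sample field contents are assumptions introduced for this example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(values):
    """Map each distinct field value to a bit string (textbook Huffman coding)."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct value still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(n, next(tie), {v: ""}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, low = heapq.heappop(heap)
        n1, _, high = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in low.items()}
        merged.update({v: "1" + c for v, c in high.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def encode(values, codes):
    """Concatenate the code of every value in the field."""
    return "".join(codes[v] for v in values)


def decode(bits, codes):
    """Recover the values; this works because no code is a prefix of another."""
    inverse = {c: v for v, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical field contents, chosen so that some values repeat often.
city_field = ["Boston"] * 6 + ["Denver"] * 3 + ["Austin"] * 2 + ["Fargo"]
codes = huffman_codes(city_field)
bits = encode(city_field, codes)
```

In this sketch the most frequent value receives the shortest code, so the twelve sample values occupy 21 bits, compared with 24 bits for a fixed two-bit code over the same four distinct values.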
In addition to data compression, pattern recognition can be used to reduce the data storage requirements of a database without altering the logical organization of the database. Fred McFadden's book, "Database Management," 1983, Benjamin/Cummings Publishing Company, describes a technique called pattern substitution. This technique first identifies repeating sequences of characters that occur within a particular field, then replaces each such sequence with a single character that represents the pattern.
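By way of illustration only, pattern substitution can be sketched as follows. This is a simplified sketch, not McFadden's procedure: it assumes patterns are fixed-length substrings that occur more than once in the field, and it assumes that Unicode private-use code points never occur in the stored data, so they can serve as the single replacement characters. All function names and sample values are hypothetical.

```python
from collections import Counter

# Assumption: private-use code points never occur in the field's data.
PLACEHOLDER_BASE = 0xE000


def find_patterns(values, length=6, top=2):
    """Return up to `top` substrings of `length` characters that repeat."""
    counts = Counter(
        v[i:i + length] for v in values for i in range(len(v) - length + 1)
    )
    return [p for p, n in counts.most_common(top) if n > 1]


def build_table(patterns):
    """Assign one placeholder character to each pattern."""
    return {p: chr(PLACEHOLDER_BASE + i) for i, p in enumerate(patterns)}


def substitute(value, table):
    """Replace every occurrence of each pattern with its placeholder."""
    for pattern, placeholder in table.items():
        value = value.replace(pattern, placeholder)
    return value


def restore(value, table):
    """Expand each placeholder back into the pattern it represents."""
    for pattern, placeholder in table.items():
        value = value.replace(placeholder, pattern)
    return value


# Hypothetical contents of a single "street" field.
street_field = [
    "12 Maple Street",
    "40 Oak Street",
    "7 Pine Avenue",
    "9 Elm Avenue",
    "3 Birch Street",
]
table = build_table(find_patterns(street_field))
stored = [substitute(v, table) for v in street_field]
restored = [restore(v, table) for v in stored]
```

The substitution table must be stored alongside the field so that the original values can be restored. The logical contents of the field are unchanged.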
In addition to techniques that reduce data storage size, other techniques that alter the physical organization of a database to allow faster data access have been described. For example, data records may be grouped so that the data items accessed most frequently are stored on the fastest storage devices. This can be achieved by splitting the stored records into separate segments and allocating the segments to separate physical storage devices, some of which permit faster data access than others. As another example, records that are frequently accessed together can be physically grouped together, such as on the same disk sector or disk track. In this manner, fewer disk accesses are needed to transfer data to or from the main computer memory for a particular application. This technique is described in McFadden's book, "Database Management," cited above.
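By way of illustration only, the splitting of stored records into segments can be sketched as follows. This is a minimal sketch under stated assumptions: two in-memory dictionaries stand in for a faster and a slower storage device, and the set of frequently accessed fields (HOT_FIELDS) is assumed to be known in advance. The field names and function names are hypothetical.

```python
# Assumption: these fields are the ones accessed most frequently.
HOT_FIELDS = {"account_id", "balance"}

fast_store = {}  # stands in for the faster storage device
slow_store = {}  # stands in for the slower storage device


def store_record(key, record):
    """Split one record into a hot segment and a cold segment."""
    fast_store[key] = {f: v for f, v in record.items() if f in HOT_FIELDS}
    slow_store[key] = {f: v for f, v in record.items() if f not in HOT_FIELDS}


def read_fields(key, fields):
    """Return the requested fields and whether the slow store was touched."""
    hot = fast_store[key]
    result = {f: hot[f] for f in fields if f in hot}
    missing = [f for f in fields if f not in hot]
    if missing:
        cold = slow_store[key]
        result.update({f: cold[f] for f in missing})
    return result, bool(missing)


store_record(1, {
    "account_id": 1,
    "balance": 250,
    "address": "12 Maple Street",
    "notes": "opened 1982",
})
balance_only, touched_slow_a = read_fields(1, ["balance"])
with_notes, touched_slow_b = read_fields(1, ["balance", "notes"])
```

In this sketch a request for only the frequently accessed fields never touches the slower store, while the logical record seen by the application is the same in both cases.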
The prior art also describes techniques for altering the logical organization of a database in order to create a more efficient physical organization. As an example, W. Kent, in his paper, "Choices in Practical Data Design," Proceedings: Very Large Data Bases, Sep. 8-10, 1982, pp. 165-180, describes field design options. Kent describes alternative representations of the same data relationships. Each specific data type can be represented by a separate field or grouped with other data types into a combined field. In Kent's article, data types are either combined into a single field or separated into distinct fields based on the known relationships between the data types. For example, "date" can be a single field, or date information can be represented by three separate fields: "month," "day" and "year."
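By way of illustration only, the two field designs mentioned for date information can be sketched as follows. The class names, the function names, and the "MM/DD/YYYY" text layout of the combined field are assumptions made for this example and are not taken from Kent's paper.

```python
from dataclasses import dataclass


@dataclass
class CombinedDate:
    """Design 1: a single "date" field (assumed layout: MM/DD/YYYY)."""
    date: str


@dataclass
class SplitDate:
    """Design 2: three separate fields for month, day, and year."""
    month: int
    day: int
    year: int


def split_date(combined):
    """Convert the single-field design into the three-field design."""
    month, day, year = combined.date.split("/")
    return SplitDate(int(month), int(day), int(year))


def combine_date(split):
    """Convert the three-field design back into the single-field design."""
    return CombinedDate(f"{split.month:02d}/{split.day:02d}/{split.year:04d}")


single = CombinedDate("09/08/1982")
three = split_date(single)
round_trip = combine_date(three)
```

Both designs carry the same information. The three-field design makes a query on one component, such as the year, a direct field comparison, while the single-field design keeps the record layout simpler.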