The collection, maintenance, and distribution of electronic data is typically performed by a database management system (DBMS). Storage of data, be it on paper, on punch cards, or in electronic form, is typically premised on the goal of providing fast and accurate retrieval of that data, i.e., distribution. The ability of the DBMS to collect and maintain electronic data heavily influences its ability to provide fast and accurate retrieval. In short, if errors are made in the design of the database, the ability of the database to provide fast and accurate retrieval of electronic data is hampered. To illustrate, an index card cabinet designed to store 3″×5″ index cards will be ineffective in storing data maintained on 4″×6″ index cards. Quick and accurate retrieval of data is thus prevented by the improper collection (the card size) and maintenance (the card cabinet) of the system.
The traditional DBMS requires a measure of rigidity in its initial design. The DBMS structure imposes this rigidity on incoming data. The rigid structure provides a reference set of information containing data element names, sizes, characteristics, etc., hereinafter referred to as metadata.
The Column/Row model is a classic example of the metadata rigidity seen with the traditional DBMS. In this model, each row represents a single “record” of information, while each column represents a specific piece (i.e. “field”) of data for that record. To illustrate, a database of homeowners could have columns for NAME, ADDRESS, CITY, STATE, and ZIPCODE fields for each record. The result is a Column/Row model that allows for basic isolation of specific fields through the unique Column/Row coordinate point, where Column 1, Row 1 contains the NAME field of the first record's homeowner data.
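As a minimal sketch of the Column/Row model described above, the following Python example stores a small homeowner table as a list of rows and addresses each field by a 1-based (Column, Row) coordinate point. The table contents and the helper names `COLUMNS`, `rows`, `cell`, and `field` are hypothetical, chosen only for illustration; `COLUMNS` plays the role of the metadata that names each column.

```python
# Hypothetical homeowner table illustrating the Column/Row model.
# Column metadata: one entry per field, in column order.
COLUMNS = ["NAME", "ADDRESS", "CITY", "STATE", "ZIPCODE"]

# Each inner list is one record (row); positions match COLUMNS.
rows = [
    ["Jane Q. Smith", "12 Oak St", "Springfield", "IL", "62701"],
    ["Smith, John", "40 Elm Ave", "Springfield", "IL", "62702"],
]


def cell(column: int, row: int) -> str:
    """Return the field at a 1-based (Column, Row) coordinate point."""
    return rows[row - 1][column - 1]


def field(name: str, row: int) -> str:
    """Resolve a column name through the metadata, then read the cell."""
    return cell(COLUMNS.index(name) + 1, row)


print(cell(1, 1))         # NAME field of the first record: Jane Q. Smith
print(field("CITY", 2))   # Springfield
```

Every field is reachable only because the column layout is fixed in advance; a value with no predefined column has no coordinate at all.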
While the rigidity of a predefined DBMS structure provides unique coordinate points for all fields in all records, there is a price to be paid, namely, flexibility. Using the Homeowner Database example above, the single NAME field may be insufficient for some uses, such as identifying a homeowner's last name, first name, or middle initial. Because there is only one NAME field, no separate column exists for these subvalues of a homeowner's name, so they must all be stored together in the single NAME field. While the last name could be extracted from the NAME field, doing so requires additional effort and can produce inconsistent results, especially if the data in the NAME field is not entered in the same manner for each record (e.g., last name first in some records and first name first in others). The additional effort required to identify the data, together with the potential for inconsistent results, makes the traditional DBMS unable to supply fast and accurate retrieval of electronic data under certain circumstances.
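The extraction problem above can be illustrated with a short Python sketch. The sample NAME values and both functions are hypothetical. `naive_last_name` assumes a "First [Middle] Last" order, and `patched_last_name` adds a special case for "Last, First" entries; even with that extra effort, a multi-word surname such as "Van Dyke" is still split incorrectly.

```python
# Hypothetical NAME values entered inconsistently across records.
names = ["Jane Q. Smith", "Smith, John", "Anna Van Dyke"]


def naive_last_name(name: str) -> str:
    """Assume 'First [Middle] Last' order and take the final word."""
    return name.split()[-1]


def patched_last_name(name: str) -> str:
    """Extra effort: also treat 'Last, First' entries as a special case."""
    if "," in name:
        return name.split(",", 1)[0].strip()
    return name.split()[-1]


print([naive_last_name(n) for n in names])    # ['Smith', 'John', 'Dyke']
print([patched_last_name(n) for n in names])  # ['Smith', 'Smith', 'Dyke']
```

Each new entry convention requires another rule, and no rule applied to the single NAME field can be guaranteed to recover the last name for every record.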
Referring to the homeowner example, a DBMS designed without a column for phone number data will lose efficiency once that data must be recorded. Specifically, without a phone number column in the database, phone number data cannot be stored unless it is placed in one of the existing fields, which breaks the logic by which that column identifies its data.
Traditional database management systems can increase search speed by placing data into specifically ordered lists of fields or field combinations within the database, commonly known as indexes. As new records are added, these indexes must be updated to insert the new data into the ordered sequence.
Indexes may be predefined during the initial design of the DBMS's rigid structure or created on demand. Unfortunately, traditional indexes require large amounts of storage space for creation and maintenance because the data elements subject to the index are duplicated. Like the data structure, indexes are designed to conform to anticipated user requests for data retrieval. Accordingly, traditional DBMS indexes are not created for unanticipated requests. Referring to the homeowner example, a database designer would not predefine an index for the ADDRESS field because this field is likely to contain duplicate street names and numbers. More likely, the designer will predefine an index for the NAME field because this field is more likely to contain non-duplicate data. As a result, a user searching on the ADDRESS field will be forced to endure a row-by-row search without the assistance of an index.
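The index behavior described in the two preceding paragraphs can be sketched in Python with the standard `bisect` module. This is a simplified model, not how any particular DBMS implements indexes: the NAME index is a sorted list of hypothetical `(NAME, row_id)` pairs that copies every NAME value (the storage cost), must be updated in order on every insert, and supports a binary search. ADDRESS has no index, so a search on it scans every row. The names `rows`, `name_index`, `insert`, `find_by_name`, and `find_by_address` are illustrative assumptions.

```python
from __future__ import annotations

import bisect

rows: list[dict[str, str]] = []         # the table itself
name_index: list[tuple[str, int]] = []  # ordered (NAME, row_id) pairs


def insert(record: dict[str, str]) -> int:
    """Append a record and keep the NAME index in sorted order."""
    row_id = len(rows)
    rows.append(record)
    # The NAME value is copied into the index (extra storage), and the new
    # entry must be placed at its ordered position on every insert.
    bisect.insort(name_index, (record["NAME"], row_id))
    return row_id


def find_by_name(name: str) -> list[int]:
    """Indexed lookup: binary-search to the first match, then walk forward."""
    i = bisect.bisect_left(name_index, (name,))
    hits = []
    while i < len(name_index) and name_index[i][0] == name:
        hits.append(name_index[i][1])
        i += 1
    return hits


def find_by_address(address: str) -> list[int]:
    """No ADDRESS index exists, so every row must be examined."""
    return [rid for rid, rec in enumerate(rows) if rec["ADDRESS"] == address]


insert({"NAME": "Smith, John", "ADDRESS": "40 Elm Ave"})
insert({"NAME": "Jane Q. Smith", "ADDRESS": "12 Oak St"})
insert({"NAME": "Adams, Lee", "ADDRESS": "40 Elm Ave"})

print(find_by_name("Jane Q. Smith"))   # [1]
print(find_by_address("40 Elm Ave"))   # [0, 2]
```

Searching `(name,)` with `bisect_left` works because a one-element tuple compares less than any longer tuple that shares its first element, so the search lands on the first matching entry.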
The rigid structure of the traditional DBMS imposes restrictions on the size and type of data that can be stored in each field. Referring to the homeowner example, if the NAME field is defined as 25 alphabetic characters in length, any name longer than 25 characters will be truncated, and a hyphenated name may not be stored at all, since the hyphen is not within the alphabetic character set. Some DBMSs allow a field's size to vary if the field is defined for that purpose, but only for character data. Numeric, date, or time values cannot be stored in these variably sized fields and still retain their numeric, date, or time characteristics.
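The following Python sketch models the field restrictions described above. It makes several assumptions. The 25-character width comes from the example in the text. The allowed set for an "alphabetic" field is taken to be ASCII letters plus spaces. `store_name` is a hypothetical rule that rejects any value containing a character outside that set and truncates the rest. The second part shows that dates kept in a character field sort as text rather than chronologically, which is one way they lose their date characteristics.

```python
from __future__ import annotations

import string
from datetime import date

NAME_WIDTH = 25                                  # hypothetical field size
ALPHABETIC = set(string.ascii_letters + " ")     # assumed allowed characters


def store_name(value: str) -> str | None:
    """Apply fixed-field rules to a NAME value before storage."""
    if not set(value) <= ALPHABETIC:
        return None              # out-of-set character (e.g. a hyphen): rejected
    return value[:NAME_WIDTH]    # characters beyond the width are lost


print(store_name("Bartholomew Montgomery Fitzwilliam"))  # Bartholomew Montgomery Fi
print(store_name("Mary Smith-Jones"))                    # None

# Dates kept in a variable-length character field sort as text, not as dates.
as_text = ["9/15/2023", "10/02/2023"]
as_dates = [date(2023, 9, 15), date(2023, 10, 2)]
print(sorted(as_text))    # ['10/02/2023', '9/15/2023']  (October first)
print(sorted(as_dates))   # September first
```

Real systems differ in whether they truncate silently or reject an over-length value. The sketch models truncation only because that is the behavior the text describes.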
There remains a need for an improved database management system capable of importing data of any type, analyzing the data, supplying metadata information, searching the database contents, and retrieving data fields, records, or entire original data files.