1. Field of the Invention
This invention relates in general to a method of extracting the hierarchical data structure contained in computer memory for computers operating under the ANSI-92 SQL2 outer join protocol.
2. Background of the Invention
Computers have become essential in modern society for the storage, processing and retrieval of data. Accordingly, much current research in computer technology is directed to methods for storing and processing data that are inter-related. These groupings of stored data are commonly known as relational databases and are accessed by the computer's central processing unit operating under a set of instructions such as the ANSI-92 SQL2 protocol and grammar set. The power of such relational databases lies in their flexibility to store data as separate normalized tables, which are free to be related in any way necessary for each application accessing the data held in the computer's storage. The relational join operation is the mechanism used in SQL relational databases to relate and combine multiple tables into a single result table or result set for further processing by the central processing unit, either by additional relational operations or by the application.
The method of associating stored data and determining the interrelationships thereamong is therefore one of the most important relational operations capable of being performed by a processor on stored data.
However, current methods of building or generating the data structure, such as that known as an "inner join" in SQL protocols, have resulted in problems such as lost data, data redundancy, lack of data modeling capability and loss of data structure for such stored data. These problems cause inefficiency and inaccuracy in the use of the computer system's available storage space for holding and processing data stored in the computer's memory.
The first problem, that of lost data, is caused by the way the standard SQL inner join protocol processes unmatched rows of stored data in the computer memory. For a row of a participating table in a join operation to be included in the result, it must be matched with at least one row from each of the other participating tables of data stored in the computer. This means that one occurrence of a missing row from any one table can result in lost data from all the other tables participating in the inner join operation.
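The lost-data behavior described above can be reproduced with a small sketch. It uses Python's standard `sqlite3` module as a stand-in for any SQL engine; the `dept`, `emp` and `project` tables and their contents are hypothetical and are not taken from the invention. One employee has no project row, so the inner join drops both that employee and the department, while a left outer join keeps them.

```python
import sqlite3

# Hypothetical three-table example; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept    (dept_id INTEGER, dept_name TEXT);
CREATE TABLE emp     (emp_id  INTEGER, dept_id INTEGER, emp_name  TEXT);
CREATE TABLE project (proj_id INTEGER, emp_id  INTEGER, proj_name TEXT);
INSERT INTO dept    VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO emp     VALUES (10, 1, 'Ann'), (20, 2, 'Bob');
INSERT INTO project VALUES (100, 10, 'Alpha');  -- Bob has no project row
""")

# Inner join: a row survives only if every participating table matches.
inner_rows = conn.execute("""
SELECT d.dept_name, e.emp_name, p.proj_name
FROM dept d
JOIN emp e     ON e.dept_id = d.dept_id
JOIN project p ON p.emp_id  = e.emp_id
ORDER BY d.dept_id
""").fetchall()

# Left outer join: unmatched rows are preserved and padded with NULL.
outer_rows = conn.execute("""
SELECT d.dept_name, e.emp_name, p.proj_name
FROM dept d
LEFT OUTER JOIN emp e     ON e.dept_id = d.dept_id
LEFT OUTER JOIN project p ON p.emp_id  = e.emp_id
ORDER BY d.dept_id
""").fetchall()

print(inner_rows)  # Research and Bob are missing
print(outer_rows)  # Research and Bob appear with a NULL project
```

In this sketch the single missing `project` row removes data from two other tables in the inner join result, which is the effect the paragraph describes.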
Another serious problem is data redundancy, in which data stored in the computer memory is duplicated in several locations, reducing the overall accuracy with which the computer system stores data. There are two main causes of redundant data. The first occurs when the resulting table or working set, produced from joining multiple tables, is forced back into a flat, two-dimensional table result. When a row from one table is matched with multiple rows in another participating table, the single matching row must be replicated to match the multiple rows from the participating table. This replication forces the resulting table or working set back into a flat table structure.
Flat table structures are necessary to comply with the first normal form requirement of conventional relational databases. First normal form requires that one, and only one, occurrence of each data field be present in every row of a table or result.
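The first cause of redundancy, replication forced by the flat result, can be seen in a minimal sketch. It again assumes Python's `sqlite3` module and hypothetical `dept` and `emp` tables: one department row matches three employee rows, so the department name is repeated in every row of the result.

```python
import sqlite3

# Hypothetical one-to-many example; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
CREATE TABLE emp  (emp_id  INTEGER, dept_id INTEGER, emp_name TEXT);
INSERT INTO dept VALUES (1, 'Sales');
INSERT INTO emp  VALUES (10, 1, 'Ann'), (20, 1, 'Bob'), (30, 1, 'Cy');
""")

# The single 'Sales' row is replicated once per matching employee row
# so that every result row carries exactly one value per field.
flat_rows = conn.execute("""
SELECT d.dept_name, e.emp_name
FROM dept d
JOIN emp e ON e.dept_id = d.dept_id
ORDER BY e.emp_id
""").fetchall()

# Count how many copies of the one stored department name were produced.
dept_name_copies = sum(1 for dept_name, _ in flat_rows if dept_name == "Sales")

print(flat_rows)
print(dept_name_copies)
```

The department name is stored once but appears three times in the flat result, once for each employee.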
The second cause of redundant data is a Cartesian product effect (data explosion). In this case, two or more rows with the same join field values from one table are joined with two or more rows of another table. Since the join field values are the same, each row from one table is joined with each matching row from the other table, so that all combinations are joined and placed in the result.
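The Cartesian product effect can be illustrated with another small sketch under the same assumptions (Python's `sqlite3` module, hypothetical `emp` and `equip` tables). Two rows in one table and three rows in the other share the same join field value, so the join produces all 2 x 3 = 6 combinations.

```python
import sqlite3

# Hypothetical many-to-many example; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp   (dept_id INTEGER, emp_name   TEXT);
CREATE TABLE equip (dept_id INTEGER, equip_name TEXT);
INSERT INTO emp   VALUES (1, 'Ann'), (1, 'Bob');
INSERT INTO equip VALUES (1, 'Laptop'), (1, 'Phone'), (1, 'Printer');
""")

# Every emp row with dept_id = 1 pairs with every equip row with dept_id = 1.
combo_rows = conn.execute("""
SELECT e.emp_name, q.equip_name
FROM emp e
JOIN equip q ON q.dept_id = e.dept_id
ORDER BY e.emp_name, q.equip_name
""").fetchall()

print(len(combo_rows))  # 6 rows from 2 + 3 stored rows
print(combo_rows)
```

Five stored rows yield six result rows here, and the growth is multiplicative as more rows share the same join field value.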
The last two join-related problems, i.e., lack of data modeling and loss of data structure, occur when the same tables joined using the same interrelationships can be modeled or viewed in more than one way. These two problems can have serious consequences: the data stored in the memory unit may no longer be accurately processed and retrieved under the data structure in which it was stored for processing by the central processing unit.
Because of the non-procedural and powerful nature of SQL, hierarchical data structures are the most useful, since they have the greatest flexibility while having unambiguous semantics. Their semantics are unambiguous because there is only one path to each table.