During the 1970s, IBM Corporation developed a database language called SEQUEL (Structured English QUEry Language) for retrieving and manipulating data in a relational database. A subset of this language, called SQL (for Structured Query Language), has become an accepted standard for querying, changing, and manipulating data in such databases. Its primary use is in formulating interactive queries and in handling data based on simple programmed instructions. However, SQL is restricted to use with databases that are in First Normal Form (1NF), in which only atomic (i.e., non-decomposable) valued domains are allowed, as discussed by E. Codd in "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM 13 (6), pp. 377-387 (1970) and by J. Ullman in "Principles of Database Systems," second edition, Computer Science Press (1982). Under conventional techniques, SQL has not been implemented to access data files stored in a file structure that is in Non-First Normal Form (NF²).
Perhaps the most common NF² data file structure in the business world is that created using COBOL, an acronym for COmmon Business-Oriented Language. COBOL and the data files it produces have become particularly prevalent in business because the programming language has a verbose, English-like syntax and because contractors working with the U.S. Department of Defense were required to standardize on COBOL for data management activities relating to government work. A COBOL program, which is written in a procedural programming language and compiled before execution, is divided into four divisions: (1) Identification, (2) Environment, (3) Data, and (4) Procedure. The Data Division is based on a hierarchical data structure that is in NF². An understanding of certain aspects of the COBOL data structure is important in order to appreciate why conventional SQL is unable to extract data from a COBOL data file.
A COBOL data file comprises a set of items, each of which has its own description. There are three types of items in COBOL: elementary, group, and array items. An elementary item is a subdivision of a record that cannot be further subdivided; a group item comprises a named sequence of one or more elementary items or group items; and an array item comprises a table that defines homogeneous sets of repeated data items. The organization of COBOL data files is based on the "level" of the elementary items and group items comprising each data file. As will be even more evident from an example presented below, the data structure employed by COBOL is very different from the data structure of conventional relational databases, which are restricted by the limitations of 1NF. For this reason, conventional SQL cannot directly retrieve data from COBOL data files. A user who needs to embed SQL statements in a COBOL program must therefore first normalize the data to 1NF, a format that is substantially different from that of standard COBOL data files.
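The three item types, and the normalization step that conventional SQL requires, can be illustrated with a minimal Python sketch. The record layout, the field names, and the helper functions `flatten_groups` and `to_1nf` are hypothetical and introduced only for illustration; the sketch also assumes a record with a single array item of elementary values. It is not part of any COBOL or SQL implementation.

```python
# Hypothetical nested record, mirroring an illustrative COBOL layout such as:
#   01 CUSTOMER.
#      05 CUST-ID        PIC 9(4).
#      05 CUST-NAME.
#         10 FIRST-NAME  PIC X(10).
#         10 LAST-NAME   PIC X(10).
#      05 ORDER-AMT      PIC 9(5) OCCURS 3 TIMES.
customer = {
    "CUST-ID": 1001,                 # elementary item (atomic value)
    "CUST-NAME": {                   # group item (named sequence of items)
        "FIRST-NAME": "ANN",
        "LAST-NAME": "LEE",
    },
    "ORDER-AMT": [250, 75, 310],     # array item (repeated homogeneous values)
}


def flatten_groups(record, out=None):
    """Collect items into one flat dict, descending through nested group items."""
    out = {} if out is None else out
    for name, value in record.items():
        if isinstance(value, dict):
            flatten_groups(value, out)   # group item: recurse into its members
        else:
            out[name] = value            # elementary or array item: keep as-is
    return out


def to_1nf(record, array_name):
    """Return one flat row per element of the named array item.

    Assumes the record holds exactly one array item of elementary values.
    """
    flat = flatten_groups(record)
    elements = flat.pop(array_name)
    # Repeat the non-array columns once for each array element.
    return [{**flat, array_name: element} for element in elements]


rows = to_1nf(customer, "ORDER-AMT")
for row in rows:
    print(row)
```

In this sketch the single hierarchical record becomes three flat rows in which the customer identifier and name are repeated, which illustrates how far the 1NF form departs from the layout of the original data file.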
It is therefore desirable to extend conventional SQL so that it can be used to retrieve and manipulate data in COBOL data files. More importantly, such an extended SQL should allow an operator to describe and access COBOL data files directly, combining the non-procedural language of SQL with the procedural programming language of COBOL in a natural and seamless fashion. These and other benefits and advantages are provided by the present invention, COBOL Compatible Structured Query Language (CCSQL), which represents a novel development and application of the extended relational algebra and calculus theory presented by M. Roth, H. Korth, and A. Silberschatz in "Extended Algebra and Calculus for NF² Relational Databases," ACM Transactions on Database Systems 13 (4), pp. 389-417 (1988).