Punched card machines for inputting data to a computer are seldom used today, but the methods they employed to interrelate data files remain widely used in current computer data processing technology. Present-day programming methods for interrelating data files are still subject to certain limitations inherited from punched card processing.
Punched card machines were necessarily limited to sequential processing of input files by the mechanical constraints of physically reading decks of punched cards, one card at a time. A sorting machine and a collating machine were employed to interrelate two card files. Consider an example of an application having a punched card file of insurance policy records and a second punched card file of records of claims against insurance policies. In this example, the policy file is referred to as a “master file” and the claims file is referred to as a “detail file”. The records in the detail file are related to the records in the master file by a common key. The common key in this example is a policy number. Every detail record typically has exactly one master record to which the detail record belongs. In this example, each claim belongs to only one policy. Each master record has a unique common key value, for example, a unique policy number. Any exception to these requirements is considered an error condition.
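The master/detail relationship and its error conditions can be sketched in software. In this sketch the record layouts, field names, and values are all hypothetical, chosen only to mirror the policy/claim example above.

```python
# Master file: one record per policy, each with a unique policy number
# (the common key). Field names are illustrative.
policies = [
    {"policy_no": 1001, "insured": "Ada Corp."},
    {"policy_no": 1002, "insured": "Byte Ltd."},
]

# Detail file: each claim carries the policy number of the one policy
# to which it belongs.
claims = [
    {"claim_no": 7, "policy_no": 1002, "amount": 250.00},
    {"claim_no": 8, "policy_no": 1001, "amount": 980.00},
    {"claim_no": 9, "policy_no": 1002, "amount": 120.00},
]

# The two requirements named in the text; violating either is an
# error condition.
master_keys = [p["policy_no"] for p in policies]
assert len(master_keys) == len(set(master_keys))           # unique master keys
assert all(c["policy_no"] in master_keys for c in claims)  # every claim has a master
```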
In the above example, a card sorter was used to sort the master file on the policy number field contained in each of the records of the master file and to sort the detail claims file on the policy number field contained in each of its records. A collator was then used to collate the master file and the detail file. The collator had two input hoppers. One hopper fed the sorted master file and the other hopper fed the sorted detail file. The collator operated in accordance with wiring of a plug board control panel. The plug board was prewired by the installation and could be inserted into and removed from the collator. The collation of the two decks was controlled by the wiring of the plug board. In a typical master/detail application, the two files were collated so that in the resultant file, each master record was immediately followed by the detail records that contained the common key value of the master record. In this example, each policy card record would therefore be immediately followed by card records, if any, of the claims made against that policy. The collated file of the two record types would then be in order by policy number.
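The sorter-plus-collator pipeline above can be expressed as a short software sketch. The record layouts are hypothetical, and the `collate()` merge assumes both inputs are already sorted on the common key with no error conditions present.

```python
policies = [{"policy_no": 1002, "insured": "Byte Ltd."},
            {"policy_no": 1001, "insured": "Ada Corp."}]
claims = [{"claim_no": 9, "policy_no": 1002},
          {"claim_no": 8, "policy_no": 1001},
          {"claim_no": 7, "policy_no": 1002}]

# The card sorter's job: order each deck on the policy number field.
policies.sort(key=lambda r: r["policy_no"])
claims.sort(key=lambda r: r["policy_no"])

def collate(masters, details, key):
    """Single sequential pass, as the collator made: emit each master
    record immediately followed by its matching detail records."""
    out, d = [], 0
    for m in masters:
        out.append(m)
        while d < len(details) and details[d][key] == m[key]:
            out.append(details[d])
            d += 1
    return out

deck = collate(policies, claims, "policy_no")
# The resulting deck is in order by policy number: policy 1001, claim 8,
# policy 1002, claim 9, claim 7.
```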
If the claims records had been sorted on their claim numbers before they were sorted on their policy numbers, the claims would be further ordered by the claim number within the policy number. This ordering could be useful, for example, in quickly finding a claim under a policy with many claims against the policy. Once the two types of records were collated in this example, however, the collated file could not be further ordered by a key in the master file even if this order may be useful. In the example, the policy and claims records could not be further ordered by an insured name because the claims records did not contain the name of the insured person.
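The two-pass ordering described above relied on the card sorter being stable: sorting the minor field (claim number) first and the major field (policy number) second leaves claims ordered by claim number within each policy number. The same technique works with any stable sort, sketched here with hypothetical records.

```python
claims = [{"policy_no": 1002, "claim_no": 9},
          {"policy_no": 1001, "claim_no": 8},
          {"policy_no": 1002, "claim_no": 7}]

claims.sort(key=lambda r: r["claim_no"])   # first pass: minor key
claims.sort(key=lambda r: r["policy_no"])  # second pass: major key (stable,
                                           # so claim-number order survives
                                           # within equal policy numbers)
```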
The collated file would then be input to an accounting machine, also called a tabulating machine, which in a single run read the ordered cards to prepare and print a report on the printer of the machine. The operation of the accounting machine, like that of the collator, was controlled by a removable plug board wired for a specific application. As the input cards were read, the accounting machine could format and print data from the cards. The accounting machine had electromechanical counters to which card fields could be added and subtracted. This capability was used chiefly for accumulating totals and subtotals. In this example, the accounting machine could print a policy number and the associated data at the top of a page, then list the claims against the policy number with a settlement payment or a claim reserve amount, print the total claims amounts after the list, and at the end of the run, print the total liability of all the claims for all the policies.
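The accounting machine's counter behaviour amounts to what is now called control-break reporting: one pass over the collated deck, accumulating a subtotal per policy and a grand total. A minimal sketch, with hypothetical record layouts and amounts:

```python
# Collated deck: each master (policy) record immediately followed by its
# detail (claim) records; detail records are the ones carrying a claim_no.
deck = [
    {"policy_no": 1001},
    {"policy_no": 1001, "claim_no": 8, "amount": 980.0},
    {"policy_no": 1002},
    {"policy_no": 1002, "claim_no": 7, "amount": 250.0},
    {"policy_no": 1002, "claim_no": 9, "amount": 120.0},
]

lines, subtotal, total, open_policy = [], 0.0, 0.0, False
for rec in deck:
    if "claim_no" not in rec:             # master record: start a new policy
        if open_policy:
            lines.append(f"  policy total {subtotal:.2f}")
        lines.append(f"policy {rec['policy_no']}")
        subtotal, open_policy = 0.0, True
    else:                                 # detail record: add to the counters
        lines.append(f"  claim {rec['claim_no']} {rec['amount']:.2f}")
        subtotal += rec["amount"]
        total += rec["amount"]
if open_policy:
    lines.append(f"  policy total {subtotal:.2f}")
lines.append(f"grand total {total:.2f}")  # total liability across all policies
```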
In 1959, the master/detail record processing functions of the accounting machine were made available on the IBM 1401 computer in a language referred to as the report program generator (RPG). The first users of the RPG were familiar with the methods of punched card processing from the unit record machine era. The RPG was designed to be similar to the master/detail processing those users already knew. On an IBM 1401 computer with three or more tape drives, it was possible to sort files with the computer. With presorted input files, the RPG was capable of combining the collator and accounting machine functions of unit record master/detail processing. Enhanced versions of the original RPG are used in many computer installations today. However, the collating function and its associated processing, often referred to as the RPG cycle, are still present in the latest versions.
The RPG does not have a sort capability. The sort function is supplied by a separate general purpose sort program which uses the computer to order the input files. The combined functions of the collator and accounting machine are then performed by the computer using the RPG cycle. Since 1959, the devices supported by the RPG have been extended to include, for example, tape files, disk files in several formats, and other computer peripherals. The capabilities of the language have likewise been greatly increased and the language now supports many features not directly related to processing two files of differing record types.
The capabilities of the RPG are limited by the constraint that all its input files must be preordered by a common key. An insurance company may have a file of insured records where some insureds have multiple policies, for example, fire, commercial general liability, and auto. The RPG may be used for producing a report where information from the insured's record is listed followed by the insured's policies and immediately after each policy the claims against the policy. If the claim records do not contain the insured's identifier (ID) field, this report cannot be produced with a single RPG run. In order to produce the report, either the insured's ID must be coded with each claim record, or an intermediate claim file with the ID from the policy file added to each claim must be produced, and then a second run made with the intermediate claims file ordered by the policy ID within the insured ID. In the case where the insured's ID must be coded with each claim record, a field is added to the claim which is unnecessary because that field can be obtained from the policy record. In the case where an intermediate claim file with the ID from the policy file added to each claim must be produced, the complexity of the application and the effort needed to implement the application are substantially increased.
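The intermediate-file workaround described above can be sketched as two passes over hypothetical records. The claims carry no insured ID, so a first pass copies it in from each claim's policy, and the intermediate file can then be ordered by policy ID within insured ID for the second run.

```python
policies = [{"policy_no": 1, "insured_id": "B"},
            {"policy_no": 2, "insured_id": "A"}]
claims = [{"claim_no": 10, "policy_no": 1},
          {"claim_no": 11, "policy_no": 2}]

# Pass 1: build the intermediate claim file with the insured ID copied in
# from each claim's policy record.
insured_of = {p["policy_no"]: p["insured_id"] for p in policies}
intermediate = [dict(c, insured_id=insured_of[c["policy_no"]]) for c in claims]

# Input to pass 2: the intermediate file ordered by policy ID within
# insured ID, ready to be collated with the insured file in a second run.
intermediate.sort(key=lambda r: (r["insured_id"], r["policy_no"]))
```

The duplicated field and the extra pass are exactly the costs the text describes: the insured ID is redundant data, and the second run adds to the application's complexity.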
A master/detail application that ran on unit record equipment required its designer to have an in-depth understanding of the operation of the collator and the accounting machine. Likewise, a master/detail application designed to run under the RPG requires a programmer to have a good understanding of the operation of the runtime collating and calculating functions of the RPG program cycle. Therefore, the RPG application designer is required to understand both what the application is to accomplish and also the RPG language runtime operation.
Partly for this reason, many programs which process interrelated files of multiple types have their collating functions designed and coded ad hoc using application specific logic. As with the RPG, the files are preordered not by the program but by a general purpose sort program. The common business oriented language (COBOL) has a sort verb, and COBOL programs can pass records one at a time to the sort program and receive the sorted records one at a time from the sort program. However, the ordering of the records is performed by the general purpose sort program which interfaces with the user's COBOL program at the time of execution. This approach to a master/detail application offers a programmer the complete procedural capabilities of a high level programming language. However, when compared to the RPG, the implementation of the collating and ordered processing functions adds to the design and programming effort and makes program development significantly more time consuming and error prone. Because of the complexities that result when more than two files are involved, an application may be designed to run as a series of programs in which intermediate records are created and processed, even though the processing itself does not inherently require multiple programs.
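The hand-coded collating logic such a program must carry is a sequential match-merge over two files presorted on the common key, with the application itself deciding how to handle details that match no master. A minimal sketch, with hypothetical record layouts:

```python
def match_merge(masters, details, key):
    """One sequential pass over two presorted files: each master is
    emitted followed by its matching details; unmatched details are
    set aside as an error condition for the application to handle."""
    mi, di = iter(masters), iter(details)
    m, d = next(mi, None), next(di, None)
    merged, orphans = [], []
    while m is not None:
        merged.append(m)
        # Consume every detail whose key is at or below the current master's:
        # equal keys belong to this master; lower keys have no master at all.
        while d is not None and d[key] <= m[key]:
            (merged if d[key] == m[key] else orphans).append(d)
            d = next(di, None)
        m = next(mi, None)
    while d is not None:          # details beyond the last master
        orphans.append(d)
        d = next(di, None)
    return merged, orphans

merged, orphans = match_merge(
    [{"policy_no": 1}, {"policy_no": 3}],
    [{"policy_no": 1, "claim_no": 10},
     {"policy_no": 2, "claim_no": 11},   # no matching master: error condition
     {"policy_no": 3, "claim_no": 12}],
    "policy_no")
```

Even this simplified loop shows the kind of boundary cases (unmatched details before, between, and after the masters) that make hand-written collation error prone.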
Another method for interrelating records of different types is through the “join” clause of the “select” statement of the structured query language (SQL), which operates on a relational database (RDB). In SQL/RDB nomenclature, a table corresponds to a file and a row in a table corresponds to a record in a file. Conceptually, a relational database table is stored in the form of a matrix. The “join” clause of SQL combines data from two tables and stores the result in a third table. The “where” clause of the “select” statement specifies the “join” condition. Following the above example, each row of the first table would contain the data from a claim and each row of the second table would contain the data from a policy. The “where” clause would specify that the tables were to be joined on an equal policy condition. The result table would contain a row for each claim. The fields in the row would comprise some or all of the fields from the corresponding policy and claim rows. This result table could be further joined to another table whose rows contain data about each insured. In this case, the “where” clause would specify an equal insured ID condition. The result table could be used to quantify the insurance company's experience with the insured taking into account the premiums and claims from all the policies the insured has with the company. It would be possible to obtain this same data using the RPG. With the RPG, a new file could be created during collation of the policy and claims files in which the records contained the same data as the result table from the first SQL “join” clause. The new file could be sorted on the insured ID and collated with a file of insured records to produce the result of the second “join” clause.
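The two joins described above can be run against in-memory tables; the schema and sample rows here are hypothetical. The "where" clause supplies both join conditions, pairing each claim with its policy and each policy with its insured, so the result holds one row per claim carrying fields from all three tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE insured (insured_id TEXT, name TEXT);
    CREATE TABLE policy  (policy_no INTEGER, insured_id TEXT);
    CREATE TABLE claim   (claim_no INTEGER, policy_no INTEGER, amount REAL);
    INSERT INTO insured VALUES ('A', 'Ada Corp.');
    INSERT INTO policy  VALUES (1, 'A'), (2, 'A');
    INSERT INTO claim   VALUES (10, 1, 250.0), (11, 2, 120.0);
""")

# Both joins expressed through the "where" clause, as the text describes:
# an equal-policy condition and an equal-insured-ID condition.
rows = con.execute("""
    SELECT i.name, p.policy_no, c.claim_no, c.amount
    FROM claim c, policy p, insured i
    WHERE c.policy_no = p.policy_no
      AND p.insured_id = i.insured_id
    ORDER BY p.policy_no, c.claim_no
""").fetchall()
# rows: one result row per claim, combining insured, policy, and claim data
```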
According to the design philosophy of SQL/RDB, the implementation of the “join” clause is not made known to the SQL user. The user cannot therefore predict what SQL will do at runtime to cause the computer to produce its result table. Since the implementation of the “join” clause is not available to the user, the user has no way to write an SQL “join” clause to optimize runtime performance. Moreover, SQL is a query language and not a batch processing language. A given implementation of SQL may be aimed at optimizing queries that produce small result tables rather than result tables that contain a large number of rows. For this and other reasons, one would expect an SQL implementation of a large scale master/detail processing application to run slower than an RPG or COBOL implementation of the same application. Furthermore, SQL only operates on data in a relational database. SQL cannot process data from a sequential disk file, much less from tape or other sequential media, whereas RPG and COBOL can process all these physical file types.
RPG and SQL are capable of interrelating two files or tables by a common key. If an RPG or an SQL application requires the interrelation of three files, an implementer of the application must create a combined record file from two of the files and then interrelate the combined record file with the third file. This requires a second runtime execution in the case of RPG and another join operation in the case of SQL. If more than three files or tables are to be interrelated, each additional file requires another runtime RPG execution or another SQL join operation. The reason these extra steps are required is that a single join clause can only join two tables and that an RPG collation run can only combine two types of files. Because of this limitation of RPG and of SQL, an application implementer cannot define the file relationship he/she wants to establish, but is compelled to divide the processing into a series of two file operations.
Hence, there is a long felt but unresolved need for a configurable computer implemented method and system that efficiently interrelates multiple large source data files and provides ordered access to these interrelated source data files.