1. Field of the Invention
The present invention relates to comparison of two computer files to determine differences between quantities in each file, particularly differences in dates, monetary currency conversion amounts, length measurement conversion amounts and similar quantities, and also relates to the so-called Year-2000 problem of determining whether a computer program is functionally operable both before and after Jan. 1, 2000.
2. Description of the Related Art
The predominant practice in computer programming has been to omit the leading digits from the year in a representation of a date. In the early 1960s, computer programs were typically written with one-digit years in the date. Shortly before the turn of the decade, problems similar to the current year 2000 computer problems began. One-digit years did not handle the turn of the decade, just as two-digit years do not automatically handle the turn of the century. Thus, by the end of the 1960s, dates were commonly represented by six digits: two digits representing the month, two digits representing the day, and two digits representing the year. This practice minimized memory requirements for storing a date, and all dates manipulated by computer programs of that era could be expected to be within the 20th century. This practice was continued with each succeeding generation of computer to ensure compatibility between generations. Indeed, by 1970 a standard promulgated by the United States Department of Commerce required that federal agencies ensure their computer programs used a six-digit date representation to avoid a repeat of the problems caused by the one-digit year and the turn of the decade to 1970. The practice of omitting the first two digits (century digits) from the year in a representation of a date thus became enshrined in mainframe computer programs, and spread from government computing to general business computing.
With the advent of the year 2000, it is becoming necessary for computers to manipulate and distinguish dates in both the 20th and 21st centuries. A common computing task requires computing the difference between two dates to determine, for example, the amount of interest due on a loan, a person's age or retirement benefits, or similar information. A computer program that computes the difference in years by simply subtracting one two-digit representation of the year from another may, instead of arriving at a difference of, for example, one year, arrive at a difference of 99 years. As a result, a computer could, for example, issue an erroneous bill to a borrower for 99 years of interest on a loan. This problem has become known as the "Year 2000 Problem," sometimes abbreviated as the "Y2K" problem. Date representations other than those that use two digits to represent the year, two digits to represent the month and two digits to represent the day are, of course, known, such as a representation that represents a month by its name rather than its corresponding number, and the Lilian and neo-Julian representations that are actually quite common in mainframe computer software. All but the Lilian date representation suffer from the Year 2000 problem as well.
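The arithmetic failure described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this illustration; the example assumes a loan issued in 1999 with interest computed in 2000.

```python
def years_between_two_digit(start_yy: int, end_yy: int) -> int:
    """Naive difference using only the last two digits of each year."""
    return end_yy - start_yy


def years_between_four_digit(start_yyyy: int, end_yyyy: int) -> int:
    """Difference using full four-digit years."""
    return end_yyyy - start_yyyy


# Loan issued in 1999 ("99"), interest computed in 2000 ("00").
naive = years_between_two_digit(99, 0)
correct = years_between_four_digit(1999, 2000)

# The two-digit subtraction is off by a full century; a program that
# ignores the sign would treat the result as 99 years.
print(abs(naive), correct)
```

The two-digit result has a magnitude of 99 years where one year is expected, which is the kind of error that could produce the erroneous bill described above.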
Programmers and other practitioners in the art have proposed various methods for solving or minimizing the impact of the Year 2000 problem and have focused on various aspects of the problem. Some have focused on rewriting the operating system of a computer to adjust the manner in which the operating system represents the date. Others have focused on rewriting application programs to adjust the manner in which they represent dates. Most such remediation efforts involve either changing the date representations from a two-digit format to a four-digit format or modifying the programs to react to the date as if it had a four-digit year. The latter is termed "four-digit date logic".
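One way a program might react to a two-digit year as if it had four digits is to infer the century from a pivot value. The sketch below is an assumption made for illustration only: the text above does not specify how four-digit date logic is implemented, and the function name `expand_year` and the pivot of 50 are hypothetical.

```python
def expand_year(yy: int, pivot: int = 50) -> int:
    """Infer a four-digit year from a two-digit year.

    Assumption for this sketch: values at or above the pivot belong to
    the 1900s and values below it belong to the 2000s.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 1900 + yy if yy >= pivot else 2000 + yy


# The stored data keeps its two-digit years; only the logic changes.
elapsed = expand_year(0) - expand_year(99)
print(elapsed)
```

With this logic the stored representation is unchanged, but the difference between "99" and "00" comes out as one year.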
Another area upon which programmers and other practitioners in the art have focused attention involves determining whether application program remediation efforts have been successful. It is important to determine whether a program that performed certain functions or algorithms and represented dates using a two-digit format performs the same functions or algorithms in exactly the same way after it has been rewritten to represent dates using four-digit date logic. In other words, it is important to ensure that remediation not only fixes the Year 2000 problem but also preserves the functionality of the original program.
The concept of aging a file is central to many of the proposed methods for determining whether Year 2000 application program remediation efforts have been successful. The files at issue are data files containing dates that the computer program processes. In the most general sense, an application program of the type with which Year 2000 remediation efforts are concerned reads input data, including dates, from an input data file, processes the input data, and writes output data, including dates, to an output data or report file. Aging a file is a well-known concept, and software tools or programs have been developed to age files by a number of days specified by a user. A file aging program reads a data file, locates every date in the file, adds the specified number of days or years to each date, and copies the results to an output file. The output or aged data file is thus identical to the input or original data file but for the dates.
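The aging step described above can be sketched as follows. This is a minimal illustration, not any particular aging tool: it assumes every date in the data appears as MM/DD/YYYY, and the names `DATE_PATTERN` and `age_text` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption for this sketch: dates appear only as MM/DD/YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def age_text(text: str, days: int) -> str:
    """Return a copy of text with every recognized date moved forward by days."""

    def shift(match: re.Match) -> str:
        try:
            original = datetime.strptime(match.group(0), "%m/%d/%Y")
            aged = original + timedelta(days=days)
        except (ValueError, OverflowError):
            # Not a valid calendar date, or out of range: leave it untouched.
            return match.group(0)
        return aged.strftime("%m/%d/%Y")

    return DATE_PATTERN.sub(shift, text)


print(age_text("Due 12/31/1999 amount 100", 1))
```

Everything other than the matched dates is copied through unchanged, so the aged data is identical to the original but for the dates.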
A two phase method can be used to determine whether an application program is Year 2000 compliant. The first phase tests whether the remediated program preserves the functionality of the original program for dates within the 20th century. The second phase tests whether the remediated program is Year 2000 compliant, i.e., whether it preserves functionality for dates in the 21st century.
In the first phase of the method, the user executes the original unremediated program by providing it with an input data file having dates within the 20th century. Execution of the program produces an output file. The user then executes the remediated program by providing it with the same input data file, possibly with dates reformatted with four-digit years. Execution of the program produces another output file. The user then compares the output file resulting from execution of the original program with the output file resulting from execution of the remediated program. If the two output files are identical (except for the expected differences in the date fields), the user can infer that the remediated program preserves the functionality of the original program for dates within the 20th century. The expected differences will be limited to dates which have been reformatted, such as changing two digit years to four digit years or changing the ordering of the year, month and day fields.
In the second phase of the method, the user executes the remediated program by providing it with an input data file having dates within the 20th century. Execution of the program produces an output data file. The user then sets the system date of the computer to a date in the 21st century. The user then ages the same input data file by a number of days necessary to set the aged dates to the new system date. With the new system date, the user executes the remediated program by providing it with the aged input data file. Execution of the program produces another output data file. The user then compares the output data files. If the files are identical, except for expected differences in the date fields, the user can infer that the remediated program preserves the functionality of the original program for dates within the 21st century for the range of program functions tested by the input data.
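The comparison at the end of the second phase can be sketched as a check that the two outputs agree everywhere except in their dates, and that each pair of dates differs by exactly the aging offset. The sketch assumes MM/DD/YYYY dates throughout, and `outputs_match` is a hypothetical name used only here.

```python
import re
from datetime import datetime, timedelta

# Assumption for this sketch: dates appear only as MM/DD/YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def outputs_match(baseline: str, aged: str, days: int) -> bool:
    """True if the outputs differ only by dates shifted by exactly days."""
    # The text between dates must be identical. Equal split results also
    # imply that both outputs contain the same number of dates.
    if DATE_PATTERN.split(baseline) != DATE_PATTERN.split(aged):
        return False
    offset = timedelta(days=days)
    pairs = zip(DATE_PATTERN.findall(baseline), DATE_PATTERN.findall(aged))
    for base_text, aged_text in pairs:
        try:
            base_date = datetime.strptime(base_text, "%m/%d/%Y")
            aged_date = datetime.strptime(aged_text, "%m/%d/%Y")
        except ValueError:
            # Not a valid calendar date: require the raw text to match.
            if base_text != aged_text:
                return False
            continue
        if aged_date - base_date != offset:
            return False
    return True


print(outputs_match("Paid 12/30/1999 total 5", "Paid 12/31/1999 total 5", 1))
```

A result of true supports the inference described above only for the program functions exercised by the input data.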
Comparison of the output files in the above-described methods may be performed manually by a user or automatically by a comparison tool or program. Automatic file comparison software tools are well-known, but most only identify mismatches between two files. At least one such tool is known that matches quantities that are equal but expressed in different formats in each file, such as the quantity "125" (a decimal number) and the quantity "1.25E02", which is the same quantity expressed in scientific notation. That same tool can compare any other relationship which can be expressed by a single instance of the relationship y=mx+b, where m is the slope of a line and b is its intercept, and so would be capable of converting Centigrade to Fahrenheit, but it does not compare dates. That same tool also allows specifying a range of deviation from the line expressed as an exact relative or exact absolute range. Exact relative ranges are pre-specified percentages of the answer to y=mx+b. Exact absolute ranges are pre-specified values above and below the answer to y=mx+b. Although file comparison software tools that identify the difference between two dates have been used in working on the Year 2000 problem, they require that the user identify the location of the dates within the files, the format of each individual date, and how to distinguish one record type from another in the same file.
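The y=mx+b comparison with a relative or absolute range can be sketched as follows. This is an illustration of the relationship described above, not the known tool's implementation; the function name `within_range` and its parameters are hypothetical.

```python
def within_range(x: float, y: float, m: float, b: float,
                 tolerance: float = 0.0, relative: bool = False) -> bool:
    """Check whether y lies within a range around m*x + b.

    When relative is True, tolerance is a percentage of the expected
    value; otherwise it is a fixed amount above and below that value.
    """
    expected = m * x + b
    allowed = abs(expected) * tolerance / 100.0 if relative else tolerance
    return abs(y - expected) <= allowed


# The same quantity in two formats compares equal once both are parsed.
print(float("125") == float("1.25E02"))

# Centigrade to Fahrenheit uses m = 1.8 and b = 32, so 100 maps to 212.
print(within_range(100.0, 212.5, m=1.8, b=32.0, tolerance=1.0))
print(within_range(100.0, 215.0, m=1.8, b=32.0, tolerance=1.0, relative=True))
```

Here 212.5 falls inside an absolute range of 1.0 around 212, while 215.0 falls outside a relative range of 1 percent (about 2.12).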
Until the present invention, all prior file comparison utilities have required certainty. No tool has dealt with uncertainty about exactly how to interpret the data when more than one interpretation was possible. If there are no explicit delimiter characters around a field where a mismatch occurs, tools have been unable to compare the fields unless the user specifies the exact bounds of the field. If the data could be of multiple formats with no deterministic characteristics within the data to identify the format (such as the "E" in "1.25E02" indicating scientific notation), prior tools required the user to explicitly identify the single format and relationship by which all data is to be interpreted. Comparison of dates is a good example of data with multiple formats. Typical dates could be Julian, Gregorian or Lilian, just to name a few types, and the same character string could represent a date in several of those formats. Even if the dates are all of the same general format type, it may be impossible to offer a definitive interpretation of a date without more information. For example, given the string "01/02/03", the date could be Jan. 2, 1903, Jan. 2, 2003, Feb. 3, 1901, Feb. 3, 2001, Feb. 1, 1903, or Feb. 1, 2003 at a minimum. Prior date comparison tools could not handle such a date unless the user specified the explicit format of the date. Tools that compare dates further require the user to explicitly identify how to distinguish one record format from another. This is required at least for data stored by legacy computer systems and for date comparison of printouts.
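The ambiguity of a string such as "01/02/03" can be made concrete by enumerating its candidate readings. The sketch below considers only the three field orderings and two centuries implied by the example above; the names `FIELD_ORDERS` and `interpretations` are hypothetical.

```python
from datetime import date
from itertools import product

# Field positions as (year_index, month_index, day_index) per ordering.
FIELD_ORDERS = {
    "MM/DD/YY": (2, 0, 1),
    "DD/MM/YY": (2, 1, 0),
    "YY/MM/DD": (0, 1, 2),
}
CENTURIES = (1900, 2000)


def interpretations(text: str) -> list[date]:
    """List every valid calendar date that a slash-separated string could mean."""
    parts = [int(p) for p in text.split("/")]
    if len(parts) != 3:
        raise ValueError("expected three slash-separated fields")
    found = set()
    for (y_i, m_i, d_i), century in product(FIELD_ORDERS.values(), CENTURIES):
        try:
            found.add(date(century + parts[y_i], parts[m_i], parts[d_i]))
        except ValueError:
            # This reading is not a real calendar date; skip it.
            continue
    return sorted(found)


for candidate in interpretations("01/02/03"):
    print(candidate.isoformat())
```

For "01/02/03" this yields the six dates listed above. A string such as "31/12/99" yields fewer candidates, because some orderings do not form a valid calendar date.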
It would be desirable to provide an automatic file comparison software tool and Year 2000 compliance testing method that overcomes these problems and deficiencies. The present invention does so in the manner described below.