Software applications use dates in many operations, from complex financial transactions to the calculation of expiration dates of drivers' licenses and credit cards. Many such applications perform date arithmetic by subtracting two-digit year values. For example, calculating interest on a 5-year certificate of deposit involves subtracting the certificate's issue date from the current date and determining the interest from the difference. This calculation is not a problem if the certificate matures in 1999, but if it matures in 2001, the same computation can result in an error message or worse. In the year 2000, the two-digit year indication starts over at "00" and, unless something distinguishes such a date, the year will appear to be 1900, at least to the many programs that use only the last two digits of the year.
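The arithmetic behind the certificate-of-deposit example can be sketched briefly. The following is an illustration only, written in Python rather than in the language of any affected program; the function names and the sample issue years are assumptions chosen for the example.

```python
def years_elapsed_two_digit(issue_yy: int, current_yy: int) -> int:
    """Subtract two-digit years, as a legacy routine storing only 'YY' might."""
    return current_yy - issue_yy


def years_elapsed_four_digit(issue_year: int, current_year: int) -> int:
    """Same subtraction using full four-digit years, for comparison."""
    return current_year - issue_year


# Hypothetical 5-year certificate issued in 1994, evaluated at maturity in 1999.
print(years_elapsed_two_digit(94, 99))       # 5

# Hypothetical 5-year certificate issued in 1996, evaluated at maturity in 2001.
# The stored year "01" is smaller than "96", so the difference goes negative.
print(years_elapsed_two_digit(96, 1))        # -95

# With four-digit years the same certificate yields the expected term.
print(years_elapsed_four_digit(1996, 2001))  # 5
```

How a given program reacts to the negative difference (an error message, a wrong interest figure, or something else) depends on the program; the sketch shows only where the incorrect value originates.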
In the early days of data processing, storage space was at a premium, and two digits were adopted for the year indication. Most programs today carry forward that format and employ two bytes of 8-bit binary data to indicate the last two decimal digits of the year. Many of these programs and applications were written years ago, and the programmers who understood their organization and details are no longer available for consultation. Further, calculations employing year fields are often deeply embedded in very large program routines and are thus difficult to find and identify.
The key to identifying year fields in a program at a reasonable cost is to do so automatically, without requiring programmers to scan the code by hand.
Year fields normally occur as sub-fields of a composite field. That is, the fields used to represent "year", "month" and "day" are sub-fields of a larger "date" field. To identify such fields, two primary matching techniques are currently in use. The first examines the labels assigned to the various data fields used by the program. In COBOL, these labels are found in the Data Division, the area of the program that defines each data element the program uses. Using various techniques, key phrases such as "year", "yr", etc. are located in the labels, and the data fields carrying those labels are then considered to be year-oriented fields. As a further check, the format of each such "year-oriented" field is determined and compared against the commonly used formats for year information. The most common format in use involves three two-digit numbers that are defined consecutively.
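The two checks described above, a label match followed by a format match, can be sketched as follows. This is a minimal illustration, not the procedure of any particular tool: it assumes the data definitions have already been extracted into a list of (label, picture) pairs in declaration order, it matches only the keywords "YEAR" and "YR", and it tests only the most common format of three consecutive two-digit numeric fields. The names Field, label_suggests_year, in_three_two_digit_run and find_year_fields, and the sample labels, are hypothetical.

```python
import re
from typing import List, NamedTuple


class Field(NamedTuple):
    label: str    # data-name, e.g. "ISSUE-YR"
    picture: str  # simplified picture string, e.g. "99", "9(2)", "X(20)"


YEAR_KEYWORDS = {"YEAR", "YR"}                 # assumed keyword list
TWO_DIGIT = re.compile(r"99|9\(0*2\)")         # two-digit numeric picture


def label_suggests_year(field: Field) -> bool:
    """Label test: any hyphen- or underscore-separated token is a year keyword."""
    tokens = re.split(r"[-_]", field.label.upper())
    return any(token in YEAR_KEYWORDS for token in tokens)


def in_three_two_digit_run(fields: List[Field], i: int) -> bool:
    """Format test: field i lies within three consecutive two-digit fields."""
    first = max(0, i - 2)
    last = min(i, len(fields) - 3)
    for start in range(first, last + 1):
        window = fields[start:start + 3]
        if all(TWO_DIGIT.fullmatch(f.picture) for f in window):
            return True
    return False


def find_year_fields(fields: List[Field]) -> List[str]:
    """Return labels of fields that pass both the label and the format test."""
    return [
        f.label
        for i, f in enumerate(fields)
        if label_suggests_year(f) and in_three_two_digit_run(fields, i)
    ]


sample = [
    Field("CUST-NAME", "X(20)"),
    Field("ISSUE-YR", "99"),
    Field("ISSUE-MO", "99"),
    Field("ISSUE-DA", "99"),
    Field("FISCAL-YEAR", "9(4)"),
    Field("PAYROLL-CODE", "99"),
]
print(find_year_fields(sample))  # ['ISSUE-YR']
```

In this sample, FISCAL-YEAR passes the label test but not the format test, and PAYROLL-CODE passes neither, so only ISSUE-YR is reported. A fuller implementation would test the other commonly used year formats as well.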
Applying this dual test allows a program to be searched automatically and often leads to the discovery of approximately 80% of the year fields. Such a search procedure, however, does not consider the interrelationship of a discovered year field with other year fields, either in the same program or in allied programs of the same application. It therefore ignores data that would allow a more accurate year field determination.
Accordingly, it is an object of this invention to provide an improved method for identifying year-related fields in a program.
It is another object of this invention to provide an improved method for identifying year-related fields in a program, which employs data available in associated fields to assure a higher level of accuracy in the year field identification process.
It is another object of this invention to utilize classifications other than purely year-oriented fields; for example, classifications such as day, month, currency, duration and (when separated from years) century may also be used to evaluate applications. Weighted classifications may be assigned to improve the year-finding process.