The software development process typically begins with a statement of the intended functioning of a software module, i.e., a statement of the problem. From this statement, a high level analysis is performed to determine the basic steps necessary to carry out the intended functionality. These high level steps are then further analyzed and translated into a computer language program. In many instances, the computer program interacts with the computer operating system, hardware, and/or other programs operating in the computer system.
Often, events occur after a final program is debugged, compiled, and linked, which alter the operation of the software, or make latent defects in the operational logic or code apparent. One such instance is the so-called "Year 2000 Problem", or Y2K problem. This issue is insidious, because existing and operational code with no present deficiency must be analyzed and appropriately replaced or remediated before certain critical dates, or be at risk of failure. Since many computer systems are interrelated and depend on each other for operation, the failure of even small or remote systems may lead to a "domino effect", resulting in failure of related and more important systems. Some failure modes are relatively minor, with inconvenient consequences, while others are catastrophic, stopping operation of the computer system, or worse, causing a failure of a real system controlled by the computer system. Where the computer system is an embedded controller or otherwise mission critical, the software or firmware errors may lead to death or destruction. For example, utility, elevator, flight control and even medical equipment and systems often have embedded controllers. Even where the date information is ancillary to the main function, a date reference which reveals a program logical error may lead to failure of an entire system. This is most apparent where a system logs events or performs trend analysis. If there is an inconsistency in dealing with date data, the result could be a shutdown or erroneous operation. In fact, legacy embedded systems may particularly present this problem, because in the past, program memory was at a premium and therefore conservation of resources by compressing or truncating date information was employed, even where this meant a critical limitation on system design life.
As discussed in detail below, many stable computer systems, particularly mainframe computer systems running immense and complex code, will suffer from the Y2K problem. This is a result of the use and reuse of legacy code and the persistence of efficiency techniques and assumptions which will no longer be valid early in the third millennium. When confronted with this problem, two significant considerations are the availability of accurate source code for the software to be analyzed and corrected, and the testing and debugging of replacement software to ensure that the functionality is correct, the corrected software remains compatible with systems to which it interfaces, and no new errors are introduced. Another consideration is the time and resources required to perform remediation, even where the source code and testing environment are available.
There are a number of other problems similar to the Y2K problem. Essentially, this class of problems arises because existing debugged compiled program code, for which the source code may be unavailable or inconvenient, or merely voluminous, becomes defective due to a relatively rare occurrence of a definable event. Other examples include program or operating system updates or partial updates, desired substitution of elements which would not affect fundamental program flow or logic, and translation of parameter sets to ensure operation in a new environment. In particular, a problem often occurs in Microsoft Windows-type operating systems, e.g., Windows 3.X, Windows 95, Windows 98, Windows NT, Windows CE, and other variants, in which programs typically reference dynamic link libraries or DLLs, visual basic objects or VBX, or other code, which effectively becomes integrated with the operating system, and is often stored in a predetermined path with such DLLs and VBXs from other programs. In this case, where such common code is referenced by a single name, it is possible, and even likely, that an updated version of the VBX or DLL by the same name will not operate with older software, and newer software will not operate with an older DLL or VBX. Similar problems occur in other circumstances and under other operating systems, mandating synchronized updates of multiple system software components.
Another instance of this problem is the potential rise of the Dow Jones index above 10,000, which may lead to an extra digit required for representing the value in existing software which is otherwise fully functional. A further instance is the change in currency units in Europe to the ECU.
Precisely defined, the Year 2000 (Y2K) Problem is the insufficient representation of the year datum as a 2 digit field (or variable) in software applications and their associated data files and data bases, the incorrect calculation of the leap year status, the inappropriate assumption of "1900" or some other year as a base year or "1999" as the final year, and the inaccurate programming of date calculations with respect to these inaccuracies, including computations, comparisons, assignments and other operations. The year 2000 is a leap year, unlike 1900. Normally century years are not leap years; an exception to this rule is that a year divisible by 400 is a leap year. Identification of date data and date calculations is complicated by the use of pointers and arrays, obfuscated by the lack of standard naming conventions and masked by embedding in other data fields. As a result, the software in affected applications may incorrectly assume that the maximum legal value of the year field is "99" or may incorrectly perform sorts and other operations that involve years designated by "00". Negative time durations could result from subtractions from "00" (assumed to be year 2000). Incorrect leap year calculations will incorrectly assume that February 29th does not exist in the Year 2000. Thus, in the year 2000 (or with some applications even earlier), when the year field is incremented, many date dependent computer algorithms that assume the year field will increase monotonically will produce erroneous results or cause abnormal program termination, with possible disastrous consequences.
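The leap year rule discussed above can be illustrated with a minimal sketch. The function names below are hypothetical and chosen only for illustration; the second function is an assumed example of the kind of defect described, not code taken from any particular system.

```python
def is_leap_gregorian(year: int) -> bool:
    """Full rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_without_400_rule(year: int) -> bool:
    """Flawed rule (illustrative assumption): omits the divisible-by-400 exception."""
    return year % 4 == 0 and year % 100 != 0


# The two rules agree for 1900 and 1996 but disagree for 2000.
agree_1900 = is_leap_gregorian(1900) == is_leap_without_400_rule(1900)
disagree_2000 = is_leap_gregorian(2000) != is_leap_without_400_rule(2000)
```

Under the flawed rule, February 29th of the year 2000 would be treated as nonexistent.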
Possible deleterious consequences for affected applications range from outright application failure, to the production of incorrect results or aberrant behaviors, such as the generation of faulty schedules, incorrect trajectories or flight paths, the failure of time sensitive safeguards in embedded systems, the generation of incorrect paychecks, contract transactions, or payment calculations in commercial applications, and mortality or morbidity from failure of medical equipment or pacemakers. Since virtually every major application deals with dates and there is widespread encoding of the year as a two digit field, the likelihood that an application is affected by the Y2K date problem is very high. Indeed it is foolish, and in many cases life threatening, to assume that any mission-critical application is not potentially affected by the Y2K Problem. Another common oversight is the assumption that newer applications are not affected. Since programmers hardly ever start from scratch, newer applications are frequently contaminated through their use of date and duration. Newer systems often have to use the data structures of older systems, since they provide a new function to be performed on an already existing system and associated data. Further, programming tools and operating systems may be flawed, resulting in failures even where the source code program itself does not contain explicit non-Y2K compliant logic.
As the Year 2000 approaches, most organizations across the country have been wrestling with the problem of reprogramming date-dependent systems. Date-dependency refers to how most programs depend on the manner in which dates are represented in order to run computations. Many legacy software systems provide insufficient representation of date information to avoid ambiguity, and in particular, this problem arises either due to limited indexing address space or to the common abbreviation of a year as a pair of digits. Thus, as 1999 ends, a new date with a two digit year representation will be ambiguous between 1900 and 2000. For example, one common date format in COBOL programs represents Feb. 26, 1990 as 900226, and Jan. 1, 1991 as 910101, allowing the computer to compare the two numbers and correctly assume that the smaller number represents the earlier date. On Jan. 1, 2000, or 000101, however, those comparisons will be invalid.
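The YYMMDD comparison described above can be reproduced in a few lines. This is only a sketch: the helper name `yymmdd` is hypothetical, and the dates are the ones used in the text.

```python
def yymmdd(year: int, month: int, day: int) -> int:
    """Encode a date as a six-digit YYMMDD number, keeping only two year digits."""
    return (year % 100) * 10000 + month * 100 + day


feb_1990 = yymmdd(1990, 2, 26)  # 900226
jan_1991 = yymmdd(1991, 1, 1)   # 910101
jan_2000 = yymmdd(2000, 1, 1)   # 000101, held numerically as 101

# Within a single century the smaller number is the earlier date.
same_century_ok = feb_1990 < jan_1991

# After the rollover the comparison inverts: 2000 appears to precede 1991.
rollover_inverted = jan_2000 < jan_1991
```

Both comparisons evaluate to true, although only the first reflects the actual chronological order.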
Likewise, programs which calculate the day of the week using only the last two digits of the year will get wrong answers for Jan. 1, 2000, and all subsequent dates. This is because the formulas they use implicitly assume that the dates are in the 1900s. Jan. 1, 1900, was a Monday, but Jan. 1, 2000, will be a Saturday.
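The day-of-week discrepancy can be checked with a short sketch using Python's standard `datetime` module. The function `weekday_assuming_1900s` is a hypothetical stand-in for a legacy routine that implicitly adds 1900 to a two-digit year; it is not drawn from any actual system.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_assuming_1900s(yy: int, month: int, day: int) -> str:
    """Legacy-style calculation (assumed): treats every two-digit year as 19YY."""
    return DAY_NAMES[datetime.date(1900 + yy, month, day).weekday()]


def weekday_actual(year: int, month: int, day: int) -> str:
    """Calculation using the full four-digit year."""
    return DAY_NAMES[datetime.date(year, month, day).weekday()]


legacy_result = weekday_assuming_1900s(0, 1, 1)   # interprets "00" as 1900
correct_result = weekday_actual(2000, 1, 1)
```

Here `legacy_result` is "Monday" while `correct_result` is "Saturday", matching the two dates given above.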
Another problem arises in systems which use a date as part of the key of an indexed file. This becomes a problem if the date has a two-digit year and the application depends on records in the file being in chronological order. Even if processing of the data does not depend on the records being in chronological order, it could result in records being listed in the wrong order in reports or on-screen displays. In 2000 and later, an application that is supposed to show the most recent items at the top, or on the first screen, would show the items on the bottom or on the last screen.
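The ordering problem can be seen in a small sketch. The record keys and labels below are invented purely for illustration; the point is only that a key beginning with a two-digit year sorts year "00" ahead of "98" and "99".

```python
# Hypothetical records keyed by a YYMMDD string.
records = [
    ("981115", "item A"),  # Nov. 15, 1998
    ("991231", "item B"),  # Dec. 31, 1999
    ("000103", "item C"),  # Jan. 3, 2000
]

# A report intended to list the most recent items first.
most_recent_first = sorted(records, key=lambda record: record[0], reverse=True)

top_label = most_recent_first[0][1]
bottom_label = most_recent_first[-1][1]
```

The newest record, "item C", ends up at the bottom of the listing and "item B" at the top.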
The digit pairs "00" or "99" may be handled by special software routines, in which they may refer, for example, not to a date but rather to a null value or have other significance. Sorting on date is a special case, as discussed below. Reports and screens should therefore be looked at on a case by case basis to ensure readability. There could be bugs there as well, such as hard-coding "19XX" or zero-suppressing the year.
There are three common approaches to remediating software that is subject to these and other date-ambiguity issues. The first is a complete replacement of two digit date codes with four digit date codes (YYYY) or year and century codes (CCYY), with all accompanying changes necessary in the source code of the program as well as necessary changes in the data files. The second approach is a logical analysis of a date representation to determine a most probable interpretation, allowing continued representation of dates as codes which occupy the same data space as two digits. Typically, the analysis provides a sliding window or pivot date, in which a continuous 100 year date range is supported, which does not necessarily coincide with the century break. Note also that a date window can, when a minimum value can be applied for the calculation, handle a range greater than 100 years, e.g., if there is no maximum retirement age, but there is a minimum age of, say, 16, then a birth date of 90 encountered in 1995 implies an age of 105 years, not 5. The third technique is compression of date data, in which a larger date range is stored in the same number of bits as the original date code. For example, by allowing the 4 bits ordinarily used to represent the upper digit from 0-9 to instead represent a hex digit from "0" hex to "f" hex, the years 1900-2059 can be represented. Further, by using a binary representation for 8 bits, a 255 year span can be represented; if 14 bits corresponding to 2 ASCII digits are available, then a year span from 0-16,000 can be represented.
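The hex-digit compression mentioned above can be sketched as follows. This assumes, for illustration only, that the two-digit year is stored as one byte of packed decimal (tens digit in the upper 4 bits, units digit in the lower 4 bits) and that the base year is 1900; the names `BASE_YEAR`, `encode_year` and `decode_year` are hypothetical.

```python
BASE_YEAR = 1900  # assumed base year for the encoding


def encode_year(year: int) -> int:
    """Pack a year into one byte: upper nibble 0x0-0xF (tens), lower nibble 0-9 (units)."""
    offset = year - BASE_YEAR
    if not 0 <= offset <= 159:
        raise ValueError("year outside the representable range 1900-2059")
    tens, units = divmod(offset, 10)
    return (tens << 4) | units


def decode_year(packed: int) -> int:
    """Recover the four-digit year from the packed byte."""
    return BASE_YEAR + (packed >> 4) * 10 + (packed & 0x0F)


packed_1999 = encode_year(1999)  # 0x99, identical to the ordinary packed "99"
packed_2000 = encode_year(2000)  # 0xA0, upper nibble moves past 9
packed_2059 = encode_year(2059)  # 0xF9, the last representable year
```

Years through 1999 keep their existing bit patterns under this scheme, so only values written for later years use the extended upper nibble.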
For example, a standard date routine may be provided using a sliding date window to infer the century in performing calculations on 2-digit years. The 00-99 range is divided into a 25-year forward portion (projected dates), and a 75-year backward portion (current year and 74 past years). The routine, for example, calculates a "forward century" (add 25 to current 4-digit year, take two high order digits), a "forward century endpoint" (same calculation, low order digits), and a "backward century" (subtract 75 from the current 4-digit year, take two high order digits).
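The routine described above might be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `infer_year` and its parameters are hypothetical, and the "backward century" is interpreted here as the two high order digits of the current 4-digit year minus 75.

```python
def infer_year(yy: int, current_year: int, forward_span: int = 25) -> int:
    """Infer a 4-digit year from a 2-digit year using a sliding 100-year window.

    The window covers the current year, the 74 years before it, and the
    25 years after it (with the default forward_span of 25).
    """
    if not 0 <= yy <= 99:
        raise ValueError("yy must be a two-digit year (0-99)")
    # Forward century and its endpoint: add 25 to the current year, then split.
    forward_century, forward_endpoint = divmod(current_year + forward_span, 100)
    # Backward century: subtract 75 from the current year, keep the high order digits.
    backward_century = (current_year - (100 - forward_span)) // 100
    if yy <= forward_endpoint:
        return forward_century * 100 + yy
    return backward_century * 100 + yy


# With a current year of 1998 the window runs from 1924 through 2023.
window_top = infer_year(23, 1998)     # 2023
window_bottom = infer_year(24, 1998)  # 1924
this_year = infer_year(98, 1998)      # 1998
rollover = infer_year(0, 1998)        # 2000
```

Because the boundaries are derived from the current year, the window advances automatically as time passes.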
The most obvious solution to many Y2K problems involves increasing the data format of date fields from 2 to 4 digits in every affected application system. However, this is astronomically expensive, and it may be unnecessary. The conversion to four digit year representations requires changes to both data and programs by converting all references and/or uses of 2-digit-year format (YY) to a 4-digit year format (YYYY or CCYY). It also requires converting all software programs to use the new date format and the use of "bridging" mechanisms to perform conversions between old and new data and programs. While this solution is preferable, ensuring that applications will operate correctly for the next 8,000 years, it has some notable drawbacks. The requirement to convert data formats requires every program that references date data to be modified and every data base that contains date data to be modified and bridged. Positional references to adjacent data fields may have to be adjusted. All record formats of records containing date data have to be changed. All data files, including historic data files, have to be reformatted and rewritten. Performance may be impacted by increased processing times for bridge programs. Hard disk storage space requirements may double during data base conversion, for duplicated data files. Coordination is required with system owners of all external systems affected by changes to interfaces or shared databases to achieve simultaneous switch over to the updated date data format between multiple communicating systems. The date field format change requires all affected program logic, including declarations, moves, calculations and comparisons, to be examined for side-effects of the 2 to 4 digit year expansion.
The sliding window technique requires changes to programs only; no data format changes are required. The data itself, however, needs to be modified. The sliding window technique uses an advancing 100-year or 10-year interval. The century or decade of a given year is unambiguously determined by comparing the value in a 1-digit or 2-digit year field against an "application window" that has a fixed upper and lower year boundary that can be periodically adjusted. The size of the "window" for an application depends upon whether the application works with 100 years' or 10 years' worth of data. The period of adjustment of the window depends upon a number of factors, including the encoding technique. Some techniques require adjusting the window boundaries every year, or at less frequent intervals of 5, 10, 30, 50 or 100 years. The "Sliding Window" technique allows the span of years which an application processes to be indefinitely extended by periodically changing the window boundaries and notifying users that the window is about to advance. Adaptation of existing applications to use the sliding window technique requires some extra overhead and code logic around date sorts, collations, literal comparisons and computations to correctly perform the mapping of a 2-digit date into the application "window" and to assure that computations are correctly performed. However, it avoids most of the massive change and inter-organizational coordination associated with the 2-digit (YYMMDD) to 4-digit (YYYYMMDD or CCYYMMDD) date format conversion approach. By using a sliding window technique, many existing applications can be adapted to process dates using a 100 year sliding window, and will have a correct date interpretation for at least another 65 years without requiring any modifications to existing data bases.
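The extra logic needed around date sorts can be sketched as a sort key that maps each 2-digit year into the application window before comparing. The window boundary `WINDOW_START` (here 1950, giving a window of 1950-2049) and the function names are assumptions made only for this illustration; the stored YYMMDD strings themselves are left unchanged.

```python
WINDOW_START = 1950  # hypothetical lower boundary; the window covers 1950-2049


def expand_year(yy: int, window_start: int = WINDOW_START) -> int:
    """Map a 2-digit year into the 100-year window beginning at window_start."""
    century_base = window_start - window_start % 100
    year = century_base + yy
    if year < window_start:
        year += 100  # falls below the lower boundary, so it belongs to the next century
    return year


def window_sort_key(yymmdd: str) -> tuple:
    """Sort key for a YYMMDD string: the expanded year first, then month and day."""
    return (expand_year(int(yymmdd[:2])), yymmdd[2:])


stored_dates = ["991231", "000103", "981115"]
chronological = sorted(stored_dates, key=window_sort_key)
```

With this key, `chronological` is ordered 981115, 991231, 000103. Advancing the window then amounts to changing `WINDOW_START`, with no rewrite of the stored values.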
Unfortunately, it may be quite difficult to create a realistic test environment to assess Y2K effects. It may not be easy to simply roll the clock forwards to see what happens. Setting a computer's internal clock forward may not just cause application failure on account of the date problem. For example, many software packages are licensed with time stamps that limit how long they can be used. Licensed software may not be licensed to run after the year 2000. In addition, it may not be possible to roll the clock backwards after rolling it forward because of irreversible changes made by applications to data files in future time that retroactively contaminate the usage of the application in present time. To isolate on-going operations from such contamination a completely isolated "time machine" is desired to evaluate and test Y2K year roll-over consequences.
Even with sophisticated software tools, fixing the Year 2000 problem is an extraordinarily expensive and complex undertaking. The domain modeling task to identify date fields for even a single moderately sized application requires scanning virtually every line of application code, and examining every data declaration or usage to determine if it is date related. Often the usage of date names and computations are obscured or ambiguous even in source code because of the use of acronyms that mask the meaning of application data. Furthermore, determining the meaning of a specific date calculation or a datum occurring in a program is an example of a problem that is known in the software reverse engineering research community as the "program understanding problem." Program understanding is considered to be an extremely difficult problem for which no fully automated solution is known to exist, because it requires the correct assignment to a set of program constructs, that can take a large variety of different algorithmic forms, of a real world interpretation that is known only to humans.
Another issue which arises in remediating programs relates to ownership and copyright. Often, programs are licensed by a user, who typically may have less than full ownership of the software, and who has no license to change the software or create a derivative work. In fact, the user may not even have the source code or have any right to use it. Therefore, techniques which require access to and changes to the source code to implement a correction pose certain difficulties. It is noted that it is unclear whether copyright law itself would prohibit self-help, but certainly evaluation of such considerations might impede a project. As noted above, software owners may include physical impediments to software remediation or software locks which prevent changes to the system.