An important problem in software engineering is the maintenance of "legacy" programs, that is, programs which were originally written many years ago but are still being used, and consequently must be modified when bugs are discovered or new features are added. The legacy program problem is becoming ever more pressing. Software, unlike the hardware it runs on, does not wear out, and consequently can be used for as long as there are computers which can execute it. Further, in many modern devices such as telephone switches or database systems, the software is the most expensive component of the system, and consequently will be completely rewritten only if there is no other practical alternative.
One aspect of the legacy program problem is understanding the program. While understanding software is never easy, understanding a legacy program becomes more difficult over time. The program's original developers have typically left or are doing other things, and whatever documentation originally existed generally does not adequately reflect the many modifications of the software. Indeed, understanding legacy programs is so difficult that it is estimated that more than half of the time spent maintaining a large legacy program is in fact spent learning to understand it.
Software engineers have developed a number of techniques for making programs more understandable. One common method is to construct a "specification" of the system: an abstract description of the system presented in a different language, typically a language that is more concise and/or declarative than the language in which the system is implemented. Because no specifications exist for many software systems, techniques for automatic "specification recovery" have been frequently proposed in the software engineering community as an aid in maintaining software systems.
Techniques are known for recovering a specification by analyzing the source code of a software system. Examples of these techniques are described in T. J. Biggerstaff "Design Recovery for Maintenance and Reuse", IEEE Computer, July 1989; P. T. Breuer and K. Lano "Creating Specifications from Code: Reverse Engineering Techniques", Journal of Software Maintenance: Research and Practice, Vol. 3, pp. 145-162, 1991; and C. Rich and L. Wills "Recognizing a Program's Design: A Graph-Parsing Approach", IEEE Software, January 1990. One difficulty with these techniques is that they are limited to source code; in many legacy systems, the source code has been lost or bugs have been fixed by "patching" the executable code, so that there is no longer an exact correspondence between the source code and the executable code. Another difficulty is that source code is complex, and consequently, its analysis is not easy.
What is needed in dealing with legacy programs is a way of obtaining a specification for the program which avoids source code analysis altogether and instead concentrates on what the legacy program actually does.