As software systems are developed and maintained there is a strong tendency to write new programs to accomplish tasks rather than to re-use or modify existing software. This increases the general complexity of software systems, with a concomitant increase in size, in maintenance effort and in the conceptual effort needed to understand and control future development.
These tendencies are controlled in the present art with diagrammatic methodologies for visualizing the system at a variety of conceptual levels, with design decisions that limit functionality and future expansion possibilities and with management discipline to ensure manual review of changes proposed.
Missing from the present art are tools and methods for examining existing source code to find opportunities for simplification, generalization, consolidation or any other automated or semi-automated support means for helping software designers and developers reconceptualize the code into a more coherent and manageable form. Source code therefore evolves in the direction of increasing size and complexity rather than increasing coherency and generalization. The overall costs of such trends are enormous.
Source code analysis also provides powerful tools for the system analyst working on maintaining or enhancing an existing system, as well as for quality assurance. With the proper tools the analyst can understand the implications of a change in one part of a system, in particular the change's effects on other parts. This can dramatically reduce the chances of a change to a system introducing defects in its overall functioning. Planning maintenance or enhancements becomes more reliable and accurate in the presence of an appropriate source code analysis tool.
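One way to picture the change-impact question raised above is as a search over a call graph. The following is a minimal sketch only, assuming the analysis has already produced a mapping from each function to the functions it calls; the function names and the `affected_by` helper are hypothetical and are not taken from any of the systems discussed here.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "main": ["parse", "emit"],
    "parse": ["log"],
    "emit": ["log"],
    "report": ["emit"],
}

def affected_by(calls, changed):
    """Return every function that directly or indirectly calls `changed`."""
    # Invert the graph so that each callee maps to its direct callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    # Breadth-first walk up the inverted graph from the changed function.
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(affected_by(CALLS, "emit")))
```

Under these assumptions a change to `emit` would prompt review of `main` and `report`, while a change to `log` would reach every other function in the graph.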
Quality assurance personnel can use source code analysis to determine the overall quality of the code, rather than merely, as is present practice, testing the system with a set of cases "from the outside". Quality assurance activities could use an appropriate source code analysis system to check conformance with programming standards as well as to find system defects "in the large", that is, by checking consistency across the system rather than just within a module. In particular, special-purpose reports can be formulated that generate relevant quality assurance metrics (e.g. the average number of callers of all functions not labelled "utility subroutines"), likely defect-producing constructs (e.g. all symbols that have more than one definition), large-scale system organization (e.g. module cross-reference charts) and other reports specific to the needs of quality assurance.
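Two of the reports named above can be sketched as simple computations over analysis results. This is an illustrative sketch only: the `DEFINITIONS`, `CALLS` and `UTILITY` tables and both helper functions are hypothetical, and stand in for whatever data an actual source code analysis system would supply.

```python
from collections import defaultdict

# Hypothetical analysis results: (symbol, defining file) and (caller, callee).
DEFINITIONS = [
    ("parse", "reader.c"),
    ("emit", "writer.c"),
    ("parse", "legacy.c"),
    ("log", "util.c"),
]
CALLS = [
    ("main", "parse"),
    ("main", "emit"),
    ("parse", "log"),
    ("emit", "log"),
    ("emit", "parse"),
]
UTILITY = {"log"}  # functions labelled "utility subroutines"

def multiply_defined(definitions):
    """Return symbols defined in more than one file, with their files."""
    places = defaultdict(set)
    for symbol, file_name in definitions:
        places[symbol].add(file_name)
    return {s: sorted(f) for s, f in places.items() if len(f) > 1}

def average_callers(definitions, calls, utility):
    """Average number of distinct callers over all non-utility functions."""
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    targets = {symbol for symbol, _ in definitions} - set(utility)
    if not targets:
        return 0.0
    return sum(len(callers[s]) for s in targets) / len(targets)

print(multiply_defined(DEFINITIONS))
print(average_callers(DEFINITIONS, CALLS, UTILITY))
```

With this sample data, `parse` is reported as defined in two files, and the non-utility functions `parse` and `emit` have two callers and one caller respectively.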
Since experienced practitioners of the art recognize the benefits of code examination, a variety of tools have been utilized, whether appropriate or not. The most commonly used tool is the text editor. For example, in seeking where a particular symbol is referenced, programmers can use the "search" function found in all text editors to see each occurrence of the symbol along with its source code context. Experienced practitioners recognize that this method is minimally useful.
More to the purpose at hand is a program named "grep", originally developed as part of the UNIX system. This program accepts a pattern (a string or a string with variable parts) and a list of file names. It searches the given files for lines containing the target pattern and reports the file and line number of each such line. This program has two primary disadvantages. First, it treats all source code as simple text, which makes searches that depend on the semantics of the programming language impossible. Second, for short names that might occur within other strings, it usually returns a result too large to be useful. Further, patterns that eliminate such useless results are complex or impossible to devise.
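The limitation described above can be illustrated with a small sketch. It does not invoke grep itself; it assumes three hypothetical source lines and compares a plain substring search with a pattern that tries to match only whole identifiers. Both helper functions are illustrative names.

```python
import re

# Hypothetical source lines; the short name "i" also occurs inside other words.
SOURCE = [
    "int i = 0;",
    "int limit = width;",
    'printf("i is %d", i);',
]

def text_search(lines, name):
    """Text-style search: report every line number containing the string."""
    return [n for n, line in enumerate(lines, start=1) if name in line]

def identifier_search(lines, name):
    """Report line numbers where `name` appears as a whole identifier.

    This is still purely lexical: it cannot tell a real reference from an
    occurrence inside a string literal or a comment.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(name) + r"(?![A-Za-z0-9_])"
    )
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]

print(text_search(SOURCE, "i"))
print(identifier_search(SOURCE, "i"))
```

The substring search reports all three lines, including the one where `i` occurs only inside `int`, `limit` and `width`. The stricter pattern drops that line, but it would still match the `i` inside the string literal on the third line, which is the kind of distinction that requires knowledge of the language.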
There are now several "cross-reference" systems available for various programming languages. These usually produce a listing of each symbol used and the location (source file name and line number) of each reference. These are somewhat more useful than the "grep" systems in that they at least recognize symbols in the vocabulary of the particular programming language. However, their deficiencies include a static listing, no further knowledge of the semantics of the language, and the sheer bulk of information produced. These deficiencies tend to make such systems generally unusable for all but the smallest software systems.
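A cross-reference listing of the kind described above can be sketched as follows. The sketch assumes Python as the analyzed language purely for convenience, uses the standard `tokenize` and `keyword` modules to recognize identifiers, and works on two hypothetical in-memory files; `cross_reference` is an illustrative name, not an existing tool.

```python
import io
import keyword
import tokenize
from collections import defaultdict

def cross_reference(files):
    """Map each identifier to a sorted list of (file name, line number).

    `files` maps a file name to its Python source text.
    """
    xref = defaultdict(list)
    for file_name, text in files.items():
        readline = io.StringIO(text).readline
        for tok in tokenize.generate_tokens(readline):
            # Keep names only, and skip language keywords such as "def".
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                xref[tok.string].append((file_name, tok.start[0]))
    return {name: sorted(refs) for name, refs in xref.items()}

# Hypothetical files to analyze.
FILES = {
    "a.py": "def area(w, h):\n    return w * h\n",
    "b.py": "size = area(2, 3)\n",
}

for name, refs in sorted(cross_reference(FILES).items()):
    print(name, refs)
```

Even for this tiny input, the listing records every occurrence of every name without distinguishing a definition from a use, which suggests how quickly such output grows on a real system.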
The first system designed for a similar purpose to the present invention was developed by Dr. Larry H. Masinter and described by him in "Global Program Analysis in an Interactive Environment", SSL-80-1, Jan. 1980, Xerox Palo Alto Research Center, Palo Alto, Calif. This system, named the Masterscope, was written in the Interlisp programming language and analyzed source code also in the Interlisp language. It also contained a special-purpose data store for the results of the analysis and a built-in query language (English-like) for users to query that data store. The data store was kept in main memory and consisted primarily of hash-coded access into lists of symbols mentioned in the analyzed code.
A research project at the University of Utah developed the Telescope system (see "Telescope: A Cross-Reference Utility for Lisp", Jed Krohnfeldt, December 1986, OpNote 86-11, Utah Portable Artificial Intelligence Support Systems Project, Computer Science Department, University of Utah, Salt Lake City, Utah 84112), one very similar to Masterscope. It is also coded in Lisp (albeit a different dialect), analyzes only Lisp source code, and contains its own special-purpose data store and its own special-purpose query language. It differs from the Masterscope primarily in exploring the use of a more complex, frame-and-object data store.
One commercial product in the general field is also available. Digital Equipment Corporation (DEC) is selling its "Source Code Analyzer" (SCA) system as part of its integrated system development tools package that runs on the DEC VAX family of computers. See "Guide to VAX Language-Sensitive Editor and VAX Source Code Analyzer", Digital Equipment Corporation, Maynard, Mass., order no. AI-FY24B-TK, August 1987. This system consists of (i) options on some of DEC's language compilers that cause the compilers to produce a special file with additional analysis information, (ii) a program to gather this information from several (or many) such files into a consolidated file and (iii) a special-purpose retrieval system that is integrated with DEC's "Language-Sensitive Editor". The retrieval system has a limited set of possible queries and no means for adding queries.
The last three systems described (Masterscope, Telescope and SCA) differ from the previously mentioned systems in that they have capabilities showing that they are intended for the specific purpose of examining source code and the internal relationships therein. They provide code-specific queries, for example a tree of calls from one function or procedure to the ones it calls, and so on. They also provide some consistency checks across source file boundaries that their respective compilers and/or languages do not otherwise enforce.
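The call-tree query mentioned above can be sketched in a few lines. This is an assumption-laden illustration rather than a description of how Masterscope, Telescope or SCA implement it: the `CALLS` table and the `call_tree` function are hypothetical, and the call relationships are assumed to have been extracted already.

```python
# Hypothetical call relationships: each function maps to the ones it calls.
CALLS = {
    "main": ["parse", "emit"],
    "parse": ["log", "parse"],
    "emit": ["log"],
}

def call_tree(calls, root, depth=0, path=()):
    """Return indented lines listing what `root` calls, recursively."""
    lines = ["  " * depth + root]
    if root in path:
        # Stop at a function already on the current path to avoid looping.
        lines[0] += " (recursive)"
        return lines
    for callee in sorted(calls.get(root, ())):
        lines.extend(call_tree(calls, callee, depth + 1, path + (root,)))
    return lines

print("\n".join(call_tree(CALLS, "main")))
```

Each level of indentation represents one step down the chain of calls, and a function that reappears on its own path is marked rather than expanded again.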
However, none of the above-described tools solves the general problem of extraction and accessibility of program-semantics information for the general software development community. These tools are (1) ignorant of purpose and so cannot extract certain kinds of information (e.g. they treat source code as text and are therefore unable to derive the symbols or the types of the symbols); (2) designed for special-purpose languages (e.g. Lisp); (3) integrated with special-purpose data stores; and/or (4) integrated with non-extendable special-purpose query systems.