Software designs, much like other abstract analogs (such as maps and blueprints), are built because they are useful for explaining, navigating, and understanding the richer underlying realities. With software, however, it is rare for even the most general design of an implemented system to be either complete or accurate. In many projects, senior programmers brainstorm on a whiteboard, produce the program, and then produce just enough of a retrospective design to satisfy management. In projects with formal analysis and design stages, the design may be accurate when it is first made, but it seldom matches the final implementation. As code is developed, it diverges from the design. These changes are rarely transferred back to the design documents because programmers seldom take the trouble to find and edit them.
The lack of an accurate design adds dramatically to the life cycle cost of software systems. Mismatches between design and code slow the initial development of large systems because teams working on one portion of the system rely in part upon the design descriptions of other portions of the system. Inaccurate design has an even more dramatic effect on maintenance because maintenance done without understanding the underlying design is time-consuming and prone to error.
Design and code can neither be completely separated from each other nor completely joined with one another. They overlap in that both describe the same system, but they differ because the intended audiences of those descriptions are quite different. Design communicates the intent of the designers to other humans, while code communicates design intent to the machine. Humans share a vast common knowledge and can deal with abstractions but are weak at handling masses of detail. The machine is not hampered by details but is oblivious to abstraction and generality.
One prior art approach to synchronizing code and design supposes that if programmers are unable or unwilling to keep the code synchronized with the design, perhaps the programmers can be dispensed with and the code simply generated from the design. In some cases, such as when an application merely maintains a database, this approach works. However, for general programming this approach fails for several reasons. One of these reasons is that analysts and designers seldom, if ever, anticipate all the details encountered in the actual coding. Programmers need to make changes that extend or "violate" the design because they discover relationships or cases not foreseen by the designers. Removing the programmers from the process does not impart previously unavailable omniscience to the designers. Additionally, most real world applications contain behavior that is best described with algorithmic expressions. Programming code constructs have evolved to effectively and efficiently express such algorithms. Calling a detailed description of algorithmic behavior "design" simply because it is expressed in a formalism that isn't recognizable as "code" does not eliminate the complexity of the algorithmic behavior.
Another previously known method is the automated extraction of object structure from code. Some tools are available that can create more or less detailed object structure diagrams directly from C++ class definitions that contain inheritance and attribute type information. Some Smalltalk systems also provide attribute "type" information that allows these tools to be similarly effective. Without the attribute information, tools can only extract the inheritance structure. This method does not actually parse and model code other than C++ header files or Smalltalk class definitions. Therefore, this approach can at best identify "has-a" and "is-a" relationships. These relationships may imply collaboration but this approach does not specifically identify any of the transient collaborations that are important for understanding design. In addition, it does not provide any information about algorithms.
Another method is the automated deduction of design by analyzing code execution. Collaborations implicit in Smalltalk code are difficult to deduce statically from the code and may not be fully determined until runtime. However, Smalltalk is strongly typed at runtime, so the kinds of objects participating in all collaborations can be determined exactly by examining the receiver and the arguments involved in all message sends. The resulting information can be used to specify the collaborations observed during the execution. This method suffers from the following problems: (1) it requires test cases to exercise the code, and each of these test cases must construct an initial state, which is sometimes elaborate; (2) the test cases themselves require careful construction and may become obsolete as the system changes; (3) the effort needed to construct and maintain these test cases can be a deterrent to routine use of this technique; and (4) full coverage by the test cases is difficult to obtain, and the degree of coverage is difficult to assess. This uncertainty undermines confidence in the resulting design. Without full coverage, the extracted collaboration design is likely to be incomplete in important ways. For instance, the way a system is designed to handle the exceptional cases can be more telling than the way it handles the common ones.
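The coverage problem can be shown with a minimal sketch of execution-based deduction. It is written in Python for illustration only; the `traced` decorator, the `observed` set, and the `Account` and `AuditLog` classes are hypothetical and are not taken from any existing system. Each traced message send records the runtime type of the receiver and of each argument.

```python
import functools

observed = set()  # (receiver type, selector, argument types) seen during execution

def traced(method):
    """Record the runtime types taking part in each send of this method."""
    @functools.wraps(method)
    def wrapper(self, *args):
        observed.add((type(self).__name__, method.__name__,
                      tuple(type(a).__name__ for a in args)))
        return method(self, *args)
    return wrapper

class AuditLog:
    @traced
    def record(self, message):
        return message

class Account:
    def __init__(self, balance):
        self.balance = balance

    @traced
    def withdraw(self, amount, log):
        if amount > self.balance:
            log.record("overdraft refused")  # exceptional path
            return False
        self.balance -= amount
        return True

# A single test case that exercises only the common path.
account = Account(100)
account.withdraw(30, AuditLog())
```

After this run, `observed` contains the send of `withdraw` with an integer and an `AuditLog`, but nothing shows that `Account` ever sends `record` to the log. That collaboration exists only on the overdraft path, which the test case never reaches.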
A further method previously known is a static analysis of the parse tree that represents the code. Nodes in the parse tree share the ability to answer their "value" in terms of the type of the object they would represent at runtime given the type of objects they depend upon. The parse tree is then traversed in a depth first order to obtain the value of all the nodes. In this process, message expressions validate that their arguments are appropriate, variables validate assignments, and finally the value of the whole method (which is its return value) is validated against the design statement of what the method should return. As is the case when reading code, literal blocks complicate matters.
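A minimal sketch of such a traversal follows, in Python for illustration only. The node classes, the `SIGNATURES` table standing in for design statements, and the `check_method` helper are all hypothetical names introduced here; an actual system would work on the compiler's own parse tree.

```python
from dataclasses import dataclass

# Hypothetical design statements:
# (receiver type, selector) -> (expected argument types, result type)
SIGNATURES = {
    ("Integer", "+"): (("Integer",), "Integer"),
    ("Integer", "printString"): ((), "String"),
}

@dataclass
class Literal:
    type_name: str
    def value(self, env):
        return self.type_name

@dataclass
class Variable:
    name: str
    def value(self, env):
        return env[self.name]  # declared type of the variable

@dataclass
class Send:
    receiver: object
    selector: str
    args: tuple = ()
    def value(self, env):
        # Depth-first: children answer their types before this node does.
        receiver_type = self.receiver.value(env)
        arg_types = tuple(a.value(env) for a in self.args)
        expected, result = SIGNATURES[(receiver_type, self.selector)]
        if arg_types != expected:
            raise TypeError(f"{receiver_type}>>{self.selector} expects {expected}, got {arg_types}")
        return result

def check_method(body, env, declared_return):
    """Validate the type of the whole method against its design statement."""
    actual = body.value(env)
    if actual != declared_return:
        raise TypeError(f"method answers {actual}, design says {declared_return}")
    return actual

# Corresponds roughly to the Smalltalk expression: (count + 1) printString
tree = Send(Send(Variable("count"), "+", (Literal("Integer"),)), "printString")
result = check_method(tree, {"count": "Integer"}, "String")
```

Here each node answers its type from the types of the nodes it depends on, and a single depth-first pass suffices because the tree contains no literal blocks.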
Code with literal blocks poses problems for static analysis in that these blocks are not necessarily invoked where they appear in the parse tree. Also, a literal block may be invoked more than once. Static analysis must not analyze a block when it first appears, and it must pass through that block each time it is invoked. No one-pass traversal of the parse tree can analyze this code. The parse tree approach to static analysis can be elaborated to handle multiple invocations of the same literal block and perhaps even recursive invocations of literal blocks. But the farther the analysis departs from a simple traversal of a parse tree, the more complex the system becomes.
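The difficulty can be made concrete with a minimal sketch, again in Python and for illustration only. The `Block`, `Invoke`, `Arg`, and `Literal` classes are hypothetical; the block is assumed to take a single argument, and its `analyses` list exists only to show how often the body is visited.

```python
from dataclasses import dataclass, field

@dataclass
class Literal:
    type_name: str
    def value(self, env):
        return self.type_name

@dataclass
class Arg:
    """Reference to the block's single parameter."""
    def value(self, env):
        return env["arg"]

@dataclass
class Block:
    body: object
    analyses: list = field(default_factory=list)

    def value(self, env):
        # Where the literal appears, it only answers a block; the body is skipped.
        return self

    def invoke(self, arg_type):
        # The body is analyzed afresh for each invocation, with that call's types.
        result = self.body.value({"arg": arg_type})
        self.analyses.append((arg_type, result))
        return result

@dataclass
class Invoke:
    block_expr: object
    arg: object
    def value(self, env):
        block = self.block_expr.value(env)
        return block.invoke(self.arg.value(env))

block = Block(Arg())                    # roughly the Smalltalk literal [:x | x]
appears = block.value({})               # the literal appears: no analysis yet
count_at_appearance = len(block.analyses)
first = Invoke(block, Literal("Integer")).value({})
second = Invoke(block, Literal("String")).value({})
```

The body of the block is visited zero times where the literal appears and once per invocation afterwards, each time with different types, which is why the analysis no longer corresponds to a single traversal of the tree.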
Another practical problem with basing static analysis on a parse tree is that the parse trees used by Smalltalk system compilers are optimized for byte code generation. The parse tree inheritance hierarchy may therefore prove awkward for hosting a static analysis system. Also, many Smalltalk systems hide the code for parsing and compiling, although that is more of an inconvenience than a real barrier.
Lastly, future systems may support the intermixing of multiple languages, e.g., Smalltalk, Java and Visual Basic, using the same bytecode set. Static analysis based on parse trees would have to be done separately for each language and is, therefore, potentially too complex and too difficult for practical use.
Thus, there is a need for a method and system that allow for simple and efficient synchronization of code with design.