One of the major problems in software development and maintenance is keeping track of the structure of the code, so that changes can be made and, more generally, so that the operation of the software can be understood.
Very often, too few developers know a particular application's source code and its vagaries well enough to make changes quickly. The code itself is often poorly structured, poorly documented, or both. This usually happens because it was rushed into production, or because it has been patched or layered too often without sufficient consideration. Even when code is properly structured and documented, it can be just as difficult for someone to appreciate this without weeks of research. Before a single line of code is touched, man-weeks or even man-months of effort go into understanding the application's innards. Difficult questions must all be answered before the code is altered: which classes are grouped together as a component, which classes inherit from one another, which pieces of code pass parameters to one another, and so on. The reason is that a change in one part of the application can require changes in multiple other locations. Quite frequently, unit testing of the changed code may be insufficient, and more extensive regression or even system testing may be needed. This costs a lot of money, even assuming that the necessary resources are available in the first place. It also assumes that the people involved will remain in position long enough to understand the application well enough to make changes quickly.
Programmers usually use text-based editors to view code and make changes. Their managers normally gauge the scope of developers' tasks by asking for diagrams. They also discuss source code attributes such as the number of lines of code, function points and number of classes. These discussions normally take place in meeting rooms, using flip-charts that are pinned on the wall as references and that, of course, change as time progresses. This is a time-consuming and inaccurate method of communicating common, shareable information.
Traditional engineering practices that work well for hardware have difficulty dealing with software. Such practices involve a "top-down" (or, more generally, spiral) process that produces design artifacts before manufacturing starts.
The closest software analogy to "instructions for manufacturing" is the formal specification. Formal specifications precisely define what software components should do without specifying how they are implemented. In principle, this gives programmers and testers descriptions from which they can work independently.
Specification-driven development and testing is useful for aerospace or military applications that have stable requirements, high reliability demands and few budget restrictions. However, this approach is not cost-effective for the majority of software projects.
The generally accepted “best practice” for mainstream software development is the spiral or iterative model used with the Unified Modelling Language (UML). This approach builds the product by incrementally analysing, designing, implementing and testing a set of use-cases. Each iteration results in an updated UML model and the corresponding executable software and test cases.
Many developers who have used the iterative/UML approach would agree that it adds a degree of rigour to the process. They would also agree that the UML diagrams serve as a useful roadmap and a concise shorthand for the underlying source code.
However, the benefit of introducing UML does not always justify the cost and effort required. It is an invasive method that requires a lot of training and changes to work practices, and these are not easily accommodated gradually. Models must be generated for all work in progress. The software industry is very time-sensitive, and schedule pressures mean that a delay today for a potential future gain cannot be tolerated.
The usefulness of UML models is also limited because it rests on a flawed notion: that a single design (albeit one that evolves) can serve all purposes throughout the life of the development. In reality, every member of the development team requires their own view of the software "design". This applies whether they are an architect, designer, implementer, integrator, manager, team leader, tester, configuration manager, project manager, product manager, etc. The required view continually changes as the individual's activity changes. All these views must be accurate and consistent with some common underlying reality. Current UML tool offerings do not support such dynamic, interactive, multi-view usage, and this partly explains why enthusiasm for UML modelling is not universal.
In an attempt to reduce the risks and costs while keeping the benefits, many organizations use UML as a reverse-engineering technology. This way, fewer staff need to be trained in the use of the method and tools. The diagrams are reverse engineered from the source code, and design documentation is produced after the product is implemented.
In principle, this is a reasonable approach. In practice, most UML tools are really designed for forward engineering. They do provide reverse (or "round-trip") engineering functionality. However, they assume that the code will diverge from the model in relatively few places, and that the user can reasonably update the model's organisation and layout manually.
This is certainly not the case for a large pre-existing or in-progress software project for which no model has ever been generated. Organising a reverse-engineered model using a forward-engineering tool is extremely laborious. The result is unlikely either to reflect the design intended by the developers or to expose an optimal inferred design.
Specialized reverse-engineering tools can parse existing source code and provide the programmer with detailed information that is not readily available from the source code itself. This is typically cross-reference or similar information. Such tools sometimes claim to aid software comprehension. However, the information they provide is too detailed to help with "design-level" comprehension and is better suited to programming, debugging and maintenance activities.
There is clearly a rift between the available design technology and the needs of the development community. On offer is a choice between expensive, invasive, relatively static, forward-biased design tools and low-level, implementation-biased reverse-engineering tools. There is little or no tool support that addresses the status quo of mainstream software development.
The state of the art is schematically illustrated in FIG. 5. It is a histogram of the proportion of software developers versus the relative degree of forward- or reverse-engineering used to develop software. As described above, formal specification is a rigorous forward-engineering practice used by relatively few developers. Round-trip UML design is a somewhat less rigorous forward-engineering practice. It is used by a substantial population of programmers, but not by the majority. Source analysis 103 is a reverse-engineering practice that is only sometimes used in software development. The majority of developers, representing most of the area under the histogram, are under-served by existing technologies.
There is thus a need for a technology and notation that can be applied to existing or in-progress software development projects without extensive training or changes to work practices, and with minimal negative impact on ongoing work. Such a technology must also support both large-scale reverse-engineering and efficient inference of relevant, essential design information. A system that provides highly interactive and dynamic views is needed, so that individuals can expose and focus on the information that pertains to the task in which they are engaged. Such a system should not simply construct static designs. Rather, it should allow users to actively engage with the software, simultaneously exposing specific information and increasing overall comprehension.