This invention relates to the analysis of computer software, including the source code and pre-coding documentation such as program designs, in order to estimate the cost of production, development and maintenance of the relevant computer programs. The analysis can also be directed to the tracking of progress in a particular software project, optimal assignment of labour to such tasks and, in some cases, to optimisation of the software design itself. The invention relates to a method and apparatus for analysing a computer program or a part thereof, and also to a computer program product including a computer readable medium having recorded thereon a computer program for performing such analysis.
Making accurate cost estimates for software development, enhancement, testing, etc., is necessary for a vendor to remain profitable in the market. As noted by Fairley, R, in Software Engineering Concepts, McGraw-Hill Book Co, New York, 1985, page 64 and pages 72-75, estimating the cost of a software product is one of the most difficult and error-prone tasks in software engineering, especially during the planning phase.
To ease the task, certain software cost estimation models have been provided, for example the COCOMO 2.0 Software Cost Estimation Model, disclosed by B Boehm et al in American Programmer, July 1996, pages 2-17. COCOMO 2.0 comprises a tailorable family of software sizing models involving object points, function points and source lines of code; non-linear models for software reuse and reengineering; and an exponent-driver approach for modelling relative software diseconomies of scale. Even so, cost estimates (or effort estimates) remain difficult to make because there is no universal agreement on what some (or all) of the measures and parameters used to define them mean, or on how they are to be measured. For example, two frequently used parameters, the number of lines of code and the number of function points in a software product, are open to personal interpretation, which can lead to widely varying estimates from any two individuals, or even from the same individual at different times. When shorn of jargon and high-flying phrases, available estimation techniques are sometimes no better than educated guesses made by an individual.
The object of the invention is to at least partly remedy this situation by providing a measurement system comprising a consistent and repeatable measure of complexity in software code. This is achieved by defining, and providing measures for, two central notions of the measurement system: decision points and the complexity index.
The present invention provides a method for determining the complexity of a computer program in which the actual or expected presence of certain predetermined items in the program is determined from the program source code or from other pre-coding documentation, such as the program design. These items are specifically those that indicate breaks in the ongoing forward flow of the program, such as conditional statements, loops, memory allocations and subroutine calls, which break the linear flow of code and/or the linear flow of thought and therefore need careful attention during coding. For example, the code portion that causes the break may need to be examined to see whether the flow has been coded as intended, whether return values of function calls have been handled correctly, whether memory leaks have been avoided, and so on. Breaks in flow (of code and of thought) are considered complex because they require the programmer to pause and reflect on the change being made and the course of action being selected.
By contrast, the presence of a large but simple block of statements in a program is tedious but not complex since the block can be scanned sequentially from beginning to end without worrying about breaks in the logical (implicit or explicit) flow in the lines of code. Therefore, the number of lines of code (or how they are defined) ceases to be important. They add bulk, not complexity.
Each item in a program where a break in flow occurs, explicitly or implicitly, is called a decision point.
Decision points occur when we encounter, for example,
1. A conditional statement
2. Head and tail of a loop statement
3. Subroutine/function call
4. Memory allocation/reallocation/deallocation
5. Aliasing of allocated memory
6. Goto, continue, break statements in C/C++ or equivalent statements in other languages
7. Switch, case, default statements in C/C++ or equivalent statements in other languages
8. Return statement in C/C++ or equivalent statement in other languages
9. Return of pointers which are not in the parameter list of the function returning the pointer
10. Within a loop body, a variable that is redefined or modified after its first use within an iteration
11. Implicit mixed type operations in an expression or statement
12. Use of built-in operators in their overloaded incarnation by user-defined datatypes
13. The division operator where the denominator is not a constant
14. Communication calls
15. File operations
16. Nested operations
The above list gives the most commonly encountered decision points; others may be added as the situation requires. The total number of decision points tells us the number of places in the code that must be examined carefully. These are the potential stress points in a program.
Having identified the decision points, there is assigned to each an integer value indicative of what will be called herein a complexity index (CI) for that point. These values can be summed to provide an overall complexity index for a whole program or part of it. Alternatively or in addition, the distribution of decision points through the program can be analysed, e.g. by forming a complexity index histogram, so as to identify high complexity clusters.
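The summation and histogram steps might look like the following Python sketch. The text states only that each decision point is assigned an integer complexity index; the per-category weights in `CI_WEIGHTS`, the default weight, the fixed-size line buckets, and the threshold used to flag clusters are all hypothetical choices made for illustration. The input is a list of `(line_number, category)` pairs, however those were obtained.

```python
from collections import defaultdict

# Hypothetical integer CI weights per decision-point category. The text
# only says each decision point is given an integer value; these numbers
# are illustrative assumptions.
CI_WEIGHTS = {
    "conditional": 2,
    "loop": 3,
    "jump": 2,
    "switch": 2,
    "return": 1,
    "memory": 4,
    "file": 3,
    "division": 2,
    "call": 1,
}
DEFAULT_WEIGHT = 1  # assumed weight for any category not listed above


def total_ci(points):
    """Sum the CI values of all decision points in a program or part of one."""
    return sum(CI_WEIGHTS.get(cat, DEFAULT_WEIGHT) for _, cat in points)


def ci_histogram(points, bucket_size=10):
    """Sum CI values per block of `bucket_size` lines (1-based line numbers).

    Returns a dict, ordered by line, mapping each bucket's first line to its
    CI total. Buckets with no decision points are omitted.
    """
    hist = defaultdict(int)
    for line, cat in points:
        start = ((line - 1) // bucket_size) * bucket_size + 1
        hist[start] += CI_WEIGHTS.get(cat, DEFAULT_WEIGHT)
    return dict(sorted(hist.items()))


def hot_spots(hist, threshold):
    """Return the first lines of buckets whose CI total meets the threshold."""
    return [start for start, ci in hist.items() if ci >= threshold]
```

Fixed-size line windows are only one way to view the distribution of decision points; bucketing per function or per module would follow the same pattern with a different key.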