The present invention relates to a source-program processing method for quantitatively evaluating functional redundancy, which indicates how many code segments having similar functions are present in a source program, and for extracting the redundant code segments from the source program.
During the design phase of software development, code segments having similar functionality may be introduced into a program.
Programmers may copy existing code segments because they are unaware that similar functionality already exists elsewhere in the program, or, even when they are aware of it, to save cost. As a result, many code segments with similar functionality may end up scattered throughout a single program.
Modifying the functionality of some of these scattered, similar (functionally redundant) code segments may then require the same alteration to all of the redundant code segments.
When many redundant code segments must be modified at the same time, the cost of searching the program for them rises, and so does the likelihood that some of them will be missed. Code segments missed in the search can cause the program to malfunction, forcing programmers to make further modifications to correct the malfunctions and increasing the cost still further.
Redundant code segments in software therefore have a significant effect on the cost of software maintenance, such as modifications, and the effect grows as the set of redundant code segments becomes larger.
Several techniques for searching programs for redundant code segments are known. One, disclosed in Japanese Patent Laid-Open Publication No. 8-241193, uses comparison indices (similarities) to evaluate how similar two programs or two code segments are to each other. Another, disclosed in Japanese Patent Laid-Open Publication No. 2001-125783, determines that code segments are functionally redundant when their similarity exceeds a reference level.
These known techniques can search a program for redundant code segments, but they cannot evaluate the redundant code segments quantitatively. As discussed in detail below, this makes the extraction of code segments to be modified from a program ineffective.
The known technique that uses similarities to search for redundant code segments cannot quantitatively evaluate functional redundancy when there are many functionally redundant code segments. Consequently, it cannot quantitatively evaluate the maintenance cost, which may be high because of the redundant code segments.
The other known technique, which uses a reference level, may fail to find functionally redundant code segments, for two reasons. First, the search results depend heavily on how the reference level is set. Second, code segments whose similarities fall below the reference level are ignored even if they have similar functionality.
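The dependence on the reference level can be illustrated with a minimal Python sketch. It assumes a simple token-set (Jaccard) similarity as the similarity index; this is an illustrative stand-in, not the index used in the cited publications, and the segment texts and function names are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a: str, b: str) -> float:
    """Hypothetical similarity index in [0, 1]: shared tokens / all distinct tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def redundant_pairs(segments: dict[str, str], reference_level: float) -> list[tuple[str, str]]:
    """Report segment pairs whose similarity exceeds the reference level."""
    return [
        (x, y)
        for x, y in combinations(sorted(segments), 2)
        if jaccard_similarity(segments[x], segments[y]) > reference_level
    ]


# Three hypothetical code segments that all compute the same sum.
segments = {
    "A": "sum = 0 for x in xs sum += x",
    "B": "acc = 0 for y in ys acc += y",
    "C": "sum = 0 for x in values sum += x",
}

print(redundant_pairs(segments, 0.5))  # [('A', 'C')]
print(redundant_pairs(segments, 0.4))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

At a reference level of 0.5, only A and C are reported, although B computes the same sum (its similarity to A and C is 5/11, about 0.45). Lowering the level to 0.4 reports B as well, so the result hinges on the chosen level.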
Another known technique disclosed in Japanese Patent Laid-Open Publication No. 2000-92841, etc., is a statistical technique called hierarchical cluster analysis (HCA) in which components are classified into several groups according to their similarities.
The HCA technique is widely used in several fields, such as:
- an analysis of questionnaires, disclosed in Japanese Patent Laid-Open Publication No. 2001-184405, etc.;
- an analysis of status data from targets to be monitored, disclosed in Japanese Patent Laid-Open Publication No. 09-093665, etc.; and
- a classification of a large number of electronic documents linked over a network, disclosed in Japanese Patent Laid-Open Publication No. 10-027125, etc.
The HCA technique hierarchically classifies a large amount of data into several groups so that the features of each group can be identified. However, it has not been used to evaluate the overall tendency of all the groups with an index derived from the classification results, or to identify the groups from which that index was derived.
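For reference, the following minimal Python sketch shows one common form of HCA, single-linkage agglomerative clustering, applied to code segments. The pairwise distances (1 minus similarity) and all names are hypothetical, and the cited publications may use other linkage criteria. Each merge is recorded with its height, and cutting the resulting hierarchy at a chosen height yields the groups.

```python
def hierarchical_clusters(labels, distance):
    """Single-linkage agglomerative clustering.

    Returns the merge history as (height, merged_cluster) tuples, from the
    closest pair upward, i.e. the levels of a dendrogram.
    """
    clusters = [frozenset([label]) for label in labels]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((d, merged))
    return history


def groups_at(labels, history, height):
    """Cut the hierarchy: apply only the merges at or below the given height."""
    groups = {label: frozenset([label]) for label in labels}
    for d, merged in history:
        if d <= height:
            for label in merged:
                groups[label] = merged
    return sorted({tuple(sorted(g)) for g in groups.values()})


# Hypothetical pairwise distances (1 - similarity) between four code segments.
DIST = {
    frozenset("AC"): 0.2, frozenset("AB"): 0.5, frozenset("BC"): 0.55,
    frozenset("AD"): 0.9, frozenset("BD"): 0.85, frozenset("CD"): 0.95,
}
labels = ["A", "B", "C", "D"]
history = hierarchical_clusters(labels, lambda a, b: DIST[frozenset((a, b))])

print(groups_at(labels, history, 0.3))  # [('A', 'C'), ('B',), ('D',)]
print(groups_at(labels, history, 0.6))  # [('A', 'B', 'C'), ('D',)]
```

Cutting at 0.3 groups only A and C; cutting at 0.6 also places B in their group. The sketch yields the groups at each level but computes no single index that summarizes all of the groups together.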
In software design, particularly with object-oriented languages, generalization is performed to create a new (common) class with the same (or similar) functionality when several classes share that functionality.
Generalization has advantages such as (1) a low modification cost, because changes to the shared functionality need to be made only to the common class, and (2) easy extension, because new classes with the same functionality can be created simply by deriving them from the common class. The original classes are redefined as classes derived from the common class, so some class definitions have a hierarchical structure.
In short, for programs developed in object-oriented languages, generalization creates a common class when several classes share the same or similar functionality and redefines the original classes as derived classes.
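As an illustration of generalization, the following minimal Python sketch assumes two hypothetical classes, SalesReport and StockReport, that originally each implemented the same report-formatting logic. Generalization moves that logic into a common abstract class, Report (also hypothetical), and redefines the original classes as derived classes.

```python
from abc import ABC, abstractmethod


class Report(ABC):
    """Common class created by generalization: holds the shared functionality."""

    def render(self) -> str:
        # Shared formatting lives here once; changing it updates every derived class.
        return f"{self.title()}\n" + "\n".join(f"- {line}" for line in self.lines())

    @abstractmethod
    def title(self) -> str: ...

    @abstractmethod
    def lines(self) -> list[str]: ...


class SalesReport(Report):
    """Original class, now derived from the common class."""

    def title(self) -> str:
        return "Sales"

    def lines(self) -> list[str]:
        return ["Q1: 120", "Q2: 135"]


class StockReport(Report):
    """Original class, now derived from the common class."""

    def title(self) -> str:
        return "Stock"

    def lines(self) -> list[str]:
        return ["Widgets: 40"]


print(SalesReport().render())
```

Changing the format in Report.render changes it for both derived classes (advantage 1), and a new kind of report needs only a new derived class that supplies title and lines (advantage 2).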
However, other similar functionality that was not found during design is sometimes discovered after programs written in object-oriented languages have been installed. The newly found similar functionality then requires further generalization, which increases the work required after installation.