Software evolution refers to the changes made to software over its lifetime and to the maintenance of that software. Changes occur as organizational requirements change, or when a repair is required to fix errors or to improve performance and reliability. Software maintenance is also required when the software is changed to adapt to a new environment without any major change to its basic architecture. Over the life of a modern software system, changes range from simple changes that correct coding errors to more extensive changes that correct design errors. Moreover, bug fixes, enhancements, performance improvements, changing business requirements, design changes, and the like require a software system to have a strong capability to evolve. Sometimes a change affects unintended parts of the software, so that the system must be maintained repeatedly. Code smells are anomalies that are often introduced during the design, implementation, or maintenance phase of the software development life cycle. As software evolves, each change leads to further maintenance activity. It is therefore crucial to study and analyze how the software has evolved with respect to its baseline version.
A code smell is a surface indication, quick to spot (or "sniffable"), that usually corresponds to a deeper problem in the software system. A typical example of a code smell is a long method. However, a code smell does not always indicate a problem; some long methods, for example, are perfectly fine. It is therefore necessary to look deeper and examine whether an underlying problem or root cause gives rise to the detected code smell. Code smells are not inherently bad on their own; they are often an indicator of a problem rather than the problem itself. Identifying the root cause of a code smell in the underlying system is thus essential to understanding how the code smell was injected or removed.
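To make the long-method example concrete, the following is a minimal sketch of a surface-level check, assuming Python source code and a purely illustrative length threshold. The function name `find_long_functions` and the constant `MAX_LINES` are hypothetical and are not taken from any existing tool. The check only flags candidates; whether a flagged function reflects a deeper problem still has to be examined separately.

```python
import ast

# Hypothetical threshold: functions spanning more lines than this are flagged.
MAX_LINES = 20


def find_long_functions(source: str, max_lines: int = MAX_LINES):
    """Return (name, length_in_lines) for each function longer than max_lines."""
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Length is measured from the "def" line to the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                flagged.append((node.name, length))
    return flagged
```

A line count is only one possible surface measure. A long function that is a flat sequence of simple assignments may be perfectly acceptable, which is why the result is best treated as a list of places to inspect rather than a list of defects.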
During software maintenance, an incorrect understanding of the requirements or a lack of experience in the software development process can be expected to produce code smells. An existing technique for software maintenance scans for software defects by employing code smell detection, clone detection, and coupling detection. This technique uses probabilistic measures for software defect prediction, and the result is presented to the developer, who takes the necessary steps to fix the defects. However, coupling-based defect detection can make the defect measurement process complex if the dataset has a higher concentration of coupling in a smaller module of an application.
Some existing approaches rely on Machine Learning (ML) algorithms to help software developers find cost-oriented candidate changes to source code. However, ML-based techniques are computationally expensive because of their complex data models. They are also time consuming, and their accuracy can be a concern. Additionally, ML-based code smell detection requires some manual annotation in order to provide a training set. The aforementioned techniques essentially focus on software defects, not on code smells. Some existing ML-based techniques focus on detecting only one type of code smell, such as attachment feature, and do not provide a generalized approach that captures all types of code smells.
An existing method focuses on detecting preventive maintenance in software source code. Preventive maintenance is the modification of a software product after delivery to detect and correct potential faults in the software product before they become effective faults. This existing method comprises analyzing the source code of two versions of the software and defining data sets associated with characteristics of the source code. The data sets are then analyzed to find occurrences of preventive maintenance performed on the source code. However, this existing approach is not based on a program dependence graph (PDG). PDGs are oblivious to semantics-preserving statement reorderings and hence are well suited to detecting semantic (functionally equivalent) clones. Moreover, standard PDG-based clone detection tools are able to detect only certain clones and are limited in their ability to detect all clones that may be present. Standard PDGs merely approximate the semantic dependencies between statements.
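The observation that a dependence graph is oblivious to semantics-preserving statement reorderings can be illustrated with a small sketch. It is a deliberate simplification and not the method of any PDG-based tool: it assumes straight-line Python code in which each variable is assigned once, and it builds only flow (read-after-write) data-dependence edges, with no control dependence. The function name `data_dependence_edges` is hypothetical.

```python
import ast


def data_dependence_edges(source: str):
    """Return the set of flow-dependence edges (defining_stmt, using_stmt).

    Assumption: straight-line code where each variable is assigned once.
    Statements are identified by their normalized text, not their position,
    so orderings with the same dependencies produce the same edge set.
    """
    last_def = {}  # variable name -> text of the statement that assigned it
    edges = set()
    for stmt in ast.parse(source).body:
        text = ast.unparse(stmt)  # requires Python 3.9 or later
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        writes = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        for var in reads:
            if var in last_def:
                edges.add((last_def[var], text))
        for var in writes:
            last_def[var] = text
    return edges


version_a = "a = 1\nb = 2\nc = a + b\n"
version_b = "b = 2\na = 1\nc = a + b\n"  # first two statements swapped
version_c = "a = 1\nb = 2\nc = a - b\n"  # a different computation

same = data_dependence_edges(version_a) == data_dependence_edges(version_b)
different = data_dependence_edges(version_a) != data_dependence_edges(version_c)
```

Under these assumptions, `version_a` and `version_b` yield identical edge sets even though their text differs, while `version_c` does not. A textual or token-based comparison would treat the first two versions as different, which is the intuition behind using dependence graphs for semantic clone detection.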