Static analysis plays an important role in the validation and verification of safety-critical applications. However, for large-scale applications, static analysis struggles to analyze many thousands of lines of code because of its high demand for resources. Thus, scalability remains a bottleneck for static code analysis tools. US 20120291004 A1 (Shrawan Kumar et al.) explains clustering as a technique for scaling static analysis tools to very large systems.
The software code implementing a certain functionality is termed a ‘cluster’, and the cluster is denoted by the function that is the entry point to that functionality of the software code. Because a cluster is smaller in code size and less complex than the original software code, static analysis tools can analyze multiple clusters individually and separately to produce cluster-specific analysis results.
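By way of illustration only, the notion of a cluster denoted by its entry function can be sketched as follows. The call graph, the function names, and the choice to treat a cluster as the set of functions reachable from the entry function are all assumptions made for this sketch; they are not taken from the cited publication.

```python
from collections import deque

# Hypothetical call graph (caller -> callees); every name here is illustrative.
CALL_GRAPH = {
    "task_brake": ["read_sensor", "apply_brake"],
    "task_display": ["read_sensor", "draw_gauge"],
    "read_sensor": ["scale_value"],
    "apply_brake": [],
    "draw_gauge": [],
    "scale_value": [],
}


def cluster_of(entry, call_graph):
    """Return the functions reachable from `entry`.

    Assumption: a cluster is modeled as the reachability set of its
    entry function; a real clustering tool may define it differently.
    """
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


brake_cluster = cluster_of("task_brake", CALL_GRAPH)
display_cluster = cluster_of("task_display", CALL_GRAPH)

# Functions that belong to both clusters are analyzed once per cluster.
shared_functions = brake_cluster & display_cluster
```

In this sketch each cluster is smaller than the whole call graph, and the two clusters overlap in `read_sensor` and `scale_value`, so program points inside those functions belong to more than one cluster.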
However, while performing static analysis of cluster code, a large number of analysis warnings are generated because of the imprecise nature of static analysis and the conservative approach taken for inter-cluster communication (data sharing). In such cases, all the warnings need to be reviewed manually to determine whether each warning represents an actual defect. This process consumes a lot of time and effort. Further, a warning for a program point belonging to multiple clusters gets reported multiple times, that is, the warning is generated once for each of the clusters associated with the program point. Thus, there is a problem of an increased number of warnings on clustered software code.
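The duplication described above can be pictured with a small sketch. The record layout, the cluster names, the file names, and the warning kinds below are hypothetical, and the sketch only illustrates how the same program point is reported once per cluster; it does not represent any particular tool's output format.

```python
from collections import defaultdict
from typing import NamedTuple


class ClusterWarning(NamedTuple):
    """Hypothetical shape of one warning in a cluster-wise report."""
    cluster: str  # entry function denoting the cluster
    file: str
    line: int
    kind: str


# Illustrative cluster-wise reports: sensor.c:42 lies in both clusters.
reports = [
    ClusterWarning("task_brake", "sensor.c", 42, "divide-by-zero"),
    ClusterWarning("task_display", "sensor.c", 42, "divide-by-zero"),
    ClusterWarning("task_brake", "brake.c", 17, "array-index-out-of-bounds"),
]


def group_by_program_point(warnings):
    """Map each (file, line, kind) program point to the clusters reporting it."""
    groups = defaultdict(list)
    for w in warnings:
        groups[(w.file, w.line, w.kind)].append(w.cluster)
    return dict(groups)


groups = group_by_program_point(reports)

# Program points reported by more than one cluster.
common_points = {p: c for p, c in groups.items() if len(c) > 1}
```

Here three warnings are reported for only two distinct program points, and the warning at `sensor.c`, line 42 would be reviewed once in the context of each of its two clusters.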
Currently, major research in static analysis is directed towards making the analysis more precise. However, less research has been directed at reducing the manual effort spent during review of the warnings. In the current state of the art, static analysis tools only report warnings cluster-wise in clustered-code analysis. Further, common-point warnings (warnings for program points common to multiple clusters) are reviewed cluster-wise, because reviewing them in the context of multiple clusters at the same time requires switching between those clusters, which becomes tedious. Further, the information required to review all common-point warnings together in the context of their associated clusters may become too large to analyze manually, and if a review-assisting tool is used, that tool may not scale to multiple clusters.
A majority of existing solutions rely on grouping the error reports generated during static analysis, clustering machine-generated defect reports by leveraging the syntactic and structural information available in the static bug reports.
However, because the prior art literature bases the clustering of error reports generated during static analysis on syntactic and structural information, a group of reports thus formed can include warnings or defects belonging to different program points, and such approaches generally fail to group the warnings belonging to multiple code clusters.
Further, static analysis tools mostly fail to analyze many real-world systems because these systems are very large and complex (usually consisting of millions of lines of code). Clustering, that is, breaking a system into multiple clusters, is a commonly used technique to overcome this issue.
In this technique, scalability of the analysis tools is achieved by splitting the system code into smaller code clusters. Such a cluster, being smaller and less complex than the original system, can be analyzed by static analysis tools to produce cluster-specific results. However, cluster-wise analysis of the system leads to an increased number of warnings because of the conservative analysis of shared variables. Further, a warning for a program point belonging to multiple clusters gets reported multiple times, that is, once for each of its associated clusters. Thus, there is a problem of a large number of warnings on clustered code.
Currently, major research in the area of static analysis is directed towards making static analysis more precise, and not much has been done to reduce the manual effort spent during review of the generated warnings. There does not currently exist a technique, method, or system that aims to reduce the effort spent while reviewing clustered-code warnings.
It is noted that the prior art literature fails to disclose an efficient technique for reviewing warnings generated during static analysis of clustered software code, and this is therefore still considered one of the biggest challenges in the technical domain.