In recent years, as “repackaging” and “code obfuscation” techniques have been widely applied to malicious mobile applications, much research on malicious code detection has focused on analyzing the internal structure of application programs. Many detection methods based on the internal structural features of a program extract or build different graph structures by analyzing the decompiled code of the target program, and then judge whether the program is malicious by measuring the degree of difference between the graph structures of the target sample and those of known malicious samples. Research results show that this kind of detection method is relatively effective against repackaging and code obfuscation.
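As a rough illustration of the "compare against malicious graph structures" workflow, the sketch below represents each call graph as a set of caller-to-callee edges and uses the Jaccard similarity of the edge sets as the difference measure. This metric is an assumption chosen for simplicity, not the measure used by any particular method surveyed here; the function names, the example call graphs, and the 0.6 threshold are all hypothetical.

```python
from typing import List, Set, Tuple

# A call graph is modelled as a set of (caller, callee) edges recovered from
# decompiled code. All function names below are hypothetical.
Edge = Tuple[str, str]


def edge_jaccard(g1: Set[Edge], g2: Set[Edge]) -> float:
    """Jaccard similarity of two edge sets: 1.0 = identical, 0.0 = disjoint."""
    union = g1 | g2
    if not union:
        return 1.0  # two empty graphs are trivially identical
    return len(g1 & g2) / len(union)


def is_suspicious(target: Set[Edge], malicious_db: List[Set[Edge]],
                  threshold: float = 0.6) -> bool:
    """Flag the target if it is close enough to any known malicious graph.

    The 0.6 threshold is an illustrative assumption, not a tuned value.
    """
    return any(edge_jaccard(target, m) >= threshold for m in malicious_db)


known_malware = {
    ("onCreate", "readContacts"),
    ("onCreate", "sendSms"),
    ("sendSms", "SmsManager.sendTextMessage"),
}
# A repackaged variant: the same malicious core plus one injected call.
repackaged = known_malware | {("onCreate", "showAd")}
benign = {("onCreate", "loadUi"), ("loadUi", "render")}

print(edge_jaccard(repackaged, known_malware))    # 0.75
print(is_suspicious(repackaged, [known_malware]))  # True
print(is_suspicious(benign, [known_malware]))      # False
```

Because this toy metric matches edges by function name, identifier renaming would still defeat it; the methods discussed here compare graph structure rather than names, so the sketch only illustrates the compare-and-threshold workflow.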
Compared with traditional signature-based detection methods, this kind of method mainly solves two problems. First, signature-based detection has poor timeliness and struggles to find unknown viruses. Most signature-based methods rely on manual analysis to extract byte sequences or specific strings carrying virus information as features and store them in a feature database; a program is then judged malicious if these features match the code under inspection. In contrast, most structure-feature-based methods can analyze and detect samples automatically. Because these methods are generally heuristic, they are better at identifying unknown viruses. Second, signature-based methods are inefficient against the large number of variant samples produced by “repackaging”, and “code obfuscation” further increases the difficulty of analyzing the target program. Structure-feature-based methods instead exploit the fact that the internal structure of most new variants is similar to that of existing viruses, so they can compare the similarity of the two and identify viruses quickly. Analyzing the internal structure of a program can also resist some code obfuscation techniques.
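The weakness of exact feature matching can be made concrete with a toy signature scanner. The sketch below stores a hypothetical byte-sequence signature in a dictionary standing in for the feature database and reports every signature that occurs verbatim in the code under inspection; the signature name, the byte strings, and the renamed variant are invented for illustration. A single identifier rename, of the kind an obfuscator performs, is enough to defeat the match.

```python
from typing import Dict, List

# Hypothetical feature database: virus name -> byte sequence extracted by an analyst.
SIGNATURE_DB: Dict[str, bytes] = {
    "Hypothetical.SmsTrojan.A": b"premium_sms_send(10660000)",
}


def signature_scan(code: bytes, db: Dict[str, bytes] = SIGNATURE_DB) -> List[str]:
    """Return the names of all signatures that occur verbatim in `code`."""
    return [name for name, sig in db.items() if sig in code]


original = b"init();premium_sms_send(10660000);exit();"
# Same behavior after an obfuscator renames the method to `a1`.
obfuscated = b"init();a1(10660000);exit();"

print(signature_scan(original))    # ['Hypothetical.SmsTrojan.A']
print(signature_scan(obfuscated))  # []
```

A structure-based method would instead look at how the renamed method is called and what it calls, which the rename leaves unchanged.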
However, in practical applications, many structure-feature-based detection methods have shortcomings in execution time and in the construction and analysis of graph structures. The main reasons are as follows. First, the internal call structure of an application program is generally complicated. To compare graph similarity, a large number of graph structures exhibiting malicious behavior must normally be stored as a feature database, and computing similarity by graph or subgraph matching against this database imposes a heavy computational load. Second, when building graph structures, many methods tend to attach partial semantic information to the graph, producing various kinds of graphs such as control flow graphs, data dependence graphs, and permission event graphs. However, these methods require exact matching against a given standard, which makes them ill-suited to detecting the large number of virus variants. Meanwhile, constructing new graphs from the features of known samples does little to help detect unknown viruses.
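To give a sense of the computational load, a naive subgraph matcher that tries every injective assignment of a k-node pattern's nodes to an n-node target graph examines P(n, k) = n!/(n-k)! candidate mappings per pattern, and this cost is summed over every pattern in the feature database. The sketch below only counts those candidates with Python's `math.perm`; it is an upper bound for brute force, since practical matchers such as VF2 prune most mappings, and the graph and database sizes in the example are assumed for illustration.

```python
from math import perm
from typing import List


def brute_force_mappings(target_nodes: int, pattern_sizes: List[int]) -> int:
    """Count the injective node mappings a naive subgraph matcher would try.

    perm(n, k) = n! / (n - k)!, and perm returns 0 when k > n, so patterns
    larger than the target graph contribute nothing.
    """
    return sum(perm(target_nodes, k) for k in pattern_sizes)


# Assumed sizes: a 100-function call graph checked against a database of
# 1,000 malicious patterns with 5 nodes each.
per_pattern = brute_force_mappings(100, [5])
whole_db = brute_force_mappings(100, [5] * 1000)
print(f"{per_pattern:,}")  # 9,034,502,400
print(f"{whole_db:,}")     # 9,034,502,400,000
```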
In analyzing the structure of the internal function call graphs of mobile applications, it was found that the function call graph of a mobile application differs from a random network: parts of it exhibit properties of complex networks, such as being scale-free. Therefore, the function call graph can be divided into many community structures using a community detection method. Within each community, nodes are densely connected to one another, while their connections to nodes in other communities are sparse. However, communities divided purely on structural features do not always best reflect the behavioral features of the application. Therefore, other features of the mobile application need to be combined with structure during community division so that the division result is judged jointly.
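Modularity is one standard way to score how well a division matches the "dense inside, sparse between" description above: Q = sum over communities c of [L_c/m - (d_c/(2m))^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below computes Q and then greedily moves each node into a neighboring community while Q increases, a simplified version of the local-moving phase of the Louvain method. It is an illustration under stated assumptions, not the division method of this work: it treats the call graph as undirected, uses structure only (none of the additional application features the text argues for), and the function names and toy graph are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Undirected adjacency sets; call direction is ignored in this sketch.
Graph = Dict[str, Set[str]]


def undirected(edges: List[Tuple[str, str]]) -> Graph:
    """Build symmetric adjacency sets from (caller, callee) pairs."""
    adj: Graph = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def modularity(adj: Graph, communities: Iterable[Set[str]]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2]."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for c in communities:
        l_c = sum(1 for v in c for u in adj[v] if u in c) / 2  # internal edges
        d_c = sum(len(adj[v]) for v in c)                       # total degree
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


def local_moving(adj: Graph, max_passes: int = 20) -> List[List[str]]:
    """Greedy modularity improvement: move each node to the neighboring
    community that raises Q the most, until no move helps."""
    labels = {v: v for v in adj}  # every function starts in its own community

    def current_q() -> float:
        groups: Dict[str, Set[str]] = {}
        for v, c in labels.items():
            groups.setdefault(c, set()).add(v)
        return modularity(adj, groups.values())

    for _ in range(max_passes):
        moved = False
        for v in sorted(adj):
            original = labels[v]
            best_label, best_q = original, current_q()
            for u in sorted(adj[v]):
                candidate = labels[u]
                if candidate == original:
                    continue
                labels[v] = candidate       # try the move
                q = current_q()
                if q > best_q + 1e-12:      # accept strict improvements only
                    best_label, best_q = candidate, q
                labels[v] = original        # undo the trial move
            if best_label != original:
                labels[v] = best_label
                moved = True
        if not moved:
            break

    result: Dict[str, List[str]] = {}
    for v, c in labels.items():
        result.setdefault(c, []).append(v)
    return sorted(sorted(g) for g in result.values())


# Toy call graph: two tightly knit groups of functions (a, b, c) and (d, e, f)
# joined by a single call c -> d.
g = undirected([("a", "b"), ("b", "c"), ("a", "c"),
                ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])
print(local_moving(g))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(round(modularity(g, [{"a", "b", "c"}, {"d", "e", "f"}]), 3))  # 0.357
```

Recomputing Q from scratch for every trial move keeps the sketch short but is quadratic or worse; the Louvain method instead evaluates each move with an incremental modularity-gain formula.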