In malware research and defense, it is extremely important to test whether a suspicious program resembles any known malware. Various techniques have been proposed and developed for this purpose. These techniques can be classified as static analysis or dynamic analysis, depending on the attributes used to measure the similarity between the suspicious program and the known malware.
In static analysis, the subjects under test are examined and their features are extracted without actual execution. Bilar et al. proposed an approach to distinguish malware from benign programs by statistical analysis of the op-code distribution. Tian et al. measured the code length of each function in a program and used the frequency of occurrence of each length within a particular sample of malware as the feature for malware classification.
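By way of illustration only, a comparison based on op-code distribution may be sketched as follows. The function names and the choice of total variation distance are assumptions made for this sketch and are not the specific statistics used by Bilar et al.; the input is assumed to be a list of mnemonics already produced by a disassembler.

```python
from collections import Counter


def opcode_distribution(mnemonics):
    """Return the relative frequency of each op-code mnemonic in a listing."""
    counts = Counter(mnemonics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they share no op-code.
    This metric is an illustrative choice, not one taken from the cited work.
    """
    ops = set(p) | set(q)
    return 0.5 * sum(abs(p.get(op, 0.0) - q.get(op, 0.0)) for op in ops)


if __name__ == "__main__":
    # Hypothetical mnemonic listings for a suspicious program and a known sample.
    suspect = opcode_distribution(["mov", "mov", "push", "call"])
    known = opcode_distribution(["mov", "push", "push", "call"])
    print(total_variation(suspect, known))
```

A small distance suggests that the two programs use op-codes in similar proportions. Because the listing is obtained without execution, a packed or encrypted image would yield the distribution of the unpacking stub rather than that of the payload.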
Another proposed automated malware classification system is based on the function lengths and the printable strings extracted from an executable file. Sathyanarayan et al. proposed generating signatures for malware families from their imported APIs: the executable files are scanned to extract the frequency of invocations of critical APIs, which is then used to evaluate the likelihood of maliciousness. More advanced static analysis utilizes a function call graph to perform malware classification.
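By way of illustration only, a family signature built from the frequency of critical API invocations may be sketched as follows. The list of critical APIs, the averaging step, and the use of cosine similarity are assumptions made for this sketch and are not taken from Sathyanarayan et al.; the per-sample API lists are assumed to have been extracted beforehand.

```python
import math
from collections import Counter

# Illustrative selection of Windows API names; the actual critical set is an assumption.
CRITICAL_APIS = (
    "CreateRemoteThread",
    "WriteProcessMemory",
    "RegSetValueExA",
    "VirtualAllocEx",
)


def api_frequency_vector(api_calls, critical=CRITICAL_APIS):
    """Count how often each critical API appears in one sample's API list."""
    counts = Counter(api_calls)
    return [counts[name] for name in critical]


def family_signature(samples, critical=CRITICAL_APIS):
    """Average the frequency vectors of all samples belonging to one family."""
    vectors = [api_frequency_vector(s, critical) for s in samples]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical API lists for two known members of a family and one suspect.
    family = family_signature([
        ["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"],
        ["VirtualAllocEx", "WriteProcessMemory", "WriteProcessMemory"],
    ])
    suspect = api_frequency_vector(["WriteProcessMemory", "VirtualAllocEx", "Sleep"])
    print(cosine_similarity(family, suspect))
```

A suspect whose vector is close to a family signature would be ranked as more likely to belong to that family. APIs outside the critical set, such as `Sleep` in the usage example, do not contribute to the score.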
Because no execution is needed, static analysis has the advantage of efficiency. However, it is ineffective against advanced malware that is polymorphic or metamorphic, because the static analysis approach lacks runtime information. Krugel et al. and Zhang et al. proposed methods to identify polymorphic or metamorphic malware, but such malware cannot be correctly identified if the executable images are packed or encrypted.
On the other hand, dynamic analysis supervises the execution of a program and extracts program features at runtime. Various dynamic-analysis-based malware recognition techniques have been proposed. Dai et al. proposed a method of executing a program inside a debugger in single-step mode to collect instruction traces of the program. The instruction trace is then decomposed into basic blocks of abstract op-codes and processed using data mining techniques to discover instruction patterns that it has in common with a malware sample. Recognition based on instruction patterns can still fail on metamorphic malware, because advanced metamorphic malware replaces instructions and reorders memory accesses to defeat instruction-pattern-based recognition methods.
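By way of illustration only, the decomposition of an instruction trace into basic blocks and the search for blocks shared with a known sample may be sketched as follows. The set of control-transfer mnemonics, the function names, and the exact-match comparison are assumptions made for this sketch; they simplify, and do not reproduce, the data mining step described by Dai et al.

```python
# Mnemonics treated as ending a basic block; an illustrative, incomplete set.
CONTROL_TRANSFER = {"jmp", "je", "jne", "jz", "jnz", "call", "ret"}


def split_basic_blocks(trace):
    """Cut an executed instruction trace into blocks ending at a control transfer."""
    blocks, current = [], []
    for op in trace:
        current.append(op)
        if op in CONTROL_TRANSFER:
            blocks.append(tuple(current))
            current = []
    if current:
        # Keep any trailing instructions that were not closed by a control transfer.
        blocks.append(tuple(current))
    return blocks


def shared_blocks(trace_a, trace_b):
    """Return the basic blocks that occur, instruction for instruction, in both traces."""
    return set(split_basic_blocks(trace_a)) & set(split_basic_blocks(trace_b))


if __name__ == "__main__":
    # Hypothetical traces: the variant replaces "add" with "sub" in the first block.
    known = ["mov", "add", "jne", "push", "call"]
    variant = ["mov", "sub", "jne", "push", "call"]
    print(shared_blocks(known, variant))
```

In the usage example, only the second block is shared, because a single substituted instruction changes the first block. This is the weakness noted above: instruction replacement by metamorphic malware is sufficient to break exact instruction-pattern matching.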
Automatic behavior analysis of malware is an important step in the analysis process for efficiently developing detection methods and solutions for modern, rapidly growing malware. Through behavior analysis, malware instances with similar behaviors can be classified together to reduce the effort required of human analysts. This is possible because of the following two facts.
First, very little malware is written from scratch. Most malware samples are variants with simple modifications or upgrades to an original malware. Hence, the variants still inherit the behaviors of the parent malware. Second, malware crafted by advanced polymorphism or metamorphism techniques may deform the appearance of the code but not the program behavior.
Various techniques for malware pattern extraction and recognition have been discussed in previous research. Among these studies, a very common characteristic taken into consideration is the invocation of application programming interface (API) functions or system calls. Since malware is designed to carry out certain malicious tasks, it inevitably interacts with the running environment through these interfaces. In addition, the semantics of program behaviors are embedded in the invocations of those functions, because one important design principle for APIs is descriptiveness.
However, existing API-trace-based behavior analysis systems lose their advantages when facing advanced malware equipped with a kernel-level rootkit. A successful invasion of the OS kernel implies the acquisition of system administrator privileges, which allow the malware to circumvent or sabotage any other program in the system. Although virtualization-based inspection may be used to resolve the aforementioned issue of privilege escalation, it still faces the following challenge. A kernel-level program directly invokes kernel-level functions to accomplish its tasks, relying on neither the system calls nor the user-level APIs. Without a monitoring mechanism for behaviors in the kernel space, such malware can never be profiled accurately. Therefore, recent research has focused on kernel-level rootkits.
The methods proposed in U.S. Pat. No. 8,397,295 and U.S. Pat. No. 8,281,393 detect a rootkit by checking system integrity. These methods address only certain rootkit features, whereas the approach disclosed in the present invention is not limited to certain features and is able to detect rootkits in general. Moreover, these methods can only detect the existence of a rootkit without recognizing its type, which is important when analyzing a rootkit.
The methods proposed in U.S. Pat. No. 8,464,345 and U.S. Pat. No. 7,845,009 use program behaviors as the signature to recognize a rootkit or malware. However, without monitoring of in-kernel functions, monitoring at the level of high-level APIs may be evaded by a sophisticated rootkit. Conversely, collecting low-level information such as instruction sequences and virtualization traps may yield an overwhelming volume of data, which is inefficient and impractical.
In order to overcome the drawbacks in the prior art, methods of generating in-kernel hook point candidates to detect rootkits, and the system thereof, are disclosed. The particular design in the present invention not only solves the problems described above, but is also easy to implement. Thus, the present invention has utility for the industry.