Conventionally, methods for analyzing malware can be roughly divided into static analysis and dynamic analysis. Static analysis is a method by which the functions of malware are understood by analyzing the program code of the malware. However, because static analysis examines the functions of malware comprehensively, it involves a large amount of human labor. In contrast, dynamic analysis is a method by which the functions of malware are analyzed by preparing an environment in which the behavior of the malware is recorded and causing the malware to operate in the prepared environment. Extraction of the behavior is easier to automate in dynamic analysis than in static analysis.
One example of such dynamic analysis of malware is called dynamic taint analysis. In a dynamic taint analysis, for example, a virtual Central Processing Unit (CPU) in a virtual machine tracks flows of data that malware reads from and writes to a virtual memory or a virtual disk. More specifically, the dynamic taint analysis consists of three phases: assigning a taint tag, propagating the taint tag, and detecting the taint tag.
For example, when detecting a leak of confidential information by malware, the virtual CPU performs the following processes. In the first phase, the virtual CPU causes the malware to operate. After that, when a file containing the confidential information is read into a memory, the virtual CPU assigns a taint tag denoting confidential information to the position within the memory at which the file containing the confidential information is stored. Normally, the taint tag is stored in a region (which may be referred to as a “shadow memory”) prepared separately from the physical memory managed by an Operating System (OS). The region is structured in such a manner that the OS and applications (including malware) cannot access it.
After that, in the second phase, the virtual CPU monitors transfer instructions and the like between registers and memory regions, so that the taint tag is propagated whenever the confidential information is copied. Further, in the third phase, the virtual CPU checks whether any piece of data output from a network interface has the taint tag denoting the confidential information assigned thereto. When the taint tag is assigned to any of the output data, the virtual CPU detects that an attempt is being made to output the confidential information to the outside.
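The three phases described above can be illustrated with a minimal sketch. It is a toy model written under stated assumptions, not an implementation of any actual virtual CPU: the class name TaintVM, the tag value CONFIDENTIAL, the byte-granular shadow memory, and the methods read_file, load, store, and send are all hypothetical names introduced only for this illustration.

```python
# Toy model of the three phases: assign, propagate, detect.
# Every name here (TaintVM, CONFIDENTIAL, ...) is hypothetical.
CONFIDENTIAL = "confidential"


class TaintVM:
    def __init__(self, size):
        self.memory = bytearray(size)  # memory visible to the guest
        self.shadow = [None] * size    # shadow memory: one tag slot per byte
        self.registers = {}            # register name -> (value, tag)

    def read_file(self, addr, data, tag=None):
        """Phase 1: load file contents into memory and assign a taint tag."""
        for i, byte in enumerate(data):
            self.memory[addr + i] = byte
            self.shadow[addr + i] = tag

    def load(self, reg, addr):
        """Phase 2: a memory-to-register transfer carries the tag along."""
        self.registers[reg] = (self.memory[addr], self.shadow[addr])

    def store(self, addr, reg):
        """Phase 2: a register-to-memory transfer carries the tag along."""
        value, tag = self.registers[reg]
        self.memory[addr] = value
        self.shadow[addr] = tag

    def send(self, addr, length):
        """Phase 3: check outgoing bytes for the confidential tag."""
        tags = self.shadow[addr:addr + length]
        leaked = any(t == CONFIDENTIAL for t in tags)
        return bytes(self.memory[addr:addr + length]), leaked


vm = TaintVM(64)
vm.read_file(0, b"secret", tag=CONFIDENTIAL)  # phase 1: assign
for i in range(6):                            # phase 2: copy via a register
    vm.load("r0", i)
    vm.store(32 + i, "r0")
payload, leaked = vm.send(32, 6)              # phase 3: detect
```

In this sketch the copied bytes at offset 32 inherit the tag from the original bytes at offset 0, so the send call reports a leak even though the data was never sent from its original location. The guest-visible memory and the shadow list are kept as separate objects to mirror the separation between the memory managed by the OS and the shadow memory.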
As an application of the dynamic taint analysis, another technique is also known by which a breakpoint in a debugger is realized with a taint tag. According to this technique, a user assigns a taint tag in advance to a position at which the user wishes to interrupt a program (i.e., the position where a “breakpoint” is set). The virtual CPU then checks whether a taint tag is assigned to each executed instruction and, when a taint tag is assigned, interrupts the program. As for destinations of the propagation, yet another technique is also known by which a taint tag is propagated to a disk.
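The taint-tag breakpoint can be sketched in the same spirit. The following is only an illustration under assumptions: the function run_until_break, the tag value BREAK_TAG, the representation of instructions as Python callables, and the choice to stop before (rather than after) executing a tagged instruction are all hypothetical and are not taken from the technique itself.

```python
BREAK_TAG = "breakpoint"  # hypothetical tag value


def run_until_break(instructions, shadow, state, pc=0):
    """Execute instructions in order, stopping before a tagged one.

    `shadow` maps an instruction address to its taint tag.
    Returns the address of the tagged instruction, or None when the
    program runs to completion.
    """
    while pc < len(instructions):
        if shadow.get(pc) == BREAK_TAG:
            return pc  # interrupt the program at the tagged position
        instructions[pc](state)
        pc += 1
    return None


def inc(state):
    state["x"] += 1


def double(state):
    state["x"] *= 2


program = [inc, inc, double, inc]
shadow = {2: BREAK_TAG}  # the user tags address 2 in advance
state = {"x": 0}
stopped_at = run_until_break(program, shadow, state)
```

Here execution is interrupted at address 2 after the two increments have run. Because the tag lives in the shadow mapping rather than in the program itself, the program being debugged is not modified to set the breakpoint.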
In another example of methods for dynamically analyzing malware, an attempt has been made to track the behavior of an attacker by placing File Transfer Protocol (FTP) account information or the like prepared in advance (called a “honey token”) on an analysis-purpose personal computer (PC), so as to intentionally cause the malware to leak the information. With this arrangement, it is possible to understand in what manner the attacker abuses the information thus obtained.
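The honey token idea can be illustrated with a short sketch. It assumes, purely for illustration, that the token is a randomly generated FTP user name and password and that login attempts are available as a list of dictionaries; the functions make_honey_token and find_token_use, the host name, and the event format are hypothetical.

```python
import secrets


def make_honey_token(host="ftp.example.test"):
    """Create decoy FTP account information that is unique to one analysis run."""
    return {
        "host": host,
        "user": "user_" + secrets.token_hex(4),
        "password": secrets.token_urlsafe(12),
    }


def find_token_use(token, login_events):
    """Return the login events that present the decoy credentials."""
    return [
        event for event in login_events
        if event.get("user") == token["user"]
        and event.get("password") == token["password"]
    ]


token = make_honey_token()
events = [
    {"user": "alice", "password": "x", "src": "198.51.100.7"},
    {"user": token["user"], "password": token["password"], "src": "203.0.113.9"},
]
hits = find_token_use(token, events)
```

Because the credentials are generated at random and placed only on the analysis-purpose PC, any later login that presents them can reasonably be attributed to the leak caused by the malware under analysis.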