Separating an object into its constituent components, so that the internal structure of the object can be analyzed in terms of those components, is a long-standing problem in the reverse engineering of complex systems. For example, in the software analysis field, reverse engineering mechanisms typically examine individual objects in isolation and base a decomposition into components on properties internal to the object. These techniques tend to be slow and inaccurate because they rely on detailed information about an object and on fuzzy, heuristic decisions.
An example of a method of performing a system reverse engineering process is described in U.S. Pat. No. 6,978,228. U.S. Pat. No. 5,675,711 provides an example of adaptive statistical regression and classification of data strings, with application to the generic detection of computer viruses.
Computer malware detection has typically been conducted with programs that monitor files and applications on individual computers. The detection methods often rely on large databases that contain signatures of previously identified computer viruses, worms, trojans, spyware, or other malicious computer programs. Malware scanning programs search individual files on individual computers for known signatures. While this pattern detection approach can be effective, it requires frequent updates to the database of signatures to keep abreast of the most recent malware developments.
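By way of illustration only, the signature-based scanning described above can be sketched in Python as follows. This is a minimal sketch and not a description of any particular scanning product: the signature names, the byte patterns, and the function names `scan_bytes` and `scan_file` are hypothetical, and matching is assumed to be a plain byte-substring search.

```python
from typing import Dict, List

# Hypothetical signature database mapping a malware name to a byte pattern.
# The names and patterns are illustrative placeholders, not real signatures.
SIGNATURES: Dict[str, bytes] = {
    "Example.Virus.A": bytes.fromhex("deadbeef"),
    "Example.Worm.B": b"EVIL_PAYLOAD",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


def scan_file(path: str, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Read one file in binary mode and scan its contents for known signatures."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read(), signatures)
```

In this sketch, a file containing a pattern that is absent from the database is reported as clean, which illustrates why such an approach depends on frequent updates to the database of signatures.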
Interest by the reverse engineering and anti-malware communities in the analysis of mobile applications has increased due to the widespread public adoption of mobile communication devices, such as smart phones, that hold large amounts of personal data that may be subject to exploitation by malicious programs. There is also a general need for malware detection systems and methods that are suitable for screening applications before they are distributed to, or used with, mobile communication devices such as smart phones.
Genome analysis also presents the problem of breaking down objects into their constituent components. Sequences of DNA in a genome may include vast numbers of individual genes that may be challenging to recognize or identify. Additionally, even after a gene is identified, the function of an individual gene, or the interaction of multiple genes, may not be apparent without significant research into specific genes.