Software applications can be subject to various security vulnerabilities, in that inadequately designed or written source code can allow attackers to compromise privacy, steal data, or create other security risks. For example, an insecure web application could potentially expose vital data to the World Wide Web, which may result in unauthorized access to confidential information. Moreover, software applications may be subject to many intrinsic defects, including memory leaks that cause application crashes or instability, improperly formulated function calls that produce incorrect or inconsistent data, or other vulnerabilities. As a result, effectively auditing software to identify and rectify vulnerabilities can significantly improve enterprise security and application performance, or provide other quality assurance advantages. Thus, software audits may often be needed to address various operational risks posed by vulnerable software, in addition to being required for compliance with mandatory regulations and policies that govern data privacy, integrity, and good corporate governance.
Unfortunately, existing techniques for auditing software tend to be based upon “snap shots” taken at singular points during the software lifecycle. Because a “snap shot” captures code that is likely to change many times prior to (or after) release, the results of an audit may only be meaningful for a severely limited amount of time. For instance, during development of any given software application, engineers may often modify source code associated with the application to resolve bugs, add features, improve efficiency, or otherwise contribute to application development. Furthermore, in response to changing user needs, system capabilities, design choices, or other factors, software vendors may release new or updated versions of an application at various points in time. Software application lifecycles can therefore include several revisions, rewrites, or other modifications to source code associated with a given application. Because audit information can become stale or meaningless in response to any given code modification, “snap shot” audits tend to become unrepresentative of the current code base soon after the audits have occurred, necessitating a new audit that will be subject to the same limitations. As a result, using existing software audit techniques, the value or longevity of an audit depends primarily on how frequently audits occur. Considering the high investment costs associated with performing audits (e.g., in terms of time, money, or other factors), coupled with a general lack of enthusiasm for performing audits, lengthening the validity period associated with a software audit can offer significant advantages over existing systems.
Furthermore, analyzing a “snap shot” of a software application can present obstacles to debugging software, optimizing performance, or performing other quality assurance tasks. Thus, instead of analyzing static representations of software, various systems have been developed that can track changes made to software at multiple points in the software lifecycle (e.g., based on dates, versions, etc.). However, existing techniques for tracking software changes tend to operate either at a source code level or at a native binary executable level, both of which have significant inconveniences and limitations that can prevent effective judgments of what constitutes an important change in the software.
For example, tracking changes to source code may appear to have the advantage of simplicity (e.g., differences between two source files can be identified using a file comparison utility, such as diff). However, the results produced thereby tend to have limited utility, as simple comparisons of source code can potentially yield excessive, useless, or meaningless results (e.g., changes to comments, variable names, pre-processor directives, code that will not be compiled or utilized in a final executable, etc.). Furthermore, the results may be imprecise because comparison tools may not necessarily be aware of the rules governing the programming language in which the source code was written (e.g., in C#, changing a type definition from struct to class may appear minor at the source level, but the impact of the change may be significant because the type changes from a value type to a reference type). Even when language parsers or other tools can obtain better results, the tools tend to lack “simpler” change detection mechanisms, instead focusing on detecting issues with the software (e.g., infinite loops, NULL pointer dereferences, etc.).
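The noise problem described above can be illustrated with a minimal sketch. Python is used here purely for illustration (the passage itself mentions diff and C#), and the sample sources and the helper names `text_changed_lines` and `syntax_changed` are hypothetical. The sketch contrasts a line-based textual comparison, which reports a comment-only edit as a change, with a comparison of parsed syntax trees, which ignores comments; it is not presented as the change-tracking technique of any particular system.

```python
import ast
import difflib

# Three hypothetical versions of the same source file.
OLD = '''\
def total(prices):
    # sum the prices
    return sum(prices)
'''

NEW_COMMENT_ONLY = '''\
def total(prices):
    # Add up every price in the list.
    return sum(prices)
'''

NEW_BEHAVIOR = '''\
def total(prices):
    # sum the prices
    return sum(prices) * 2
'''


def text_changed_lines(old, new):
    """Return the added/removed lines that a line-based diff reports."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [
        line
        for line in diff
        # Keep '+'/'-' lines, but skip the '+++'/'---' file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def syntax_changed(old, new):
    """Compare parsed syntax trees; comments are discarded by the parser."""
    return ast.dump(ast.parse(old)) != ast.dump(ast.parse(new))


if __name__ == "__main__":
    # The textual diff flags the comment edit as a change.
    print(text_changed_lines(OLD, NEW_COMMENT_ONLY))
    # The syntax-tree comparison separates the cosmetic edit from the real one.
    print(syntax_changed(OLD, NEW_COMMENT_ONLY))
    print(syntax_changed(OLD, NEW_BEHAVIOR))
```

A syntax-tree comparison of this kind still treats a renamed variable as a change and knows nothing about what a change means at run time, so it only addresses part of the limitation discussed above.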
Tracking changes to native binary executables, by contrast, tends to be relatively uncommon, though hackers routinely use such techniques to reverse engineer patches. By the time that source code has been compiled into native binary executables, a significant amount of useful tracking information may be lost. In addition, binary executables may differ significantly from the original source code due to compiler optimizations (e.g., inlining, constant propagation, loop unrolling, dead code elimination, etc.) that can eliminate code segments, remove redundant code, or otherwise transform functional aspects of the source code. Furthermore, binary executables tend to be generated for specific platforms or processing architectures, limiting the relevance of any identified changes to the platform or architecture for which the executables were prepared.
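As a loose analogy for the information loss described above, the following sketch uses CPython's bytecode compiler as a stand-in for a native compiler. That substitution is an assumption made only to keep the example self-contained; native compilers apply far more aggressive optimizations than the constant folding shown here. The helper names `opnames` and `compiled_forms_match` and the two sample statements are hypothetical.

```python
import dis


def opnames(source):
    """Compile a source string and list the opcode names CPython emits for it."""
    code = compile(source, "<example>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


# Two textually different sources (hypothetical examples).
WITH_ARITHMETIC = "x = 1000 * 1000"
WITH_LITERAL = "x = 1000000"


def compiled_forms_match(a, b):
    """True when both sources compile to the same opcode sequence."""
    return opnames(a) == opnames(b)


if __name__ == "__main__":
    # On CPython builds that fold constants, the multiplication is evaluated
    # at compile time, so no multiply instruction is expected to remain.
    print(opnames(WITH_ARITHMETIC))
    print(compiled_forms_match(WITH_ARITHMETIC, WITH_LITERAL))
```

When the two opcode sequences match, a comparison made only at the compiled level cannot tell which form the engineer originally wrote, which is the kind of lost tracking information the paragraph above refers to.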
Existing systems suffer from these and other problems.