Software security and vulnerability checking is an active field of academic and industrial pursuit. With news of hackers exploiting software vulnerabilities now commonplace, it is unsurprising that many academic and professional institutions are focusing their efforts on developing tools and practices that aim to make software more secure against attacks from hackers and adversaries worldwide.
The prior art teaches many ways of detecting and addressing vulnerabilities in software. U.S. Pat. No. 8,499,353 discloses security assessment and vulnerability testing of software applications based in part on application metadata in order to determine an appropriate assurance level and associated test plan that includes multiple types of analysis. Steps from each test are combined into a “custom” or “application-specific” workflow, and the results of each test are then correlated with other results to identify potential vulnerabilities.
U.S. Pat. No. 8,365,155 describes a software analysis framework utilizing a decompilation method and system for parsing executable code, identifying and recursively modeling data flows, identifying and recursively modeling control flow, and iteratively refining these models to provide a complete model at the nanocode level. The nanocode decompiler may be used to determine flaws, security vulnerabilities, or general quality issues that may exist in the code.
U.S. Pat. No. 8,739,280 describes a context-sensitive taint analysis system. Taint processing applied to a tainted value of an application is identified and an output context of the application associated with output of the tainted value is determined. It is determined whether the taint processing is effective in mitigating a security vulnerability caused by the tainted value for the output context.
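The idea of judging taint processing against an output context can be illustrated with a minimal sketch. It is not the method of the cited patent; the context names, sanitizer names, and the `EFFECTIVE_SANITIZERS` table are hypothetical and chosen only for illustration.

```python
# Hypothetical table: which sanitizers are considered effective in which
# output context. Real systems would derive this from far richer rules.
EFFECTIVE_SANITIZERS = {
    "html_body": {"html_escape"},
    "url_param": {"url_encode"},
    "sql_string": {"sql_parameterize"},
}


def is_mitigated(applied_sanitizers, output_context):
    """Return True if at least one sanitizer applied to the tainted value
    is effective for the context in which the value is output."""
    effective = EFFECTIVE_SANITIZERS.get(output_context, set())
    return bool(effective & set(applied_sanitizers))


# HTML escaping protects an HTML body but not a SQL string literal.
print(is_mitigated(["html_escape"], "html_body"))   # True
print(is_mitigated(["html_escape"], "sql_string"))  # False
```

The point of the sketch is that the same taint processing can be effective in one output context and ineffective in another, so the check must be keyed on the context as well as on the processing applied.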
U.S. Pat. No. 8,347,392 describes an apparatus and method for analyzing and supplementing a program to provide security. A computer readable storage medium has executable instructions to perform an automated analysis of program instructions. The automated analysis includes at least two analyses selected from an automated analysis of injection vulnerabilities, an automated analysis of potential repetitive attacks, an automated analysis of sensitive information, and an automated analysis of specific HTTP attributes. Protective instructions are inserted into the program instructions. The protective instructions are utilized to detect and respond to attacks during execution of the program instructions.
Non-Patent reference, “Dynamic Taint Analysis for Automatic Detection, Analysis” by James Newsome and Dawn Song of Carnegie Mellon University, proposes a dynamic taint analysis solution for automatic detection of overwrite attacks. The approach does not need source code or special compilation for the monitored program, and hence works on commodity software. To demonstrate this idea, they implemented TaintCheck, a mechanism that can perform dynamic taint analysis by performing binary rewriting at run time.
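The general principle of dynamic taint analysis described above can be sketched as follows. This is a toy model under stated assumptions, not TaintCheck itself: the `TaintTracker` class, its operations, and the named locations are hypothetical, and no binary rewriting is performed. Data from an untrusted source is marked tainted, taint propagates through data movement and arithmetic, and use of a tainted value as a jump target is reported.

```python
from dataclasses import dataclass, field


class TaintViolation(Exception):
    """Raised when tainted data is used in a security-sensitive way."""


@dataclass
class TaintTracker:
    """Toy shadow state for a handful of named locations (hypothetical)."""
    values: dict = field(default_factory=dict)
    tainted: set = field(default_factory=set)

    def read_input(self, dst, value):
        # Taint source: data arriving from an untrusted channel.
        self.values[dst] = value
        self.tainted.add(dst)

    def load_const(self, dst, value):
        # Constants embedded in the program are trusted.
        self.values[dst] = value
        self.tainted.discard(dst)

    def add(self, dst, a, b):
        # Propagation: the result is tainted if either operand is tainted.
        self.values[dst] = self.values[a] + self.values[b]
        if a in self.tainted or b in self.tainted:
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)

    def jump(self, target):
        # Taint sink: a jump target derived from input suggests an overwrite.
        if target in self.tainted:
            raise TaintViolation(f"jump target {target!r} is tainted")
        return self.values[target]


tracker = TaintTracker()
tracker.load_const("base", 0x1000)
tracker.read_input("offset", 16)
tracker.add("target", "base", "offset")
try:
    tracker.jump("target")
except TaintViolation as exc:
    print("detected:", exc)
```

Because the check happens at the point of use rather than at the point of input, the sketch needs no knowledge of how the tainted value reached the jump, which is what allows approaches of this kind to operate without source code.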
Non-Patent reference, “gFuzz: An instrumented web application fuzzing environment” by Ezequiel D. Gutesman of Core Security Technologies, Argentina, introduces a fuzzing solution for PHP web applications that improves the detection accuracy and enriches the information provided in vulnerability reports. They use dynamic character-grained taint analysis and grammar-based analysis in order to analyze the anatomy of each executed SQL query and determine which resulted in successful attacks. A vulnerability report is then accompanied by the offending lines of source code and the fuzz vector (with attacker-controlled characters individualized).
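Character-grained taint tracking of a SQL query can be illustrated with a deliberately crude sketch. It is not gFuzz and does not perform grammar-based analysis; the helper names are hypothetical, and the sketch assumes that attacker-controlled data is only ever meant to appear inside a single-quoted string literal. Each character carries its own taint flag, and a query is flagged when a tainted character is a quote or falls outside a literal.

```python
def mark(text, tainted):
    """Attach a per-character taint flag to a string."""
    return [(ch, tainted) for ch in text]


def tainted_chars_escape_literal(query):
    """Return True if any tainted character can alter the query structure.

    Simplifying assumption: tainted data is legitimate only inside a
    single-quoted string literal and must not itself contain a quote.
    """
    in_literal = False
    for ch, tainted in query:
        if ch == "'":
            if tainted:
                return True  # attacker-controlled quote ends the literal
            in_literal = not in_literal
        elif tainted and not in_literal:
            return True  # attacker-controlled text outside any literal
    return False


def build_query(user_input):
    # Trusted query text around a single tainted, user-supplied fragment.
    return (mark("SELECT * FROM users WHERE name = '", False)
            + mark(user_input, True)
            + mark("'", False))


print(tainted_chars_escape_literal(build_query("alice")))          # False
print(tainted_chars_escape_literal(build_query("x' OR '1'='1")))   # True
```

Keeping taint at character granularity is what makes it possible to report exactly which characters of the executed query were attacker-controlled.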
One shortcoming of prior art teachings is that they suffer from poor accuracy while at times also requiring source code for analysis, as opposed to just bytecode/assembly code, or they attempt to simplify the bytecode/assembly code before analysis. Other prior art work teaches running dynamic and static analysis components in an independent or serial fashion. Furthermore, earlier approaches attempt to exhaustively map all data flows in a decompiled or intermediate representation of a software system, which impairs performance and slows the overall process. Relatedly, prior art teachings do not take advantage of the concurrent multi-core or multi-CPU processing infrastructure that is now commonplace, which would allow distributed analysis of very large target software systems with high precision. Similarly, prior art teachings suffer from poor performance because they do not properly utilize precomputation and caching of the analysis of basic blocks of code.
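What caching the analysis of basic blocks might look like can be shown with a minimal sketch. It does not reproduce any cited reference; the `BlockAnalysisCache` class, the textual instruction format, and the toy `written_registers` analysis are all hypothetical. The result of analyzing a basic block is stored under a hash of the block's instructions, so an identical block encountered again is not analyzed a second time.

```python
import hashlib


class BlockAnalysisCache:
    """Memoizes a per-block analysis, keyed by the block's content hash."""

    def __init__(self, analyze):
        self._analyze = analyze  # function: block -> analysis summary
        self._cache = {}
        self.misses = 0          # number of times the analysis actually ran

    def summary(self, block):
        key = hashlib.sha256("\n".join(block).encode()).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._analyze(block)
        return self._cache[key]


def written_registers(block):
    """Toy analysis: treat the first operand of each instruction as the
    destination register (an assumption of this sketch)."""
    return frozenset(
        ins.split()[1].rstrip(",") for ins in block if len(ins.split()) > 1
    )


cache = BlockAnalysisCache(written_registers)
block = ("mov r1, r2", "add r3, r1")
first = cache.summary(block)
second = cache.summary(block)  # served from the cache
print(sorted(first), cache.misses)
```

Keying on block content rather than on block address means that identical blocks appearing in different parts of a large target share one analysis result.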