System developers routinely need to analyze the behavior of the software systems they build. One basic analysis is to understand observed behavior, such as why a Web server is slow on a Standard Performance Evaluation Corporation (SPEC) web benchmark. More sophisticated analysis aims to characterize future behavior in previously unseen circumstances, such as what a Web server's maximum latency and minimum throughput will be once deployed at a customer site. Ideally, system designers would also like to be able to do quick “what-if” analyses, such as determining whether aligning a certain data structure on a page boundary will avoid all cache misses. For small programs, experienced developers can often reason through some of these questions based on code alone. However, there currently exists no platform that is able to answer such questions for large, complex, real systems.
Such a platform would need to enable easy construction of analysis tools (such as oprofile, valgrind, bug finders, and reverse engineering tools) and simultaneously have the following three key properties: (1) be able to efficiently analyze entire families of execution paths; (2) maximize realism by running the analyses within a real software stack; and (3) be capable of handling binaries. No practical tool today provides all three of these properties together, so system builders often have to resort to guesswork and extrapolation.
First, predictive analyses must measure entire families of paths through the target system, whereas existing tools can measure only one path at a time. Reasoning about families of paths is key to predicting behavior; ultimately, properties that can be shown to hold for all paths constitute proofs, which are the ultimate prediction. Security analysis also requires reasoning about all execution paths through the program, to ensure that desired security policies cannot be violated even in corner cases. Performance analysis benefits in the same way: a multi-path (i.e., symbolic) analysis can derive performance envelopes for programs, instead of profiling performance solely along one path. Not only are such analyses of interest for real-time requirements (e.g., to ensure that an interrupt handler can never exceed an upper bound in execution time), they are also useful for capacity planning (e.g., to determine how many web servers to provision for a web farm). A powerful multi-path analyzer could also automatically generate worst-case and best-case workloads for programs.
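As a minimal illustration of the difference between single-path profiling and a performance envelope, the following Python sketch enumerates every path through a toy handler. The handler, its three independent branches, and the per-branch costs in `BRANCH_COSTS` are hypothetical values chosen for illustration; a real multi-path analyzer would derive paths symbolically rather than by brute-force enumeration.

```python
from itertools import product

# Hypothetical cost model (arbitrary time units) for a toy handler with three
# independent two-way branches: (cost if taken, cost if not taken).
BRANCH_COSTS = [(5, 1), (20, 2), (3, 4)]


def path_cost(decisions):
    """Total cost of one path, given one True/False decision per branch."""
    return sum(taken if decision else skipped
               for decision, (taken, skipped) in zip(decisions, BRANCH_COSTS))


def performance_envelope():
    """Enumerate every path; return ((best_cost, path), (worst_cost, path))."""
    paths = list(product((True, False), repeat=len(BRANCH_COSTS)))
    best_path = min(paths, key=path_cost)
    worst_path = max(paths, key=path_cost)
    return (path_cost(best_path), best_path), (path_cost(worst_path), worst_path)


if __name__ == "__main__":
    # A single-path profile reports the cost of only the path that happened to run.
    print("one profiled path:", path_cost((False, True, True)))
    best, worst = performance_envelope()
    print("best case :", best)
    print("worst case:", worst)
```

The branch decisions returned alongside each bound play the role of the best-case and worst-case workloads mentioned above: they identify which inputs would drive the program down the cheapest and the most expensive path under this toy cost model.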
Second, an accurate estimate of program behavior often requires taking into account the whole environment surrounding the analyzed program: libraries, kernel, drivers, and central processing unit (CPU) architecture. In other words, it requires in-vivo analysis, that is, program analysis that captures all interactions of the analyzed code with its surrounding system, and not only with a simplified abstraction of that system. Even small programs interact with their environment (libraries, operating systems, etc.), e.g., to read/write files or network packets, so understanding the behavior of the program requires understanding the nature of these interactions. Current approaches either abstract away this environment behind a model, or execute the real environment but allow calls from different execution paths to clobber each other's state. Writing abstract models is labor-intensive, taking in some cases multiple person-years, and practically always results in an incomplete and/or inaccurate model; keeping these models accurate as the modeled system evolves is even more challenging. Therefore, it is necessary to allow analyzed programs to interact consistently with the real environment during multi-path analyses. A common form of in-vivo analysis occurs when testing large programs, like Mozilla Firefox, where one typically wants to focus attention on a particular area of the code, such as a module that is known to deadlock, or code that was recently added or modified; the rest of the system becomes “the environment.”
Third, real systems are made up of many components from various vendors; access to all corresponding source code is rarely feasible and, even if it is, building the code exactly as in the shipped software product is difficult. Thus, analyses need to operate directly on binaries, a requirement that is often very expensive to meet.
The first and foremost challenge in performing analyses that are both in-vivo and multi-path is scalability. Going from single-path analysis to multi-path analysis is itself expensive because the number of paths through a program increases exponentially with the number of branches; this is known as the path explosion problem. For this reason, state-of-the-art symbolic execution engines can barely handle programs of a few KLOC (thousands of lines of code), because the cost in terms of memory and exploration time is generally exponential in the size of the program. For in-vivo multi-path analysis to be consistent, one would need to symbolically execute the programs, libraries, OS kernel, and drivers; even the CPU and devices would have to be simulated. With today's tools, this is not feasible.
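The exponential growth behind the path explosion problem can be made concrete with a small Python sketch. It assumes the simplest possible program shape, a straight-line sequence of independent two-way branches with no loops; real programs contain loops and correlated or infeasible branches, so the figures below are only an illustration of the growth rate, not a measurement of any real system.

```python
from itertools import product


def enumerate_paths(num_branches):
    """Yield every path as a tuple of branch outcomes (True = branch taken)."""
    return product((True, False), repeat=num_branches)


def count_paths(num_branches):
    """Closed-form path count for independent two-way branches in sequence."""
    return 2 ** num_branches


if __name__ == "__main__":
    # Explicit enumeration is practical only for small branch counts.
    for n in (1, 2, 3, 10):
        assert sum(1 for _ in enumerate_paths(n)) == count_paths(n)
    # Beyond that, only the count itself is affordable to compute.
    for n in (10, 20, 40, 64):
        print(f"{n:2d} branches -> {count_paths(n)} paths")
```

Under this assumption, each additional branch doubles the number of paths, which is why extending the analysis from one program to the entire software stack multiplies the cost rather than adding to it.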
In addition, device drivers are among the least reliable parts of an OS kernel. Drivers and other extensions, which comprise, for instance, 70% of the Linux operating system code, have a reported error rate that is 3-7 times higher than the rest of the kernel code, making them substantially more failure-prone. Moreover, some drivers are vulnerable to malformed input from untrusted user-space applications, allowing an attacker to execute arbitrary code with kernel privilege.
It is therefore ironic that most computer users place full trust in binary device drivers: they run drivers inside the kernel at the highest privilege level, yet derive a false sense of safety from purchasing anti-virus software and personal firewalls. Device driver flaws are more dangerous than application vulnerabilities, because device drivers can subvert the entire system and, by having direct memory access, can be used to overwrite both kernel and application memory. Several tools and techniques exist today for building more reliable drivers or for protecting the kernel from misbehaving drivers, but these are primarily aimed at developers who have the driver's source code. Therefore, these techniques cannot be used (or even adapted) by consumers of closed-source binary drivers.
The availability of consumer-side testing of device drivers is essential. As of 2004, there were 800,000 different kinds of plug and play (PnP) devices at customer sites, with 1,500 devices being added every day. There were 31,000 unique drivers, and 9 new drivers were released every day. Each driver had approximately 3.5 versions in the field, with 88 new driver versions being released every day. Faced with this increasing diversity of drivers, consumers (end users and IT specialists alike) need a way to perform end-to-end testing just before installation.
Black-box testing of device drivers, and of closed-source binary drivers in particular, is difficult and typically achieves low code coverage. There are two main reasons for this. First, it is hard to exercise the driver through the many layers of the software stack that lie between the driver's interface and the application interface. Second, closed-source programs are notoriously hard to test as a black box. The classic approach to testing such drivers is to try to produce inputs that exercise as many paths as possible and (perhaps) check for high-level properties (e.g., absence of kernel crashes) during those executions. Considering the wide range of possible inputs and the system events that are hard to control (e.g., interrupts), this approach exercises relatively few paths, thus offering fewer opportunities to find bugs.
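The low coverage of input-driven black-box testing can be illustrated with a short Python sketch. Everything in it is hypothetical: `toy_driver_entry` is a stand-in for a driver entry point, `MAGIC_COMMAND` is an invented control code, and the path labels are made up for the example. The sketch only shows why randomly generated inputs tend to miss paths guarded by specific values.

```python
import random

MAGIC_COMMAND = 0xDEADBEEF  # hypothetical control code guarding a rare path


def toy_driver_entry(command, length):
    """Stand-in for a driver entry point; returns a label for the path taken."""
    if command == MAGIC_COMMAND:
        if length > 4096:
            return "magic-oversized"  # the kind of path where a bug might hide
        return "magic-ok"
    if length == 0:
        return "empty"
    return "default"


def random_black_box_test(trials, seed=0):
    """Feed random inputs to the driver and record which paths were reached."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(trials):
        command = rng.getrandbits(32)  # one chance in 2**32 of hitting the code
        length = rng.getrandbits(16)
        covered.add(toy_driver_entry(command, length))
    return covered


if __name__ == "__main__":
    print("paths reached by random inputs:", random_black_box_test(10_000))
    # An input chosen with knowledge of the branch condition reaches the path.
    print("directed input reaches:", toy_driver_entry(MAGIC_COMMAND, 5000))
```

With 10,000 random trials, the chance of guessing the 32-bit command even once is roughly two in a million, so the guarded paths almost certainly stay unexplored, whereas an input derived from the branch condition reaches them immediately.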