1. Field of the Invention
The present invention relates to the development and analysis of software programs and, more particularly, to a method and system for creating uniquely representative execution path identifiers of a software program.
2. Description of the Related Art
Developers of computer software have for over 50 years suffered a mismatch of information density to their needs. Within a computer system exists a superabundance of information about the executing software: every executed instruction, every data value, and every iteration of every function. Within this vast information resides evidence of all software defects (bugs) and intimate details about how all of the code actually behaves. Unfortunately, no prior art has devised a method of both efficiently exporting that information for developer use and organizing it to make it easier to find defects and to gain understanding of how software actually works. All prior-art methods have sought either to severely limit the amount of exported information to a small, predefined subset (such as with breakpoint debuggers), or to enable mass export of this data in its raw form (such as with trace debuggers), without assisting the software developer in organizing this mass of data.
The inherent problem with analyzing and debugging computer software has been in effectively managing complexity and information on a very large scale. From the earliest days of computer software, a subtle defect in a 20-line application running at 1,000 instructions per second could confound even the best engineers, who could struggle for many hours to capture an infrequent anomaly that occurs unpredictably in the midst of correct operation. Software applications have since grown to millions of lines of code running at billions of instructions per second, yet these same troubles remain. Furthermore, the defect rates of software have remained nearly constant for decades, ranging from more than 50 defects per 1,000 lines of code for newly-written software to less than 1 defect per 1,000 lines of code for software developed under a rigorous development model. Even with 80% of the cost of developing software being consumed by finding and fixing software defects, the result is usually ‘good enough’ software that contains fewer than 10 defects per 1,000 lines of code; software is thus regularly and knowingly released with hundreds of defects, and the cost of these defects is measured in the tens of billions of US dollars annually.
Prior-art techniques for debugging and analyzing software execution can be classified into two categories, depending on their intrusiveness. Highly intrusive techniques include breakpoint debuggers, single-stepping through program code, and print debugging. These approaches can alter the flow of program execution enough to make the original problems non-reproducible during debugging. Low-intrusion techniques include real-time trace ports such as ARM ETM, MIPS PDTrace and IEEE/ISTO Nexus-5001, which do not intrude on program execution but often require substantial resources (package pins for trace export or substantial on-chip buffers). All of these techniques suffer from serious drawbacks.
First, all of these methods are built on the premise that a software developer will search for the cause of one known, reproducible bug at a time. This requires the developer to first make an educated guess about where a particular defect originates, so that a breakpoint, trigger or other mechanism can be set to enable capture of the exact portion of execution data that contains evidence of the cause of the present problem. This is usually an iterative process, since the causes of software errors are often not easy to determine, and a series of iterations can span a long duration to find and correct just one error, particularly if the error has a low recurrence rate or is otherwise difficult to reproduce.
Second, these techniques will only help a developer to isolate software defects that they become aware of through external symptoms. Defects with subtle symptoms or very low recurrence rates can often elude detection through the entire development process, and end up shipping with the final product.
Third, these techniques only produce raw information about a particular moment of execution, but they do not provide context information as to whether this portion of execution is unusual, or if it is merely another instance of a common execution sequence.
Advancements have been made to try to solve some of these shortcomings. To provide more information to the software developer, real-time trace collection capacity has been steadily expanding, with premium trace collection probes (from Lauterbach GmbH, Green Hills Software Inc., etc.) offering capacities of up to 4 GB. However, this capacity increase is somewhat misleading, as it only offers bulk collection capacity and does not perform real-time characterization of the data to determine whether it is actually useful and contains valuable information about a transient defect or other low-recurrence event. Trace data collection is still mainly centered on a pre-defined trigger event that is the suspected cause of a single defect of interest. Regardless of capacity, a real-time trace collection system that cannot continuously collect and categorize the execution information will impose limits on the developer's visibility into the executing software, and the process of debugging will likely remain a one-known-bug-at-a-time endeavor.
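The real-time categorization discussed above can be illustrated by a minimal sketch (a hypothetical illustration only, not the claimed system): execution sequences are counted as they are observed, so that a rare, potentially defect-related path stands out from the common paths around it. The path tuples, function names and `rare_threshold` parameter below are invented for illustration.

```python
from collections import Counter

def categorize_paths(observed_paths, rare_threshold=1):
    """Count occurrences of each execution path and separate rare from common."""
    counts = Counter(tuple(p) for p in observed_paths)
    common = {p for p, c in counts.items() if c > rare_threshold}
    rare = {p for p, c in counts.items() if c <= rare_threshold}
    return common, rare

# Simulated run: one path dominates; one occurs a single time (a transient event).
paths = [("main", "loop", "exit")] * 99 + [("main", "loop", "fault_handler")]
common, rare = categorize_paths(paths)
print("common:", common)
print("rare:", rare)
```

In a real-time trace system the equivalent bookkeeping would have to run in hardware or firmware at line rate, operating on compressed trace records rather than name tuples.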
Several approaches to reducing the bandwidth requirements for instruction trace export have been devised. Current industry-standard trace ports can reach averages as low as 0.4 to 1.2 bits per instruction for instruction-only trace export.
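One widely used idea behind such low bit rates is to export only control-flow decisions, typically one bit per conditional branch, and let an off-chip decoder reconstruct the intervening sequential instructions from the program image. The following sketch uses invented figures (64 instructions, 8 branches) to show how this yields well under one bit per executed instruction:

```python
def pack_branch_bits(outcomes):
    """Pack a list of taken/not-taken branch outcomes into bytes, one bit each."""
    packed = bytearray((len(outcomes) + 7) // 8)
    for i, taken in enumerate(outcomes):
        if taken:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)

def avg_bits_per_instruction(n_instructions, n_branches):
    # Only branch outcomes are exported; sequential flow is implied by the binary.
    return n_branches / n_instructions

outcomes = [b % 2 == 1 for b in range(8)]     # 8 branches, alternating outcomes
packed = pack_branch_bits(outcomes)
print(len(packed), "byte(s) for", len(outcomes), "branch outcomes")
print(avg_bits_per_instruction(64, 8), "bits per instruction")
```

Real trace ports add further encodings for indirect branches, exceptions and timestamps, which is why the achievable average varies with workload.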
On-the-fly examination of execution sequences has been studied by Hou et al. (US 2010/0281310 A1), but this method is focused on identifying whether a single pre-determined execution sequence has occurred, as a means of creating a trigger for capturing the associated data. This is unlike the present invention, which is a compression and identification system for the entirety of software running on a target computer system. The system devised by Panigrahy et al. (U.S. Pat. No. 8,069,374 B2) creates ‘fingerprints’ from the text of system event log files for the purpose of automating the correction of system configuration errors, so unlike the present invention it cannot identify software bugs or behavioral anomalies at the function level, and it cannot be practically implemented in computer logic.
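To make the notion of fingerprinting execution sequences concrete, a compact identifier for an execution path can be derived by hashing the sequence of basic-block addresses it visits, so that identical paths always map to identical identifiers. This is a generic sketch using an FNV-1a hash over made-up addresses, not the method of the present invention or of the cited references:

```python
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3
MASK64 = 0xFFFFFFFFFFFFFFFF

def path_id(block_addresses):
    """Fold a sequence of 64-bit basic-block addresses into one 64-bit identifier."""
    h = FNV_OFFSET
    for addr in block_addresses:
        for byte in addr.to_bytes(8, "little"):
            h = ((h ^ byte) * FNV_PRIME) & MASK64  # FNV-1a step
    return h

common_path = [0x4000, 0x4010, 0x4020]    # made-up block addresses
variant_path = [0x4000, 0x4010, 0x4FF0]   # same prefix, different final block
print(hex(path_id(common_path)))
print(hex(path_id(variant_path)))
```

Because the hash is order-sensitive and deterministic, a stream of such identifiers can be compared and counted far more cheaply than the full address trace it summarizes.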
For software running on computer systems that lack any form of execution trace, the available options for debugging and understanding the behavior of target software are generally very intrusive and place severe limitations on the total amount of information that can be obtained from the computer system. Breakpoint debuggers and instrumented or sampled execution profiling are the primary options for software developers on these platforms, but both are highly intrusive, and neither performs compression/decompression to reduce bandwidth and storage requirements.
The net result of these issues is that software development remains an expensive process. Software has become the single most expensive component in modern automobiles, aircraft, and scores of other devices, and is often the key determining factor in a product's success or failure. Current trends point toward faster processors, more processing cores, and larger applications, which indicates that these problems will likely get worse.
While known techniques for analyzing software execution have proven to be acceptable for some applications, such techniques are nevertheless susceptible to improvements that would advance the art.