The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Complex software applications often require massive collections of source code files organized within codebases. Codebases are usually created by engineers, software developers, and other technicians who write individual code files describing the application's modules, methods, functions, etc. These files are included within the application's codebase. To keep the files organized, the codebase may be conceptually arranged as a tree structure maintained by a version control system. Within the code tree, a branch generally describes a version of an application module that was changed but does not include any new features, while a trunk may include a module with new features. Teams of engineers design, develop, and deploy software across distant locations. Multiple versions of software files are often deployed from these different locations while other developers work on updates to the same files. Errors or “bugs” may be present only in certain versions; therefore, to locate and fix any bugs, particular versions of the software must be located and tested to determine which version(s) are causing the problem.
During the lifetime of an application, software developers commit or merge multiple file versions, patches, edits, etc., to the codebase. These commit operations tell the version control system that a group of changes will be made final and available to all users. That group of changes is typically maintained in a change list. Each group of changes in a single commit action includes a unique change list ID.
Of course, whenever a software developer performs a commit action, the developer may also introduce bugs into the codebase. To counter the inevitable introduction of bugs, most software development teams are complemented by a quality assurance team that performs automated and manual testing of the files within the codebase. Ideally, quality assurance teams perform tests on the submitted files until the codebase reaches a confirmed level of code maturity and stability. However, because of the complex development and execution relationships between the various files in the code tree, it is difficult to measure an exact level of quality assurance. For example, development of an application follows numerous paths along the code tree, including various branches and trunks, during the development cycle. Accounting for and writing tests for every possible code path in a complex application would consume a significant amount of resources and time. Further, it has been observed that bugs are not distributed linearly within a codebase, but rather, bugs occur in “bursts” where a single bug causes multiple other bugs in a cascading effect throughout the codebase. Thus, testing a percentage of the total amount of code within the codebase will not account for an equal percentage of the total number of bugs that are present in the codebase (i.e., testing fifty percent of the total amount of code within the codebase will not account for fifty percent of the total bugs in the codebase).
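The mismatch between the share of code tested and the share of bugs found can be illustrated with a minimal sketch. The file names, line counts, and bug counts below are invented purely for illustration and are not taken from any real codebase; the function name `coverage` is likewise hypothetical.

```python
def coverage(files, tested):
    """Return (fraction of lines tested, fraction of bugs covered).

    files  -- dict mapping a file name to a (line_count, bug_count) tuple
    tested -- iterable of file names that were tested
    """
    total_lines = sum(lines for lines, _ in files.values())
    total_bugs = sum(bugs for _, bugs in files.values())
    tested_lines = sum(files[name][0] for name in tested)
    tested_bugs = sum(files[name][1] for name in tested)
    # Guard against division by zero for an empty or bug-free codebase.
    line_share = tested_lines / total_lines if total_lines else 0.0
    bug_share = tested_bugs / total_bugs if total_bugs else 0.0
    return line_share, bug_share


# Hypothetical codebase: four equally sized files, with all bugs
# clustered ("burst") in two of them.
FILES = {
    "f0": (100, 0),
    "f1": (100, 0),
    "f2": (100, 8),
    "f3": (100, 2),
}

if __name__ == "__main__":
    print(coverage(FILES, ["f0", "f1"]))
    print(coverage(FILES, ["f2", "f3"]))
```

In this invented example, testing either half of the files covers half of the lines of code, yet one half contains none of the bugs and the other half contains all of them.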
Some techniques to identify and analyze bugs within a codebase have focused on providing a Boolean indicator of whether a particular piece of the codebase is more or less likely to include a bug. For example, cache techniques may rank files within a codebase according to the number of lines of code within each file. Then, each of the ranked files and its closest relatives may receive a “hit” if it has been changed to fix a bug. Files that have been fixed most recently may remain in the cache, while those that have been fixed less recently may be removed from the cache. These techniques allow only a “hit or miss” identification of fault-prone files within a cached selection of the codebase.
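One way such a cache technique might be sketched is shown below. This is an illustrative assumption rather than a description of any particular prior system: the class name `FaultCache`, its methods, the fixed capacity, and the least-recently-fixed eviction rule are all hypothetical, and the handling of a file's "closest relatives" is omitted for brevity.

```python
from collections import OrderedDict


class FaultCache:
    """Hypothetical fixed-size cache of files flagged as fault-prone."""

    def __init__(self, lines_per_file, capacity):
        # lines_per_file: dict mapping file name -> number of lines of code.
        self.capacity = capacity
        # Rank files by size (largest first) and seed the cache with the top
        # entries. Insertion order runs from "evict first" to "evict last",
        # so the largest seeded file is the last to be evicted.
        ranked = sorted(lines_per_file, key=lines_per_file.get, reverse=True)
        self._cache = OrderedDict(
            (name, True) for name in reversed(ranked[:capacity])
        )

    def record_bug_fix(self, name):
        """Record that `name` was changed to fix a bug.

        Returns True on a hit (the file was already cached) and False on a
        miss. Either way, the file becomes the most recently fixed entry.
        """
        hit = name in self._cache
        self._cache[name] = True
        self._cache.move_to_end(name)
        if len(self._cache) > self.capacity:
            # Evict the entry that was fixed least recently.
            self._cache.popitem(last=False)
        return hit

    def is_flagged(self, name):
        """Boolean indicator: is the file currently in the cache?"""
        return name in self._cache
```

Under these assumptions, the only output for a given file is whether or not it currently sits in the cache, which is the “hit or miss” limitation noted above.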