To verify that software is functioning properly, the software must be adequately tested. Software testing is typically performed once enough changes have accumulated to warrant it. Software testing may be performed during final integration of code or at intermediate stages in the development process, particularly in the case of more complex software packages.
According to an approach disclosed in a related application, U.S. patent application Ser. No. 11/388,445, entitled “PROBABILISTIC SYSTEM FOR IDENTIFYING SOFTWARE DEVELOPMENT REGRESSIONS,” filed on Mar. 24, 2006 and incorporated by reference as if fully set forth herein, a variety of historical data may be accessed. The historical data may include raw data, as well as analysis (e.g., calculations or probabilities) generated based upon the raw data. Examples of historical data include the number of errors generated by a particular portion of code, identities of users or groups of users responsible for generating and/or modifying the portion of code, and the platform or architecture on which a particular portion of code was tested. In some embodiments, the historical data comprises at least one of one or more source code files, one or more source code control logs, one or more integration requests, and one or more failure reports.
The historical data may be obtained over one or more software builds. From this information, it is possible to more accurately and efficiently identify portions of code that are likely to be problematic. In this manner, cumulative data may be used to identify causes of failures, even where the failure that has been introduced is not detected until a later date.
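By way of illustration only, the following minimal sketch shows one way cumulative failure data gathered over several builds might be tallied per portion of code. The record type `FailureReport`, its fields, the function `rank_components`, and the sample data are all hypothetical and are not taken from the related application. Simple counting is used here in place of the probabilistic analysis that application describes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureReport:
    """Hypothetical record of one failure observed in one build."""
    build: int        # build number in which the failure was observed
    component: str    # portion of code implicated by the failure
    platform: str     # platform or architecture on which it was tested


def rank_components(reports):
    """Return (component, failure_count) pairs, most failures first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(r.component for r in reports)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented sample data spanning three builds.
reports = [
    FailureReport(build=101, component="parser.c", platform="x86"),
    FailureReport(build=102, component="parser.c", platform="sparc"),
    FailureReport(build=102, component="net.c", platform="x86"),
    FailureReport(build=103, component="parser.c", platform="x86"),
]

ranking = rank_components(reports)
for component, failures in ranking:
    print(component, failures)
```

In this sketch, a portion of code that fails repeatedly across builds rises to the top of the ranking even if no single build singled it out.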
However, for a large system, the source code in many files may have been rewritten several times over the years since the files were first created. Thus, the historical data may show many code components in the source code as potentially causing a regression in a later build. As a result, identifying a likely candidate becomes more and more time-consuming as the volume of the historical data grows over time.
In view of the above, a need exists for a tool that can efficiently make use of a large volume of historical data for the purpose of identifying a problem build that causes a software regression identified in a software system.