1. Field of the Invention
Embodiments of the present invention relate to the field of source code management. More particularly, embodiments of the present invention relate to methods and systems for identifying intermittent errors in a distributed source code development environment and related mechanisms to improve developers' efficiency and product quality.
2. Description of the Prior Art and Related Information
It is now common for a number of developers to work on a single set of code. Typically, developers make a copy of all or a portion of the baseline code, shown at 102 in FIG. 1. Changes are made to the copy of the baseline code, which is now termed a transaction, as shown at 106. All transactions (transactions 1-3 being shown in FIG. 1) may be merged into the baseline code 102 when completed. This is typically the manner in which large coding projects incrementally update the baseline code. The baseline code 102, updated by the transactions merged therein over a given interval (e.g., a day) may be called a label. Labels L4 through Ln are shown in FIG. 1. A label may be thought of as a snapshot in time of the baseline code. These labels are in turn the incarnations of the baseline code used to start transactions as explained above. To minimize errors being introduced, developers run tests 108 on their individual transactions before merging the transactions into the baseline code 102. The baseline code 102, in turn, is also tested regularly. Both transactions and labels are tested using many test suites, each of which may include hundreds or thousands of individual tests. Such tests are called regression tests and it is not uncommon for a label to be tested nightly by using more than 100,000 regression tests running on over 1,000 servers. For large and complex code development projects, such testing may be carried out by a farm of hundreds or thousands of computers. The testing may also be carried out using the developers' individual computing power to harness the power of grid computing. This allows many tests to be run simultaneously on many machines.
In a typical scenario, tests are run nightly on both the transactions and the labels (if such labels have been defined, which is not necessary) and when the developers return to the office the next morning, they review the results of the tests that they ran the previous night. The results provide a basis for quality comparison between a transaction shown at 202 and the label used to begin it. The desired goal is to not have transactions adversely affect the quality of the baseline code when they are merged. A success (commonly referred to as a ‘suc’) returned by a given test means that the code is behaving as expected and has successfully passed the test. An error or a difference (commonly referred to as a ‘dif’) means that something is broken in the code and that there is a difference between the actual and expected outcome of the test. Difs may be identified by the size of the error text file output by the regression test. Difs may also be identified in many other ways, such as by comparing hashes or text strings; the file size is but one of many possible metrics.
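By way of illustration only, the two detection metrics mentioned above may be sketched as follows. The function names are hypothetical, and the convention that an empty error text denotes a suc is an assumption made for this sketch rather than a requirement of any particular regression test framework.

```python
import hashlib


def outcome_from_error_size(error_text: bytes) -> str:
    """Classify a test run by the size of its error text.

    Assumption: the regression test writes nothing to its error
    file when the actual and expected outcomes match.
    """
    return "suc" if len(error_text) == 0 else "dif"


def differs_by_hash(actual_output: bytes, expected_output: bytes) -> bool:
    """Report a dif when the digests of actual and expected output differ."""
    actual_digest = hashlib.sha256(actual_output).hexdigest()
    expected_digest = hashlib.sha256(expected_output).hexdigest()
    return actual_digest != expected_digest
```

In such a sketch, the error text and outputs would be read from the files produced by the test run; the hash comparison avoids storing the full expected output alongside every result.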
There are two scenarios of interest. The first scenario occurs when a developer is working with a single set of code; and the second scenario occurs when a developer is working with two sets of code where one is based on the other. Within the first scenario, the following nomenclature applies: a ‘dif’ is an error; a ‘suc’ is a success; a ‘consistent dif’ is a dif that is consistently reproduced within the code upon testing; and an ‘intermittent dif’ is a dif that sometimes occurs within the code and sometimes does not. Within the second scenario, the relevant information relates to the differences between the two sets of code (e.g., A and B, or the new and the old) and the following nomenclature may be established: a ‘new dif’ is a dif that occurred when running a test T on B but not on A (or occurred differently on A); a ‘spurious dif’ is a new dif that is not caused by the code differences between B and A (instead, the dif may have occurred because the error is intermittent in A and therefore in B, or because of environment differences, etc.); and a ‘real dif’ is a new dif that is caused by the code differences between B and A. So, if B is a transaction built on top of A, a ‘real dif’ would be a dif introduced by the transaction.
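The notion of a ‘new dif’ in the second scenario may be sketched as a comparison of two result sets. This is a minimal illustration under stated assumptions: the type alias `Results`, the function `new_difs`, and the use of a string signature (for example, a hash of the error text) to decide whether a dif "occurred differently" are all hypothetical. Whether a new dif is spurious or real cannot be decided from a single pair of runs, so the sketch stops at identifying new difs.

```python
from typing import Dict, Optional, Set

# Maps a test name to a dif signature (e.g., a hash of the error text),
# or to None when the test returned a suc.
Results = Dict[str, Optional[str]]


def new_difs(results_a: Results, results_b: Results) -> Set[str]:
    """Return the tests whose dif on B did not occur, or occurred differently, on A."""
    new = set()
    for test, signature_b in results_b.items():
        if signature_b is None:
            continue  # a suc on B is never a new dif
        # A test absent from A is treated as a suc on A (assumption).
        if results_a.get(test) != signature_b:
            new.add(test)
    return new
```

For example, if test T returns a suc on A and a dif on B, T is reported as a new dif; if T returns the same dif signature on both A and B, it is not.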
Intermittent problems are a significant problem: they are the hardest to solve, and so tend to account for many of the errors in the baseline. Intermittent difs encountered in testing a transaction may be caused by errors in the transaction code or in the baseline code. Further, intermittent difs may also be caused by factors that are external to and independent of the code. Such factors generally relate to the prevailing conditions within the environment (e.g., server, server farm, grid computing network) when the regression tests are run. These prevailing conditions may include noise, disk full conditions, and timeouts caused by a lack of sufficient CPU cycles, to name only a few examples. Not only are intermittent difs difficult to solve, but they are also difficult to identify. For example, a developer may run a test on a transaction over and over again, to determine whether a dif re-occurs each time the test is run or whether it re-occurs not at all or only a limited number of times. There are, however, several problems associated with this code development scheme. For example, the time required to run the large numbers of test suites on a transaction is often longer than the time necessary to make the code changes. When a new dif is identified, as shown at step S11, the cause may be faulty code within the transaction, an intermittent problem with the transaction or label, an intermittent problem with the test, or a difference in how the baseline run and the transaction run are treated by the testing mechanism; for example, a transaction may include debugging data in the code, which the baseline lacks. The developer may examine each such error (which takes a large amount of time), or, more likely, rerun the failing tests as shown at S12 in the hope that the failed test will now succeed (match the results of the baseline label run).
Re-running the test can be wasteful of both time and computing resources. It may delay not only the merging of the transaction into the baseline code if no problems are found, but also the resolution of real difs (errors introduced by the transaction), because identifying them as real is itself delayed. If the dif is not re-observed upon repeating the test or if the dif is only re-observed a small number of times as shown at S13, the dif may be characterized as being intermittent in nature, and thus classified as spurious. Alternatively, it may be determined that the dif is consistent and the developer should review and debug his or her code, as indicated at S16. If, after running the regression test multiple times, it is determined that the dif is spurious in nature as shown at S14, the transaction with the spurious dif may be merged into the baseline code as shown at S15, with the expectation that the transaction's code changes are not faulty, but rather that either the baseline code or the environment is the cause of the dif. However, running these transaction and/or label tests over and over again is wasteful in both time and computing resources, as the underlying transaction could have been merged into the baseline code much earlier. In turn, the developer could have pushed ahead with code development, instead of wasting valuable time and computing resources determining whether the difs are real or spurious. Further, this approach may cause the developer to ignore an intermittent dif introduced by his or her transaction, thereby introducing an error into the baseline code. Alternatively, instead of running the regression tests over and over again, the developer may choose to examine the transaction code manually in an attempt to determine whether the difs are related to the code changes or instead are related to factors external to the transaction code.
This, however, can waste a great deal of developer time, as there are commonly dozens or hundreds of difs to be examined.
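The conventional rerun procedure described above (steps S12 through S16) may be sketched as follows. This is an illustration only: the function name, the default number of reruns, and the threshold `max_reobserved` that defines "a small number of times" are hypothetical choices, and `run_test` stands in for whatever mechanism reruns a failing regression test and reports whether the dif was observed again.

```python
from typing import Callable


def classify_by_rerun(
    run_test: Callable[[], bool],
    reruns: int = 5,
    max_reobserved: int = 1,
) -> str:
    """Rerun a failing test and classify its dif.

    run_test returns True when the dif is observed again.
    The dif is called "intermittent" (and so treated as spurious)
    when it is re-observed at most max_reobserved times; otherwise
    it is called "consistent" and the code should be debugged.
    """
    reobserved = sum(1 for _ in range(reruns) if run_test())
    return "intermittent" if reobserved <= max_reobserved else "consistent"
```

The sketch also makes the shortcomings of the procedure visible: every classification costs `reruns` additional test executions, and a dif labeled "intermittent" is treated as spurious even when the transaction itself introduced the intermittent behavior.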
The baseline code, moreover, has its own errors that must be resolved. The mechanism for finding such errors is generally based on a single run of every test for each new label of the baseline. This does not address intermittent errors, which are frequently discovered long after the error was introduced, when a run happens to encounter the problem. At that point, finding the cause and assigning the error is difficult.
Conventionally, there is no well-defined way of dealing with intermittent difs when they are encountered in a label of the baseline code. Because such errors are not encountered in every label, it is normally impossible to tell which transaction introduced the error, and a judgment call must be made as to which developer to task with the resolution of the dif. This makes resolution of the problem extremely difficult for the developers, as there is no way to tell which code changes are related to the error. Because of the time it takes to find, assign, and resolve such issues, intermittent difs may persist in the baseline for months at a time. The baseline code is tested a limited number of times, and such limited testing may overlook intermittent difs for a long time; once found, the errors in turn take a long time to fix.
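The effect of testing each label only once may be illustrated with a simple model. The model is an assumption introduced here for illustration and is not taken from any measured data: it supposes that an intermittent dif appears independently in each test run with a fixed probability p, so that the chance of the dif going unobserved across n single-run labels is (1 - p) raised to the power n.

```python
def miss_probability(p: float, labels: int) -> float:
    """Chance that an intermittent dif is never observed.

    Assumes one run per label and that the dif appears
    independently in each run with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if labels < 0:
        raise ValueError("labels must be non-negative")
    return (1.0 - p) ** labels
```

Under this assumed model, a dif that appears in 5% of runs has roughly a one-in-five chance of escaping 30 consecutive nightly labels, which is consistent with the observation that such errors are frequently discovered long after they were introduced.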
From the foregoing, it may be appreciated that there is a need for improved methods and systems for identifying intermittent difs so as to be able to ignore spurious difs encountered in transaction testing and resolve intermittent errors in the baseline code. Preferably, such methods and systems should make this determination in a manner that is economical in terms of both time and computing resources, and should allow developers to spend more of their time developing code and less of it testing and characterizing difs generated by testing suites.