During operation of a computing device, when an abnormal event or failure occurs, a user or the system may perform a crash dump to save useful context information. Within a crash dump, the call stack is one of the most important pieces of information, or signatures, and indicates the direct cause of a failure in a computing system or process. Backtracking the stack frames of a crash dump yields a unique, explicit calling sequence, that is, the code path that led to the crash dump.
For a system under heavy testing, many crash dumps with similar stacks may be generated, and it is almost always necessary to determine whether a new crash dump is related to other crash dumps that are being analyzed or have already been analyzed, so that repetitive work can be avoided by referring to the existing analysis.
However, such a determination is not easy: even when a similar code path is present in the stacks of two crash dumps, it is nearly impossible to find a completely matching crash dump, because the stacks always contain many discrepancies, or noise. The same situation arises in external communities. For example, when searching for the common, complete crash dump stacks of Linux/Windows open source applications or kernels using a popular web browser and a powerful search engine (e.g., Google, Baidu, etc.), usually no useful results are found.
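The effect of such noise can be illustrated with a minimal Python sketch. The two backtraces below are hypothetical: the gdb-like frame format, the addresses, the argument values and the function names (`flush_cache`, `call_path`) are all invented for illustration and are not taken from any real crash dump.

```python
import re

# Two hypothetical backtraces of the same failure, captured in different runs.
# Only the variable parts (load addresses, pointer arguments) differ.
STACK_A = """\
#0  0x00007f3a1c2b4e10 in raise () from /lib64/libc.so.6
#1  0x00007f3a1c2b6428 in abort () from /lib64/libc.so.6
#2  0x0000000000401a2c in flush_cache (dev=0x1f3e010) at cache.c:212
#3  0x0000000000400f51 in main () at main.c:40
"""
STACK_B = """\
#0  0x00007f91d80a2e10 in raise () from /lib64/libc.so.6
#1  0x00007f91d80a4428 in abort () from /lib64/libc.so.6
#2  0x0000000000401a2c in flush_cache (dev=0x22aa7f0) at cache.c:212
#3  0x0000000000400f51 in main () at main.c:40
"""

# Assumed frame layout: "#<n>  0x<address> in <function> ..."
FRAME = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\w+) ", re.MULTILINE)


def call_path(stack_text):
    """Return the function names of a backtrace, innermost frame first."""
    return FRAME.findall(stack_text)


# A complete-text comparison fails although the code path is identical.
exact_match = STACK_A == STACK_B
same_path = call_path(STACK_A) == call_path(STACK_B)

print("exact text match:", exact_match)
print("same code path:  ", same_path)
```

In this sketch the complete texts differ only in their variable parts, yet the comparison of the full text reports no match, which is the difficulty described above.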
In existing solutions, because the complete stack text of each individual crash dump generally includes much noise, determining the similarity between two crash dumps may require a specifically customized full-text search engine built on a typical classification algorithm (e.g., a Bayesian classification algorithm).
However, such an approach has the drawback of introducing much noise (from the variable part of the stack text) and of losing the calling-order information in the stacks. Although the latter can be remedied by taking the rank or order of the words in the stack text into account, the computational cost increases with the number of crash dumps, because the given stack file must be compared with all of the existing stack files.
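The loss of calling-order information and the linear comparison cost can be sketched as follows. This is only an illustration under stated assumptions: it uses a cosine similarity over word counts as a simple stand-in for a word-based full-text method, not the Bayesian classifier mentioned above, and the function names and call paths are hypothetical.

```python
import math
from collections import Counter


def bag_of_words_similarity(words_a, words_b):
    """Cosine similarity over word counts; word order is ignored by construction."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(new_stack, existing_stacks):
    """Compare the new stack with every existing stack: one comparison per stored dump."""
    scores = [bag_of_words_similarity(new_stack, s) for s in existing_stacks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best], len(scores)


# Hypothetical call paths with the same functions in a different calling order.
path_1 = ["main", "open_volume", "flush_cache", "abort"]
path_2 = ["main", "flush_cache", "open_volume", "abort"]

score = bag_of_words_similarity(path_1, path_2)
print("similarity of reordered paths:", score)

index, best_score, comparisons = most_similar(path_1, [path_2, ["main", "abort"]])
print("comparisons performed:", comparisons)
```

The two paths describe different calling sequences, yet a purely word-based measure cannot tell them apart, and finding the closest existing stack requires one comparison for every stored crash dump.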
Moreover, such an approach is highly complex, which makes it hard to implement and to run quickly on a computer workstation with limited resources. In addition, a similarity derived in this way does not offer a simple understanding of the failure-related problem, because it takes only individual words into account and loses the complete context information.