This invention relates generally to a system and method for processing one or more data sets received from one or more computer-based systems and in particular to a system and method for automatically categorizing and characterizing the data sets generated by the computer-based system.
The tremendous expansion of the Internet has led to an expansion in the number of features in software applications. The expansion of the Internet has also necessitated the more rapid development of various software applications and has changed various software development methodologies. For example, the technique for beta testing software applications has drastically changed. Prior to the proliferation of the Internet, a company might beta test a software application by distributing the software application by floppy disk to a limited number of beta testers. With the Internet, the process of beta testing requires only that the developer place the beta software application on its web site and then anyone interested in beta testing the software application may do so with almost no expense to the developer. The problem with both of these beta test distribution techniques, however, is that it is difficult for the developer of the software to obtain good feedback from the beta testers.
Therefore, in order to properly beta test a software application and for the developer to benefit from the beta test, it is desirable to provide some medium for the beta testers to communicate with the developer who can gather the beta testers' bug reports and comments and correct any bugs. This process was typically accomplished by a beta test coordinator who was responsible for gathering the relevant information and routing the bug reports to the appropriate engineers. It is desirable to provide a system that automatically retrieves the bug reports and comments from beta testers.
In order to provide quality assurance (QA) feedback to a user of the software application, it is additionally desirable to be able to recreate a user's problem so that the Quality Assurance person can quickly help the user. In some conventional systems, the Quality Assurance person attempts to recreate the problem based on a user's recollections of the events, the user actions within the software application, such as entering the print routine, or the keystrokes that caused the error. This is very often difficult to accomplish since either the user may not remember all of the steps he took that caused the problem or the problem only manifests itself on the user's computer due to the configuration of the user's computer. In addition, determining the exact configuration of the user's computer is sometimes difficult since the user may not remember, for example, the type of graphics card that he installed in his computer. Therefore, it is desirable to be able to determine the configuration of a user's computer and capture information about the user's actions in order to help the Quality Assurance process.
Once a plurality of pieces of data about a machine state in a computer-based system, known as a data set, have been received from a computer-based system, it is desirable to be able to automatically process these pieces of data. In particular, it is desirable to group the data into categories of similar incidents. To categorize each piece of data, it is necessary to parse the pieces of data and automatically generate links between pieces of data that contain information about, for example, the same software crash. Duplicate pieces of data about the same event may be automatically identified and removed. It is also desirable to determine whether a particular incident is a first instance of a particular problem. A conventional bug tracking system often makes it difficult to eliminate pieces of data about the same bug or event since a person must search through all existing bug reports in order to determine whether the particular problem has already been located. To automatically recreate a crash, known as characterization, the system must determine the crash parameters from the piece of data and recreate the problem, if possible.
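By way of illustration only, the grouping, duplicate removal, and first-instance determination described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the invention: the `Incident` record, its fields (`report_id`, `application`, `crash_module`, `crash_offset`), and the `signature` and `categorize` functions are hypothetical names introduced for this example, and the sketch assumes that two pieces of data concern the same crash when those parsed fields match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One parsed piece of machine-state data (all fields are hypothetical)."""
    report_id: str
    application: str
    crash_module: str
    crash_offset: int


def signature(incident):
    """Key used to link incidents assumed to describe the same crash."""
    return (incident.application, incident.crash_module, incident.crash_offset)


def categorize(incidents):
    """Group incidents by signature, skip duplicates, note first instances.

    Returns (categories, first_instances): categories maps each signature
    to the report ids in that category, and first_instances lists the
    report id that opened each category, in order of arrival.
    """
    categories = defaultdict(list)
    first_instances = []
    seen_ids = set()
    for incident in incidents:
        # Assumption: a repeated report_id is a duplicate of the same event.
        if incident.report_id in seen_ids:
            continue
        seen_ids.add(incident.report_id)
        key = signature(incident)
        if key not in categories:
            # No earlier incident has this signature: a first instance.
            first_instances.append(incident.report_id)
        categories[key].append(incident.report_id)
    return dict(categories), first_instances


if __name__ == "__main__":
    received = [
        Incident("r1", "editor", "print.dll", 0x10),
        Incident("r2", "editor", "print.dll", 0x10),
        Incident("r1", "editor", "print.dll", 0x10),  # duplicate of r1
        Incident("r3", "editor", "gfx.dll", 0x20),
    ]
    groups, firsts = categorize(received)
    print(groups)
    print(firsts)
```

In this sketch, a newly received incident is compared against a lookup table keyed by signature, so that no person needs to search through all existing reports. A practical system would likely need a looser notion of similarity than exact field equality.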
No known system automatically classifies and characterizes a data set containing information about the state of a computer-based machine. Thus, there is a need for a system and method for automatically classifying and categorizing state machine data and it is to this end that the present invention is directed.