The present invention relates to the diagnosis of problems with equipment, especially complex equipment.
Presently, many companies market products that, when there is a problem, are serviced by field engineers. Typically, the field engineer diagnoses the problem with the product and then performs any necessary repairs or adjustments to the product. With respect to each problem with a product, the field engineer drafts a repair record that includes a description of the problem or symptoms of the problem and, when appropriate, a description of the repair or adjustment made to address the problem. In many cases, the descriptions in the repair record are in either free-form text that characteristically has spelling errors, abbreviations, technical terms and terms of art or in multimedia form where text is an integral part of the record. This information is typically stored in a dispatch database that is used to describe the service performed to the customer and to keep a history of the problems with a particular product at a client's site.
Thus, the records contain substantial data that could be used to speed the diagnosis and repair of a product, but the variation in the text prevents the relevant data from being extracted and placed in a form that is available to a field engineer. There is therefore a particular need for a way to extract this data. The present invention addresses this problem by providing a method for processing records to extract diagnostic information from the records and make the information available to field engineers and others.
In one embodiment, the method involves accessing two groups of records, with each record typically including a problem and a solution to the problem. The first group of records has been determined by an expert to be diagnostically relevant, i.e., a field engineer encountering the same or a similar problem would find the information in the record to be useful. For instance, the information may include a problem with a unique set of symptoms that required a substantial amount of time for the field engineer that generated the record to diagnose. Another engineer encountering the same or a similar problem would be able to address the problem much more quickly or efficiently if the information from the record were available. The second group of records has been determined by an expert to be diagnostically irrelevant, i.e., of little use to another field engineer. For instance, a record indicating that normal preventive maintenance had been performed would likely be considered to be diagnostically irrelevant.
The first and second groups of records are analyzed to learn the reasoning used by the expert to sort the records into the first and second groups. In one embodiment, the analysis involves breaking the record phrases into fragments, which are referred to as n-grams, with "n" representing the length of the fragment. For example, the word "diagnose" has the following 3-grams: "dia", "iag", "agn", "gno", "nos" and "ose". Associated with each n-gram are two counts: the first count is the number of times that the n-gram has occurred in the first group, and the second count is the number of times that the n-gram has occurred in the second group. These counts are used to assign a weight to each n-gram that can be subsequently used to automatically determine whether a record is diagnostically relevant or irrelevant. For example, words like "repair" and "replace" should occur more frequently in the first group, i.e., the diagnostically relevant group. Consequently, the n-grams associated with these words tend to be positively weighted, indicating their diagnostic relevance. In contrast, words like "void" and "duplicate" should occur more frequently in the second group, i.e., the diagnostically irrelevant group. The n-grams associated with these words tend to be negatively weighted. Finally, words like "the", "for", "is" and "a" typically occur approximately equally in the first and second groups, reflecting their low value in determining whether a record is relevant or irrelevant. Consequently, the n-grams associated with these words tend to have weights of approximately zero.
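By way of illustration only, the counting and weighting described above may be sketched as follows. The description does not state the formula that converts the two counts into a weight; the smoothed log ratio used here is an assumption chosen because it is positive when the first count dominates, negative when the second count dominates, and zero when the counts are equal. The function names `char_ngrams` and `learn_weights` and the sample records are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of each word in text."""
    grams = []
    for word in text.lower().split():
        # A word shorter than n yields no n-grams in this sketch.
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def learn_weights(relevant_records, irrelevant_records, n=3):
    """Assign a weight to every n-gram seen in either group of records."""
    first = Counter(g for r in relevant_records for g in char_ngrams(r, n))
    second = Counter(g for r in irrelevant_records for g in char_ngrams(r, n))
    weights = {}
    for gram in set(first) | set(second):
        # ASSUMPTION: add-one smoothed log ratio of the two counts.
        # The source only says the counts are "used to assign a weight".
        weights[gram] = math.log((first[gram] + 1) / (second[gram] + 1))
    return weights


# Hypothetical records standing in for the two expert-sorted groups.
relevant = ["replaced the fan", "repair the board"]
irrelevant = ["void the ticket", "duplicate the call"]
weights = learn_weights(relevant, irrelevant)
```

Because this sketch compares raw counts, it implicitly assumes that the two groups are of similar size; a normalization by group size would be a reasonable refinement where they are not.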
The weights associated with each n-gram are retained in a database or other appropriate data structure and used to automatically assess whether a candidate record provides diagnostically relevant information that should be included in a database available to field engineers. In one embodiment, the candidate record is broken into n-grams. The weights for each of the n-grams in the candidate record that are also in the database are retrieved and summed. A positive sum indicates that the candidate record tends to be diagnostically relevant. A negative sum indicates that the candidate record tends to be diagnostically irrelevant. The absolute magnitude of the sum represents the degree to which the candidate record is diagnostically relevant or irrelevant. For instance, a very high positive sum would indicate that the candidate record is highly diagnostically relevant and that the information contained in the candidate record should be added to the database for use by the field engineers.
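By way of illustration only, the assessment of a candidate record may be sketched as follows. The weight table is a hypothetical stand-in for the database of learned weights, with values invented for the example, and the names `score_record` and `classify` are likewise assumptions. The description does not say how a sum of exactly zero is treated; the sketch labels that case "undetermined".

```python
def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of each word in text."""
    grams = []
    for word in text.lower().split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def score_record(record, weights, n=3):
    """Sum the stored weights of the record's n-grams; unknown n-grams are skipped."""
    return sum(weights[g] for g in char_ngrams(record, n) if g in weights)


def classify(record, weights, n=3):
    """Map the sign of the sum to a label (zero handling is an assumption)."""
    total = score_record(record, weights, n)
    if total > 0:
        return "relevant"
    if total < 0:
        return "irrelevant"
    return "undetermined"


# Hypothetical weight table; the values are illustrative, not learned.
weights = {"rep": 1.0, "epl": 0.5, "voi": -0.75, "oid": -0.5, "the": 0.0}

relevant_score = score_record("replaced the valve", weights)
irrelevant_score = score_record("void ticket", weights)
```

In this sketch the magnitude of the returned sum can be compared against a threshold of the implementer's choosing to decide which records are added to the database; the description does not fix such a threshold.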