Systems that facilitate remote technical assistance are an integral part of the overall information technology (IT) product sales, deployment, and maintenance life cycle. Such systems are used, for example, by technical employees, business partners, and vendors to help solve the problems that customers have with hardware and software products. Typically, the technical helpdesk party receives, from the customer, an electronic mail (e-mail) or a telephone call describing the issue that needs to be fixed. The technical helpdesk party records, in free form text, the initial e-mail and subsequent e-mail exchanges on that issue, as well as any other information that the party considers relevant to describing or solving the issue.
The technical helpdesk party records this information by using specific trouble ticket (TT) management tools. These tools help in the tracking of individual tickets. Thus, when a technical helpdesk party needs to solve a problem, they can first check to see if the problem has been reported for another customer. If it has, the party can read how to fix the problem and avoid spending time trying to solve problems that other technical helpdesk parties have already solved.
However, searching a collection of free form documents for a particular topic can be difficult and error-prone. For example, one could try to find potential resolutions for fixing “Websphere AS version 5.1 on Windows,” and retrieve an overwhelmingly large number of irrelevant tickets simply because the tickets contain “Websphere version 6 has been upgraded from Websphere AS version 5.1” in their text. Alternatively, a party may retrieve an entry stating, for example, “try the ticket queue for Websphere AS version 5.1 on Linux, because here you are on Websphere AS version 5.1 on Windows queue.”
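The retrieval problem described above can be illustrated with a minimal sketch of a naive term search over free form ticket text. The function name naive_search and the first ticket entry are hypothetical and introduced only for illustration; the second and third entries reuse the example texts quoted above. This is not the search mechanism of any particular ticket management tool.

```python
# Hypothetical ticket collection. Entry 0 is an invented relevant ticket;
# entries 1 and 2 reuse the example texts quoted in the paragraph above.
tickets = [
    "Websphere AS version 5.1 on Windows fails to start after patch install",
    "Websphere version 6 has been upgraded from Websphere AS version 5.1",
    "try the ticket queue for Websphere AS version 5.1 on Linux, because "
    "here you are on Websphere AS version 5.1 on Windows queue",
]


def naive_search(query_terms, docs):
    """Return indices of docs that contain every query term (case-insensitive)."""
    hits = []
    for index, doc in enumerate(docs):
        text = doc.lower()
        if all(term.lower() in text for term in query_terms):
            hits.append(index)
    return hits


# Searching on the product string alone matches every ticket.
broad_hits = naive_search(["Websphere AS version 5.1"], tickets)

# Adding the platform drops the upgrade note but still returns the
# misrouted-queue note, which contains no resolution at all.
narrow_hits = naive_search(["Websphere AS version 5.1", "Windows"], tickets)
```

In this sketch only ticket 0 describes the problem of interest, yet the narrower query still returns ticket 2, because substring matching cannot tell a ticket about a topic from a ticket that merely mentions it.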
Existing approaches include a knowledge discovery approach to problem ticket data. For example, U.S. Pat. No. 6,829,734, entitled “Method for discovering problem resolutions in a free form computer helpdesk data set,” describes a method and structure for discovering problem resolutions in a helpdesk data set of problem tickets using an enumerated set of phrases that have been identified as indicating diagnosis, instruction, or corrective action. A disadvantage of such approaches is that they rely on helpdesk ticketing data consisting of short text descriptions of telephone calls with customers; as such, many, if not most, of the problem tickets provide little or no problem resolution information. Such approaches identify tickets by matching their content to specific words or word combinations (indicative of problem diagnosis and resolution) through word- or phrase-based heuristic rules that have been manually generated by specialist inspection of the trouble tickets.
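As a rough illustration of the phrase-based heuristic matching described above, the following sketch labels a ticket by checking its text against a small, manually enumerated phrase list. The phrases, the RULES table, and the function label_ticket are assumptions made for this example; they are not taken from the cited patent or from any existing tool.

```python
import re

# Hypothetical, manually enumerated phrases per category (illustrative only).
RULES = {
    "diagnosis": ["caused by", "root cause"],
    "instruction": ["please try", "you need to"],
    "corrective_action": ["restarted", "reinstalled", "applied the fix"],
}


def label_ticket(text):
    """Return the sorted list of categories whose phrases occur in the text."""
    lowered = text.lower()
    labels = set()
    for label, phrases in RULES.items():
        for phrase in phrases:
            # Whole-phrase match on word boundaries, case-insensitive.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                labels.add(label)
                break
    return sorted(labels)


matched = label_ticket("Root cause was a full disk; we restarted the server.")
missed = label_ticket("The problem stemmed from a full disk.")
```

The second ticket states a diagnosis but receives no label because its wording is not in the enumerated list, which suggests why manually generated rules of this kind are hard to maintain over noisy, heterogeneous ticket text.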
Existing approaches primarily focus on classical data mining techniques, such as clustering based on frequency of words and discovery based on keywords in the semi-structured data. However, most of the existing ticketing data is unstructured, highly noisy, and very heterogeneous in content (that is, natural language, system generated data, domain specific terminology, etc.), making it difficult to effectively apply the common data mining techniques used in the existing approaches to analyze the raw ticketing data.
It would thus be desirable to overcome the limitations in previous free form data structuring approaches.