Nowadays, due to the proliferation of electronic devices from different vendors and with different configurations, and because the operations of these devices are typically interrelated, finding a solution to an operation problem of a malfunctioning electronic device can be very challenging. As an illustrative example, when a user transmits a printing task from a computer terminal to a network printer, but the printer fails to perform the task, there can be different reasons for the failure. For example, the printer may be malfunctioning, or may have been configured improperly. There can also be a network connectivity issue that prevents the printer from receiving the printing task. In addition, the computer terminal may be configured improperly (e.g., using a wrong version of the printer driver). Considering also that this equipment can come from different vendors and have different configurations, the solution space can become so large that it is impractical to use a brute-force approach (e.g., exhausting different combinations of possible solutions) to find a solution to an operation problem of an electronic device.
One approach to solving this problem is to leverage historical operation data of the electronic device, and of other devices that are related to that electronic device. As an illustrative example, a user may experience a certain operation problem with a first electronic device when operating it with a second electronic device. If, in the past, a certain number of users operated the two electronic devices in the same way as this user and experienced the same operation problem, it can be hypothesized that the operation problem with the first electronic device is caused by (or at least is connected to) the second electronic device.
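The reasoning in this example can be sketched as a simple count over past incident records. The following Python sketch is illustrative only and does not describe any particular system; the record layout, the device names, and the `min_reports` threshold are assumptions introduced here.

```python
from collections import Counter

# Hypothetical historical records: (user, first device, second device, problem).
history = [
    ("alice", "printer-A", "laptop-X", "print job fails"),
    ("bob", "printer-A", "laptop-X", "print job fails"),
    ("carol", "printer-A", "laptop-X", "print job fails"),
    ("dave", "printer-A", "laptop-Y", "paper jam"),
]


def hypothesize_related_devices(records, first_device, problem, min_reports=3):
    """Return second devices reported with the same problem at least min_reports times."""
    counts = Counter(
        second
        for _user, first, second, reported in records
        if first == first_device and reported == problem
    )
    # min_reports is an assumed cut-off for "a certain number of users".
    return [device for device, n in counts.items() if n >= min_reports]


suspects = hypothesize_related_devices(history, "printer-A", "print job fails")
print(suspects)
```

A device returned by this count is only a hypothesis (the problem may merely be connected to it), which is why the threshold is left as a tunable parameter.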
Historical operation data of electronic devices can exist in different places and in different forms. For example, these data can be stored in enterprise service tickets, server logs, etc. These data typically come in two forms: structured data and unstructured data. Structured data can include a set of discrete data that are associated with specific fields, which give meaning to the set of discrete data. For example, a service ticket may include fields for inputting a type of electronic device (e.g., printer, laptop, etc.), an operation of the device (e.g., configuration, installation, etc.), etc. Unstructured data, on the other hand, can include data that are associated with a generic field (e.g., description of problem) and are not imparted with a pre-determined structure.
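The distinction between the two forms can be illustrated with a small record type. This is a minimal Python sketch; the `ServiceTicket` class and its field names are hypothetical and are not taken from any real ticketing system.

```python
from dataclasses import dataclass


@dataclass
class ServiceTicket:
    # Structured fields: each value is given meaning by the field it sits in.
    device_type: str
    operation: str
    # Unstructured field: free text with no pre-determined structure.
    description: str


ticket = ServiceTicket(
    device_type="printer",
    operation="configuration",
    description="printer not working, cannot log into TC-300",
)

# A structured field can be queried directly by its meaning.
is_printer_ticket = ticket.device_type == "printer"
# The free-text field only supports crude checks such as substring search.
mentions_login = "log into" in ticket.description
print(is_printer_ticket, mentions_login)
```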
A conventional system typically accumulates these historical operation data (both structured and unstructured), and applies regular association rules, as well as machine learning algorithms such as classification, clustering, or regression methods, to look for relationships between the operation data of different devices. Based on these relationships, the system may then determine a hypothesis for the cause of an operation problem, as well as a solution based on the hypothesis.
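As a rough illustration of what applying association rules to such data can look like, the following Python sketch computes the standard support and confidence measures for a single rule. The incident data and item names are made up for illustration, and the sketch is not a description of any specific conventional system.

```python
# Hypothetical incidents, each reduced to a set of observed items.
incidents = [
    {"printer", "driver-v1", "print failure"},
    {"printer", "driver-v1", "print failure"},
    {"printer", "driver-v2"},
    {"printer", "driver-v1"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Rule under test: {"driver-v1"} -> {"print failure"}
rule_support = support({"driver-v1", "print failure"}, incidents)
rule_confidence = confidence({"driver-v1"}, {"print failure"}, incidents)
print(rule_support, rule_confidence)
```

A rule with high support and confidence would be a candidate hypothesis for the cause of the problem. Note that this only works once each incident has already been reduced to discrete items.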
The inventors here have recognized several technical problems with such conventional systems. First, as discussed above, historical data can come in a structured form and an unstructured form. While data in structured form may carry a certain meaning (imparted by the structured field with which a piece of data is associated), data in unstructured form can include much hidden information that is difficult to extract using regular association rules and machine learning algorithms. As an illustrative example, a conventional system may not understand the meaning of a text description of “printer not working, cannot log into TC-300,” nor can it classify and cluster different segments of the text, unless the text is imparted with a structure that defines the meaning of each portion of the text.
Second, while natural language processing may provide some insight into how a text description can be interpreted, it becomes difficult to apply such processing to the extent that it generates a meaning for the whole text description when there is a huge volume of unstructured historical data in different formats.