The exemplary embodiment relates to the linguistic processing arts. It finds particular application in conjunction with automated natural language processing for use in searching a structured knowledge base, such as a manual, and will be described with particular reference thereto. However, it is to be appreciated that it is also amenable to other like applications.
Many of the devices used today, both within the workplace and outside it, are highly complex. Such devices include computing devices, such as personal computers; image reproduction devices, such as printers and photocopiers; vehicles; and audio and visual equipment, such as cameras, video recorders, cell phones, dictation devices, music systems, and the like. As computing power has increased, the functionality of these devices has increased with it. The added functionality is wasted, however, if users do not know how to use or maintain the device or are unable to locate the information needed to do so. Suppliers often assist the user by providing various manuals covering, for example, instructions for use, troubleshooting, and repair. As befits such complex devices, the associated manuals are necessarily also highly complex. Users generally do not need or wish to become familiar with the entire manual, but rather prefer to use it to address specific needs as they arise.
Online manuals offer the opportunity for greater search flexibility. However, current search mechanisms can be difficult to use, often returning many irrelevant results or missing relevant ones. Because manuals and similar searchable electronic knowledge bases tend to use words which are not in common usage, they are difficult to search using conventional searching techniques. Unless the user has a good knowledge of the content and terminology of the knowledge base, searches often fail to produce effective results, and such knowledge bases are thus best suited to experts who are relatively familiar with their content or structure. Manuals also tend to contain common expressions that are repeated in many different contexts. Because current search mechanisms do not factor out these recurring expressions, the number of results that a user must read through in order to find the most relevant one is multiplied. Expert system approaches offer more guidance to less experienced users but can be quite rigid and do not offer the flexibility that a more expert user would prefer. Some systems combine these approaches in order to satisfy both types of users. However, expert system solutions are very expensive to build and maintain.
Decision trees can be used to provide customers with help for the diagnosis of printer systems. This approach specifies the possible troubleshooting sequences as branches of a decision tree. At each branching of the tree, one of the branches will be chosen based on the information provided by the customer at the last step. However, building and maintaining a decision tree that allows for all possible diagnoses is extremely complex, and such a tree can be time-consuming for the customer to navigate.
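By way of illustration only, the branching behavior described above may be sketched as follows. The `Node` structure, the `diagnose` function, and the example questions and diagnoses are hypothetical and are not taken from any particular system; the sketch merely shows that each customer answer selects exactly one branch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One step of a troubleshooting tree (hypothetical structure)."""
    question: Optional[str] = None               # asked at internal nodes
    branches: Dict[str, "Node"] = field(default_factory=dict)
    diagnosis: Optional[str] = None              # set only on leaves


def diagnose(root: Node, answers: List[str]) -> str:
    """Follow one branch per customer answer until a leaf is reached."""
    node = root
    for answer in answers:
        if node.diagnosis is not None:
            break                                # already at a leaf
        if answer not in node.branches:
            raise KeyError(f"no branch for answer {answer!r}")
        node = node.branches[answer]
    if node.diagnosis is None:
        raise ValueError("ran out of answers before reaching a diagnosis")
    return node.diagnosis


# Invented example content, for illustration only.
TREE = Node(
    question="Does the printer power on?",
    branches={
        "no": Node(diagnosis="check power cable"),
        "yes": Node(
            question="Is paper jammed?",
            branches={
                "yes": Node(diagnosis="clear paper path"),
                "no": Node(diagnosis="contact support"),
            },
        ),
    },
)

print(diagnose(TREE, ["yes", "yes"]))
```

Even in this two-question example, every possible sequence of answers must be anticipated by a branch, which suggests why a tree covering all possible diagnoses becomes difficult to build and maintain.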
Internet search engines for general use sometimes use a hierarchy of predefined topics according to which all indexed documents are classified. When combined with a full-text search capability, these topic hierarchies can serve as a query refinement mechanism. When a user's query matches documents in multiple categories, the user may be asked to choose a category before being shown a list of documents. This approach requires the construction and maintenance of a topic hierarchy, and of links from documents into this hierarchy. Documents which a user may consider relevant are often not retrieved because the relevant text is considered tangential to the main topic of the document. Other search engines cluster the results of the initial search using term occurrence frequencies, and then, for each cluster, present a term representative of that cluster as a refinement choice. This approach relies on the assumption that the document that corresponds to the user's information need is similar, in terms of vocabulary used, to a recognizable class of other documents in the knowledge base. Both of these systems are based on hierarchical classification of documents by topic. The navigable paths in such systems have no particular meaning because each selection simply names a smaller, more specific topic than its parent selection.
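The category-based refinement described above may be sketched, under simplifying assumptions, as follows. The document collection `DOCS`, the category names, and the functions `search` and `refinement_choices` are hypothetical; a flat set of categories stands in for a full topic hierarchy, and matching is reduced to requiring that every query term appear in the document text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical index: document id -> (predefined category, full text).
DOCS: Dict[str, Tuple[str, str]] = {
    "d1": ("Printing", "replace the toner cartridge"),
    "d2": ("Scanning", "clean the scanner glass"),
    "d3": ("Maintenance", "order a new toner cartridge"),
}


def search(query: str) -> Dict[str, List[str]]:
    """Full-text match, with the hits grouped by predefined category."""
    terms = query.lower().split()
    hits: Dict[str, List[str]] = defaultdict(list)
    for doc_id, (category, text) in DOCS.items():
        words = set(text.lower().split())
        if all(term in words for term in terms):
            hits[category].append(doc_id)
    return dict(hits)


def refinement_choices(query: str) -> List[str]:
    """Categories offered to the user when the hits span several of them."""
    hits = search(query)
    return sorted(hits) if len(hits) > 1 else []


print(refinement_choices("toner cartridge"))
```

In this sketch the refinement choices are only as good as the single category assigned to each document, which reflects the limitation noted above: text that is tangential to a document's assigned topic cannot be reached through the hierarchy.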
Another search system extracts the most important words of a document and builds an underlying graph representing the number of co-occurrences of these words in the same sentence. This graph is then displayed as a navigation tree in which clicking on a branch selects the sentences containing the list of words present in the branch. The selections which may be made are limited to words or expressions identified by their frequency. Neither words with similar meaning nor the syntactic relationships between words are considered. As a result, relevant documents are often missed, and documents which are not particularly relevant may be retrieved because the words they use are very common.
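A minimal sketch of such a sentence-level co-occurrence graph is given below. It assumes that the most important words are simply the most frequent words outside a small stopword list; the `STOPWORDS` set, the function names, and the sample text are hypothetical and serve only to illustrate the general technique.

```python
import re
from collections import Counter
from itertools import combinations
from typing import List, Tuple

STOPWORDS = {"the", "a", "an", "in", "of", "to"}   # assumed, illustrative only


def split_sentences(text: str) -> List[List[str]]:
    """Split text into sentences, each a list of lowercase words."""
    parts = re.split(r"[.!?]+", text.lower())
    return [re.findall(r"[a-z]+", p) for p in parts if p.strip()]


def cooccurrence_graph(sentences: List[List[str]], top_n: int) -> Counter:
    """Count, for each pair of frequent words, the sentences containing both."""
    freq = Counter(w for words in sentences for w in words
                   if w not in STOPWORDS)
    keywords = {w for w, _ in freq.most_common(top_n)}
    edges: Counter = Counter()
    for words in sentences:
        present = sorted(set(words) & keywords)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


def select_sentences(sentences: List[List[str]],
                     branch: Tuple[str, ...]) -> List[List[str]]:
    """Return the sentences that contain every word on the chosen branch."""
    wanted = set(branch)
    return [words for words in sentences if wanted <= set(words)]


sample = "The tray holds paper. Paper jams in the tray. Replace the toner."
sents = split_sentences(sample)
graph = cooccurrence_graph(sents, top_n=2)
print(graph)
print(select_sentences(sents, ("paper", "tray")))
```

Because the sketch matches exact word forms only, a sentence that uses a synonym of a branch word would not be selected, which illustrates the limitation described above.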