Computer systems having databases are generally used to store and retrieve information. These computer systems may be stand-alone computers that serve one or more users, or the systems may be networked to provide access to a database from multiple systems referred to as clients. These database systems communicate with clients through one or more communication protocols, as is known in the art. For example, these databases and clients may communicate over a network, such as the Internet, using the well-known TCP/IP protocol. The client systems may interact with the database using one or more application programs, such as an Internet browser, which accept input from a user and display information received from database systems. The database may be, for example, a relational database which stores character, binary, or other data formats that may be searched and retrieved.
Database access has become standard for supporting operations performed on networks such as the Internet. For example, databases are used to support searching by providing a resource which stores links to data resources and to other databases. In particular, databases are used to classify and store links, called Uniform Resource Locators (URLs), that serve as addresses to resources such as Web sites, audio/video files, and other types of media. These addresses are provided by database systems in response to queries submitted to the database systems through interfaces displayed in browser applications. These database systems are sometimes referred to as search engines, and may be used as a part of a directory service, a company's Web site, or any other method for searching and retrieving information.
Interfaces for databases generally include a text entry field, wherein a user may enter one or more keywords associated with what he or she is searching for. These keywords are processed by the database system, and a set of results is displayed to the user. The user then reviews these results to determine how pertinent the results are to what he or she is looking for. These databases function as directory services for resources on the Internet. Examples of such searching systems include Yahoo!, Google, and others. Yahoo! and other search engines generally provide two methods for finding information. The first is a directory method, which provides pathways for navigating through content related by a logical relationship. The second method is a keyword search. Some services such as Yahoo! utilize the catalog information provided by other services such as Google for performing the search.
One main problem with existing systems that use the directory search method is that a user must navigate hierarchical directory structures. Taking a “wrong” or a non-ideal “turn” in the search path, by selecting a branch in the search which leads away from the best result, will significantly degrade the final outcome of the user's search.
Keyword search systems generally accept keyword entry and display a number of results, the results being ordered based upon the frequency of the keyword appearing in the resource, or some other ranking criterion. The database systems perform preprocessing on the resources by indexing data of the resources by keywords. This involves analyzing Internet resources with programs referred to as Web spiders or crawlers, which visit Internet links and perform keyword processing on resource content associated with each link, generally involving millions of processed resources. Furthermore, a perplexingly large number of search results is typically returned by such keyword search engines. Thousands of search results are usually presented to the user, and the sorting of the search results can involve errors due to the automated indexing of the search results or the difficulties described above.
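The keyword indexing and frequency-based ordering described above can be illustrated by the following minimal sketch. It is not the implementation of any particular search engine; the function names `build_index` and `search`, the sample resources, and the tie-breaking rule (resource identifier order) are assumptions made only for illustration.

```python
from collections import Counter, defaultdict

PUNCTUATION = ".,;:!?\"'()"

def build_index(resources):
    """Map each keyword to a Counter of {resource_id: occurrences}."""
    index = defaultdict(Counter)
    for resource_id, text in resources.items():
        for raw in text.lower().split():
            word = raw.strip(PUNCTUATION)
            if word:
                index[word][resource_id] += 1
    return index

def search(index, keyword):
    """Return resource ids ordered by descending keyword frequency.

    Ties are broken by resource id (an arbitrary, assumed rule).
    """
    counts = index.get(keyword.lower(), Counter())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [resource_id for resource_id, _ in ranked]

# Hypothetical resources standing in for crawled content.
resources = {
    "doc1": "Map of Massachusetts. A road map.",
    "doc2": "Massachusetts history",
    "doc3": "map map map",
}
index = build_index(resources)
print(search(index, "map"))  # doc3 (3 occurrences) ranks ahead of doc1 (2)
```

Note that "doc3" ranks first for the keyword "map" solely because the keyword is repeated, which illustrates how frequency-based ordering may rank a resource highly regardless of its actual relevance.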
Furthermore, simple keyword association and relevancy also sometimes produce irrelevant results. When documents are retrieved based upon a keyword search, resources such as documents having that combination of keywords are retrieved, even though the documents' content may not be relevant. A user must individually evaluate each link to determine whether an indicated document is relevant. Also, the number of records produced is generally large, and a query retrieving thousands of records is not uncommon. Retrieving a large number of records is problematic to a user, as the process of reviewing each link is tedious and time-consuming for the user and requires excessive computational resources.
Keywords and other natural language (NL) inputs are generally processed by the system as shown in FIG. 1, which shows a conventional NL searching system 10. Natural language searches generally begin with a user 100 entering an unstructured query 102 into an interface associated with the database search system. A NL query preprocessor 104 processes the unstructured query 102 to determine the meaning of the query. This meaning is formulated into predetermined search criteria 106, which are provided to a query keyword parser 108, which associates meaning for each of the keywords, and may expand the query by generating similar terms for one or more keywords. The unstructured query 102 is also passed directly to the query keyword parser 108 that processes the unstructured query 102 to determine keywords and logical operators 110 connecting those keywords. For example, an input query of “map and Massachusetts” might produce, by the query keyword parser 108, the keywords “map” and “Massachusetts” with a logical operator “and.” The NL query preprocessor 104 may also determine that the user wants driving directions for Massachusetts, or to retrieve maps of major metropolitan areas, based on the meaning of the phrase “map and Massachusetts.”
A database 150 is indexed by keywords in this case, and those keywords and logical operators are compared against the keyword index to produce a (typically large) number of search results 118. These search results 118 are presented to the user 100 by a query result presenter 120, within a graphical user interface, and are generally ranked by relating the keywords to the database entries.
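The parsing of an unstructured query into keywords and logical operators, and the comparison of those against a keyword index, can be sketched as follows. This is a simplified illustration under stated assumptions, not the query keyword parser 108 itself: only the operators "and" and "or" are recognized, adjacent keywords are joined by an implicit "and", operators are applied left to right without precedence, and the names `parse_query`, `evaluate`, and the sample index are hypothetical.

```python
OPERATORS = {"and", "or"}

def parse_query(query):
    """Split an unstructured query into keywords and connecting operators."""
    keywords, operators = [], []
    expect_keyword = True
    for token in query.lower().split():
        if token in OPERATORS:
            if not expect_keyword:
                operators.append(token)
                expect_keyword = True
            # A leading or repeated operator is ignored.
        else:
            if not expect_keyword:
                operators.append("and")  # assumed implicit operator
            keywords.append(token)
            expect_keyword = False
    # Drop a trailing operator that has no keyword after it.
    operators = operators[:max(len(keywords) - 1, 0)]
    return keywords, operators

def evaluate(keywords, operators, keyword_index):
    """Combine the entries for each keyword, left to right."""
    if not keywords:
        return set()
    results = set(keyword_index.get(keywords[0], set()))
    for operator, keyword in zip(operators, keywords[1:]):
        entries = keyword_index.get(keyword, set())
        results = results & entries if operator == "and" else results | entries
    return results

# Hypothetical keyword index: keyword -> identifiers of database entries.
keyword_index = {
    "map": {"doc1", "doc3"},
    "massachusetts": {"doc1", "doc2"},
}
keywords, operators = parse_query("map and Massachusetts")
print(keywords, operators)
print(evaluate(keywords, operators, keyword_index))
```

For the input query "map and Massachusetts", the sketch yields the keywords "map" and "massachusetts" joined by the operator "and", and returns only those entries indexed under both keywords.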
As discussed above, search engines may include a NL query preprocessor 104 which attributes some meaning to the terms. For instance, this may be performed by analyzing lexical semantics, which determines the meaning of each of the keywords, and by analyzing compositional semantics, which is the knowledge of how keywords are combined to form larger meanings. In general, morphology is the study of the meaningful components of words, while syntax is the study of the relationship between words. There are also many other ways to analyze natural language. For example, semantics is the study of meaning, pragmatics is the study of how language is used to accomplish goals, and discourse is the study of linguistic units larger than a single utterance.
Because meaning may be attributed to a query at many levels, NL processing is a complex process which involves complex algorithms. Further, these algorithms are not perfect; there are frequent ambiguities in natural language interpretation. Because of these ambiguities, and because of the inherently subjective nature of database queries, NL processing of input queries yields imperfect search results. Natural language processing is more fully described in the book entitled “Natural Language Understanding” by James Allen, 2nd edition (January 1995), Addison-Wesley Publishing Co., which is hereby incorporated by reference.
As discussed, there are many drawbacks of implementing NL in association with database searching. For example, the user may pose a question, and the question is not interpreted properly, yielding incorrect results. The user may need to restructure the question in a different manner to obtain meaningful results.
There are sites that implement NL analysis such as the portal AskJeeves, which ascribes meaning to input queries by matching a user's question to a question that was previously defined. This portal allows a user to pose questions in a NL format, and retrieves the most relevant question based on a keyword analysis. However, as discussed above, natural language analysis produces ambiguous results and is complicated to perform. Thus, AskJeeves, and others, do not generally perform a perfect match. Further, questions posed to the system by a user do not necessarily have a corresponding question predefined in the system. Also, sample questions presented to the user in response to a query are usually not relevant. Because AskJeeves is linked to a keyword indexed database, the results returned must be processed by the user, and the AskJeeves system produces the same volume of information as standard keyword matching search engines.
In addition to keyword-based and NL-based search functions, many database search engines also provide a hierarchical listing of information to complement these functions. This hierarchical listing is a categorization of links, usually programmed manually, and takes the form of directories. When new links are added, they are generally placed within the predetermined hierarchy or directory tree. As described earlier, navigating through a directory tree requires a user to accurately select the best choice from a plurality of presented choices. The presented choices may themselves not include a choice corresponding to a path leading to the information the user actually desires to find. Making a non-ideal selection, or being presented with selections none of which is ideal, forces a user down a search path that will not lead to the desired results. Also, excessively long search paths involving many user selections are generally required to reach the end point of a search. No logical relationship necessarily exists between members of a directory level or members of different directory levels. These difficulties cause directory-based search engines or navigators to be an inefficient means for retrieving information.
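The effect of a non-ideal selection in a directory tree can be illustrated by the following minimal sketch. The directory contents, the example.com links, and the function names `navigate` and `reachable_links` are hypothetical and chosen only for illustration.

```python
# Hypothetical directory tree: categories map to subcategories or links.
directory = {
    "Travel": {
        "Maps": {"Massachusetts road map": "http://example.com/ma-map"},
    },
    "Regional": {
        "Massachusetts": {"State government": "http://example.com/ma-gov"},
    },
}

def navigate(tree, path):
    """Follow a sequence of user selections through the tree.

    Returns the subtree or link reached, or None if a selection
    is not among the presented choices.
    """
    node = tree
    for choice in path:
        if not isinstance(node, dict) or choice not in node:
            return None
        node = node[choice]
    return node

def reachable_links(node):
    """Collect every link that can still be reached below a node."""
    if not isinstance(node, dict):
        return {node}
    links = set()
    for child in node.values():
        links |= reachable_links(child)
    return links

target = "http://example.com/ma-map"
print(target in reachable_links(navigate(directory, ["Travel"])))    # True
print(target in reachable_links(navigate(directory, ["Regional"])))  # False
```

In this sketch, a user seeking a map of Massachusetts who first selects "Regional" can no longer reach the desired link through any further selections, and must back out of the search path and begin again.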