1. Field of the Invention
The present invention relates generally to semantic searches and specifically to converting natural language queries into logical queries.
2. Introduction
Many approaches have been used, with varying levels of success, to try to solve the general problem of using natural language to search structured databases or unstructured text. A potential approach or solution can be broadly divided into two parts: (1) processing the natural language question into a logical query, and (2) mapping the converted query to databases. This application discusses the first part.
Keyword matching and grammar-based natural language processing are two common approaches to the first part, processing the natural language question into a logical query. Each of these two techniques has significant limitations individually. Keyword-based querying is a simple method of matching keywords in the user query to the database entities. Keyword matching may be effective in handling simple questions like “number of customers,” but tends to be highly error prone in handling complex questions, where understanding the proper associations among the different parts of the user query is necessary.
A keyword-based natural language query consists of a simple list of words entered by the user, much like what many people enter as search strings in modern search engines. For example, if a user is searching for the five-day weather forecast in Bermuda, the user may say or enter the text “Bermuda weather”. From the user's point of view, these keyword-based searches may be convenient because they do not require strict syntax when entering the query. However, the query context exists only in the user's mind, and thus it would be very difficult, if not impossible, for a natural language processor to understand the meaning and intent of the query. For example, if a user enters the keywords “cold fusion”, the system would not know if the user meant ColdFusion the software, the energy generation technique used by nuclear physicists, or two unrelated keywords “cold” and “fusion”.
Three fundamental problems with a keyword-based approach are (1) the same word could have multiple, different meanings based on the context or domain the user is interested in, (2) the keyword-based approach could result in a huge list of alternative answers, leaving the burden of selecting the right answer to the user, and (3) the approach becomes ineffective as the volume of words in the targeted search space increases.
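The ambiguity described above can be illustrated with a minimal sketch of keyword matching. The index contents, the entity labels, and the function name keyword_match are hypothetical and chosen only for illustration; they do not describe any particular system.

```python
# Hypothetical keyword index: each keyword maps to every entity that mentions it.
# All entries are illustrative assumptions, not data from a real system.
KEYWORD_INDEX = {
    "cold": [
        "ColdFusion (software)",
        "cold fusion (nuclear physics)",
        "cold (temperature)",
    ],
    "fusion": [
        "ColdFusion (software)",
        "cold fusion (nuclear physics)",
        "fusion (cuisine)",
    ],
    "customers": ["customers (table)"],
}


def keyword_match(query):
    """Return every entity that shares at least one keyword with the query."""
    matches = []
    for word in query.lower().split():
        for entity in KEYWORD_INDEX.get(word, []):
            if entity not in matches:  # keep first-seen order, no duplicates
                matches.append(entity)
    return matches


# A simple question resolves to a single entity.
simple = keyword_match("number of customers")

# "cold fusion" returns several unrelated candidates, and nothing in the
# query tells the matcher which one the user intended.
ambiguous = keyword_match("cold fusion")
```

In this sketch the simple query yields one candidate while the ambiguous query yields four, so the burden of choosing the intended meaning falls on the user.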
A grammar-based or language processing approach, which dissects a user query using parts of speech, grammars, and the like, is also common. However, the success of grammar-based solutions is limited by their dependency on a properly framed question, by language ambiguity, and, most importantly, by the lack of a grammar, or a minimized grammar, appropriate to business-speak (or to a particular domain), which is how business users tend to ask questions.
A grammar-based approach typically defines a strict syntax for the natural language processor. The rules are defined for convenience of implementation, and users are seldom aware of these rules or the rationale behind them. When a user types a query that exactly matches the predefined syntax, the language processor understands the query and possibly some of the relationships among the keywords. These processors recognize the meaning of the query more accurately than keyword-based language processors do.
However, grammar-based processors also have many limitations. First, grammar rules are not known to the end user. For example, users may not be aware that a concept must be followed by a unit of time for the grammar rule to work, as in “Sales in January”. For some users, an input such as “January Sales” may be more convenient. Second, the grammar rules can become complex as the number of rule combinations increases. Third, grammar rules focus more on syntax and order than on semantic meanings and relationships. Fourth, grammar-based processors are hard to extend and are unable to find new relationships that the system does not already know about. Fifth, the grammar-based approach is more suitable for implementing a new programming language on a specific hardware platform and is not an effective solution for natural language processing.
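The first limitation can be illustrated with a minimal sketch of a single strict grammar rule, here approximated with a regular expression. The rule "concept, then the word in, then a month" and the names RULE and parse are assumptions made for illustration only; a real grammar-based processor would use a far richer rule set.

```python
import re

# Hypothetical rule: <concept> "in" <month>. The month list and rule shape
# are illustrative assumptions, not the grammar of any actual product.
MONTHS = (
    "january|february|march|april|may|june|"
    "july|august|september|october|november|december"
)
RULE = re.compile(
    rf"^(?P<concept>\w+)\s+in\s+(?P<month>{MONTHS})$",
    re.IGNORECASE,
)


def parse(query):
    """Return the parsed parts if the query fits the rule, otherwise None."""
    match = RULE.match(query.strip())
    if match is None:
        return None
    return {
        "concept": match.group("concept").lower(),
        "time": match.group("month").lower(),
    }


# The query that follows the predefined word order is understood.
accepted = parse("Sales in January")

# The equally natural reordering is rejected, because the rule encodes
# word order rather than the relationship between "sales" and "January".
rejected = parse("January Sales")
```

Accepting the second phrasing would require another rule, which is how the number of rule combinations grows as more natural phrasings are supported.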
A programmatic, rules-based approach to parsing portions of a natural user query is another common way of addressing some of the challenges of these techniques. For example, developers attempt to envision various forms of natural phrases and try to address them programmatically, writing code for each of the more common structures. While this approach may prove reasonably effective with a limited set of phrases, it can become unwieldy very quickly when parsing natural language queries.
Folksonomy is another information retrieval methodology, consisting of user-generated, open-ended labels that categorize content such as web pages, online photographs, and web links. A folksonomy is most notably contrasted with a taxonomy in that the authors of the labeling system are often the main users (and sometimes originators or experts) of the content to which the labels are applied. The labels are commonly known as tags and the labeling process is called tagging. The process of folksonomic tagging is intended to make a body of information increasingly easy to search, discover, and navigate over time. A well-developed folksonomy is ideally accessible as a shared vocabulary that is both originated by and familiar to its primary users. Two widely cited examples of websites using folksonomic tagging are Flickr and Del.icio.us. Folksonomy, while collaboratively generated, suffers from the same challenge as keyword-based search: the lack of relationship information.
Lastly, an ontology attempts to represent a real-world view of business models, grammars, sentence constructs, or phrases. However, building such semantic frameworks quickly becomes time-consuming and cost-prohibitive as the scope of the application or the domain increases.
Accordingly, what is needed in the art is a way to process a natural language query that can overcome the limitations of a single, rigid approach.