This invention relates to the field of interpreting natural language. More particularly, this invention relates to a method and apparatus for processing and interpreting natural language that uses semantic confidence values to improve the efficiency of that operation.
The following definitions may be helpful in understanding the background of the invention and the discussion outlined below.
Confidence: a measure of a degree of certainty that a system has accurately identified input language. In the preferred embodiment, it is a measure of the degree of perceived acoustic similarity between input speech and an acoustic model of the speech.
Phrase: a sequence of words.
Example: “from Boston”
Grammar rule: a specification of a set of phrases, plus the meaning of those phrases.
Example: (from [(boston ? massachusetts)(dallas ? texas)])
Generates: “from boston”, “from boston massachusetts”, “from dallas”, “from dallas texas”
Grammar: a set of grammar rules.
Edge: a match located by a parser of a grammar rule against a phrase contained in an input sentence.
Example: From the sentence “I want to fly from Boston to Dallas,” a parser could create an edge for the phrase “from Boston” using the grammar rule shown above.
Slot: a predetermined unit of information identified by a natural language interpreter from a portion of the natural language input. For example, from the phrase “from Boston” the natural language interpreter might determine that the “origin” slot is to be filled with the value “BOS” (the international airport code for Boston).
Parse tree: a set of edges used in constructing a meaning for an entire sentence.
Example:
Natural language interpreters are well known and used for a variety of applications. One common use is for an automated telephone system. It will be apparent to those of ordinary skill in the art that these techniques can be, and have been, applied to a variety of other uses. For example, one could use such a system to purchase travel tickets, to arrange hotel reservations, to trade stock, or to find a telephone number or extension, among many other useful applications.
As an example, consider a system for use in providing information about commercial passenger air flights. A caller to the system might say “I want to fly from Boston to San Francisco, tomorrow.” This exemplary system requires three pieces of information to provide information about relevant air flights including the origin city, the destination city and the time of travel. Other systems could require more or less information to complete these tasks depending upon the goals of the system. While the exemplary system also uses a speech recognizer to understand the supplied spoken natural language, it could also receive the natural language via other means such as from typed input, or using handwriting recognition.
Using a predetermined grammar with a set of grammar rules, such a system parses the sentence into edges. Each edge represents a particular needed piece or set of information. The sentence can be represented by a parse tree as shown in the definitions above.
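As an illustration of how the grammar rule given in the definitions above specifies a set of phrases, the following minimal sketch expands that rule into the phrases it generates. The reading of the notation (parentheses as a sequence, brackets as a choice among alternatives, and “?” as marking the following word optional) is inferred from the listed output, and the nested-tuple encoding and the names RULE and expand are hypothetical, chosen only for this illustration.

```python
from itertools import product

# Hypothetical encoding of the rule (from [(boston ? massachusetts)(dallas ? texas)]):
#   ("seq", ...) = items in order
#   ("alt", ...) = exactly one of the items
#   ("opt", x)   = x may be present or omitted
RULE = ("seq", "from",
        ("alt",
         ("seq", "boston", ("opt", "massachusetts")),
         ("seq", "dallas", ("opt", "texas"))))


def expand(node):
    """Return every word sequence (as a tuple of words) the node can generate."""
    if isinstance(node, str):
        return [(node,)]
    kind, *children = node
    if kind == "opt":
        # Either nothing at all, or whatever the optional item generates.
        return [()] + expand(children[0])
    if kind == "alt":
        return [phrase for child in children for phrase in expand(child)]
    if kind == "seq":
        # Concatenate one expansion of each child, in order.
        parts = [expand(child) for child in children]
        return [sum(combo, ()) for combo in product(*parts)]
    raise ValueError(f"unknown node kind: {kind}")


phrases = sorted(" ".join(words) for words in expand(RULE))
print(phrases)
```

Under these assumptions the sketch yields the same four phrases listed in the definition of a grammar rule.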
The system performs the parsing operation by matching grammar rules to the natural language input. For example, one grammar rule can specify that an origin expression is the word “from” or the phrase “out of” followed by a city name. If the natural language input is “I want to fly from Boston to Dallas,” the system will locate the phrase “from Boston” and create a record in its internal data structures that these words match the origin expression grammar rule. This record is sometimes referred to as an edge. Systems look for predetermined grammars within collections of natural language. The system performs the parsing operation in accordance with the grammar as a way of forming/filling the desired edges with information from a natural language input. For example, the natural language interpreter identifies the origin city by seeking any of several origin city words such as <‘from’, ‘starting’, ‘leaving’, ‘beginning’, ...> related to a city name from a list of cities. If the natural language interpreter finds an origin city word and a city from the list, it will then fill the origin city edge. Similarly, the natural language interpreter identifies the destination city by seeking any of several destination city words such as <‘to’, ‘ending’, ‘arriving’, ‘finishing’, ...> related to a city name from the list of cities. If the natural language interpreter finds a destination city word and a city from the list, it will then fill the destination city edge.
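The matching step described above can be sketched as follows. This is a minimal illustration, not the parser of any particular system: the Edge record, the helper names, and the short word and city lists are hypothetical, and a practical grammar would be far larger.

```python
from dataclasses import dataclass

# Hypothetical word lists for illustration only.
ORIGIN_WORDS = [("from",), ("out", "of"), ("leaving",)]
DESTINATION_WORDS = [("to",), ("arriving",)]
CITIES = [("boston",), ("dallas",), ("san", "francisco")]


@dataclass
class Edge:
    rule: str   # name of the grammar rule that matched
    start: int  # index of the first matched word
    end: int    # index one past the last matched word
    city: str   # city name found after the origin/destination word


def match_at(words, pos, options):
    """Return the first option whose words appear at position pos, else None."""
    for option in options:
        if tuple(words[pos:pos + len(option)]) == option:
            return option
    return None


def find_edges(sentence):
    """Record an edge wherever an origin/destination word is followed by a city."""
    words = sentence.lower().replace(",", "").replace(".", "").split()
    rules = (("origin", ORIGIN_WORDS), ("destination", DESTINATION_WORDS))
    edges = []
    for pos in range(len(words)):
        for rule, markers in rules:
            marker = match_at(words, pos, markers)
            if marker is None:
                continue
            city = match_at(words, pos + len(marker), CITIES)
            if city is not None:
                end = pos + len(marker) + len(city)
                edges.append(Edge(rule, pos, end, " ".join(city)))
    return edges


edges = find_edges("I want to fly from Boston to Dallas")
for edge in edges:
    print(edge)
```

Note that the first “to” in the sentence (“to fly”) produces no edge, because no city name follows it; only “from Boston” and “to Dallas” are recorded.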
The grammar for the natural language interpreter similarly identifies the desired time of the flight by seeking any of several time words such as <‘o'clock’, ‘morning’, ‘afternoon’, ‘a.m.’, ‘p.m.’, ‘January’, ‘February’, ..., ‘Monday’, ‘Tuesday’, ...> related to a number. Using this technique, the natural language interpreter can interpret spoken utterances if they contain the requisite information, regardless of the ordering of the sentence. Thus, the sentence listed above as “I want to fly from Boston to San Francisco, tomorrow,” will provide the same result as the sentence, “Please book a flight to San Francisco, for flying tomorrow, from Boston.”
If the natural language interpreter is unable to identify the appropriate words, or a related city name, then the parsing will be terminated as unsuccessful. For example, if the caller says, “I want to fly to visit my mother,” the parsing will be unsuccessful. There is no source city word nor source city in the sentence. Further, even though the natural language interpreter finds a destination city word, it cannot find a city name that it recognizes.
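The order independence and the failure case described above can be illustrated with a short sketch. It assumes a deliberately tiny vocabulary expressed as regular expressions; the patterns, the word lists (including treating “tomorrow” as a time word), and the function name interpret are hypothetical and serve only to show the behavior.

```python
import re

# Hypothetical vocabularies for illustration only.
CITY = r"(boston|dallas|san francisco)\b"
ORIGIN = re.compile(r"\b(?:from|leaving|out of) " + CITY)
DESTINATION = re.compile(r"\b(?:to|arriving in) " + CITY)
TIME = re.compile(r"\b(today|tomorrow|monday|tuesday)\b")


def interpret(sentence):
    """Return the three required values, or None if any one cannot be found."""
    # Lowercase and drop punctuation so that only words and spaces remain.
    text = re.sub(r"[^a-z ]", "", sentence.lower())
    found = {}
    for name, pattern in (("origin", ORIGIN),
                          ("destination", DESTINATION),
                          ("time", TIME)):
        match = pattern.search(text)
        if match is None:
            return None  # parsing is terminated as unsuccessful
        found[name] = match.group(1)
    return found


first = interpret("I want to fly from Boston to San Francisco, tomorrow.")
second = interpret("Please book a flight to San Francisco, for flying tomorrow, from Boston.")
failed = interpret("I want to fly to visit my mother")
print(first, second, failed)
```

Both orderings of the request produce the same three values, while the third sentence yields no result because it contains neither an origin expression nor a recognized city name.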
For a natural language interpreter used in conjunction with a speech recognition system, the natural language interpreter is provided the speech recognizer's best determination of each word resulting from the recognition operation. A speech recognizer ‘listens’ to a user's spoken words, determines what those words are and presents those words in a machine format to the natural language interpreter. As part of the recognition operation, each word is given a word confidence score, which represents the speech recognizer's confidence in the accuracy of its recognition of that word. Thus, it is generally considered useful to take into account the accent or speech patterns of a wide variety of users. A score is generated and associated with each word in the recognition step. Using the scores for each individual word is not entirely satisfactory because that collection of scores does not relate to the meaning the speaker intends to convey. If a single word has a very low word confidence score, the user may be required to re-enter the request.
In one prior approach, the scores for each of the words are combined into a single composite confidence score for the entire sentence. While this approach solves certain problems associated with using the scores for each word and provides a workable solution, it suffers from several drawbacks.
The composite confidence score described above is weighted by all the words in the entire sentence. In a long sentence, a speaker might use many words that are in essence unrelated to providing the information that the natural language interpreter needs. For example, suppose the speaker says, “Please help me to arrange a flight tomorrow to visit my friend for their birthday celebration leaving from Minneapolis and arriving in Cleveland.” In this example, assume that the speaker talks clearly, so that almost every word has a very high confidence score. A loud background noise occurs during the speaking of the words “Minneapolis” and “Cleveland” so that the confidence score for those two words is low. In fact the speech recognizer incorrectly recognizes one of the words. Nevertheless, because the composite confidence score is high for the entire sentence, the recognition for the sentence is accepted. Thus, the natural language interpreter instructs the system to find the wrong flight information.
On the other hand, even if the critical information is all properly recognized, if the composite confidence score is low, the entire sentence is rejected. For example, the speaker says, “I want to fly tomorrow from Chicago to Phoenix.” In this example also assume that the speaker talks clearly. The speech recognizer properly identifies the words, “tomorrow from Chicago to Phoenix.” A loud background noise occurs during the speaking of the words, “I want to fly.” Those words have a low confidence score. Thus, even though the critical information is accurately recognized, this sentence is rejected because the composite confidence score for the sentence is low. Because of the operation of prior systems, the information is rejected and the user is required to again provide all the information. This is inconvenient for the user.
As can be seen, use of a composite confidence score for the entire sentence results in an operation of the system which is not optimal. Under one set of conditions described above, the system will attempt to utilize incorrect information in carrying out its task. Under the other set of conditions described, the system will require the user to re-provide all of the information, i.e., repeat the entire sentence. One scenario provides an incorrect result, the other is inconvenient to the user.
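Both failure modes can be reproduced with a small worked example. The sketch below assumes that the composite score is the arithmetic mean of the word confidence scores, which is only one possible way of combining them; the individual scores (0.95, 0.30, 0.20) and the acceptance threshold of 0.70 are invented for illustration and do not come from any real recognizer.

```python
from statistics import mean

ACCEPT_THRESHOLD = 0.70  # hypothetical acceptance threshold


def score_words(sentence, noisy_positions, clear, noisy):
    """Pair each word with an illustrative confidence score by word position."""
    return [(word, noisy if i in noisy_positions else clear)
            for i, word in enumerate(sentence.split())]


def composite_score(word_scores):
    """Combine the word scores into one sentence score (assumed here: the mean)."""
    return mean(score for _, score in word_scores)


# Case 1: noise covers only "Minneapolis" and "Cleveland" (positions 18 and 22).
long_sentence = score_words(
    "Please help me to arrange a flight tomorrow to visit my friend for their "
    "birthday celebration leaving from Minneapolis and arriving in Cleveland",
    noisy_positions={18, 22}, clear=0.95, noisy=0.30)

# Case 2: noise covers only "I want to fly" (positions 0 through 3).
short_sentence = score_words(
    "I want to fly tomorrow from Chicago to Phoenix",
    noisy_positions={0, 1, 2, 3}, clear=0.95, noisy=0.20)

for name, scores in (("case 1", long_sentence), ("case 2", short_sentence)):
    value = composite_score(scores)
    verdict = "accepted" if value >= ACCEPT_THRESHOLD else "rejected"
    print(f"{name}: composite = {value:.2f} -> {verdict}")
```

With these numbers the first sentence scores about 0.89 and is accepted even though both city names are unreliable, while the second scores about 0.62 and is rejected even though every critical word was recognized with high confidence.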
What is needed is a natural language interpreter for use in conjunction with a speech recognizer which provides more accurate results. What is further needed is a natural language interpreter for use in conjunction with a speech recognizer which does not require a user to re-enter information that was correctly received.
According to the present invention, a stream of input speech is coupled as an input to a speech recognizer. The speech can be provided to the speech recognizer directly from a user or first stored and provided from a memory circuit. Each input word is recognized by the speech recognizer and a word confidence score is associated with each corresponding recognized word. The recognized words and their associated word confidence scores are provided to a natural language interpreter which parses the stream of recognized words into predetermined edges. From the edges, the natural language interpreter forms semantic slots which represent a semantic meaning. A slot confidence score is determined for each slot. The slot confidence score is used as a basis for determining the confidence to be placed into a particular predetermined semantic meaning. Based upon the slot confidence score, an ancillary application program determines whether to accept the words used to fill each slot. If the slot is rejected, the application program can request the user to repeat the information necessary to fill that slot only, rather than requiring the user to repeat the entire stream of input speech.
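The per-slot acceptance decision can be sketched as follows. This passage does not state how a slot confidence score is computed, so the sketch assumes, purely for illustration, that it is the mean of the word confidence scores of the words covered by the slot. The threshold, the scores, the word spans, and the function names are likewise hypothetical.

```python
from statistics import mean

SLOT_THRESHOLD = 0.60  # hypothetical per-slot acceptance threshold


def slot_confidences(scores, slot_spans):
    """Score each slot from the words it covers (assumed here: their mean)."""
    return {slot: mean(scores[start:end])
            for slot, (start, end) in slot_spans.items()}


def slots_to_reprompt(confidences, threshold=SLOT_THRESHOLD):
    """List only the slots whose confidence falls below the threshold."""
    return [slot for slot, value in confidences.items() if value < threshold]


# Word index:  0   1    2   3    4        5    6       7   8
sentence = "I want to fly tomorrow from Chicago to Phoenix".split()
slot_spans = {"time": (4, 5), "origin": (5, 7), "destination": (7, 9)}

# Noise during "I want to fly": no slot is affected.
noisy_opening = [0.20, 0.20, 0.20, 0.20, 0.95, 0.95, 0.95, 0.95, 0.95]
# Noise during "Phoenix": only the destination slot is affected.
noisy_city = [0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.15]

ask_again_opening = slots_to_reprompt(slot_confidences(noisy_opening, slot_spans))
ask_again_city = slots_to_reprompt(slot_confidences(noisy_city, slot_spans))
print(ask_again_opening, ask_again_city)
```

Under these assumptions, noise over “I want to fly” leads to no re-prompt at all, and noise over “Phoenix” leads to a request for the destination only, rather than a repetition of the entire sentence.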
While the invention is described in terms of edges and slot confidence scores, it will be apparent to one of ordinary skill in the art that the essence of using slots is to provide a confidence score for a particular meaning. There are a variety of ways that this can be achieved. Nevertheless, the remainder of this patent document will discuss the invention in terms of slot confidence scores. It will be understood that this could be interpreted to mean any semantic meaning.