This invention relates to the fields of speech processing and database retrieval.
Previously, speech systems were generally unidirectional. Speech recognizers took commands or dictation as input and produced results otherwise accomplished with buttons or keyboards. Speech synthesizers simply read text aloud to people, achieving effects otherwise available from screens or printouts.
A speech processor that both speaks and listens uses speech recognizers as well as speech synthesizers to allow a user to engage in what is commonly thought of as a dialog with a database. According to an embodiment of the invention, an element of working memory holds the current context of the dialog so that the system can respond to successive statements with increasing specificity.
In particular, the present invention provides a method of generating content information for output to a user. The method includes the steps of generating first information based on a first statement in a natural language, and generating second information based on a second statement in the natural language and based on a context provided by the first information. The method also includes the step of incorporating content information generated based on the second information into output to the user.
The method can also include the steps of generating a first query based at least in part on the first statement, and querying a database using the first query to thereby generate the first information. Further, the method can include the steps of generating at least a first answer in the natural language based on the first information, and generating at least a second query based on the second statement and further based on the context provided by the first statement, the first information, and the first answer.
Moreover, the method can also include the steps of querying the database using the second query to thereby generate the second information, and generating at least a second answer in the natural language based on the second information. The first and second queries may be in Structured Query Language.
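By way of illustration only, the two-statement method described above might be sketched as follows. The sketch is not taken from the specification: the restaurants table, the KEYWORDS vocabulary, and the functions build_query, make_answer, and run_dialog are hypothetical names, and simple keyword spotting stands in for whatever parsing an actual embodiment would use. The context here is reduced to the constraints extracted from the first statement, which is only one of the forms of context the method contemplates.

```python
import sqlite3

# Hypothetical vocabulary: maps a spotted keyword to (column, value).
KEYWORDS = {
    "italian": ("cuisine", "italian"),
    "thai": ("cuisine", "thai"),
    "downtown": ("area", "downtown"),
    "uptown": ("area", "uptown"),
}


def build_query(statement, context):
    """Return (sql, params, new_context) for a statement, narrowed by context."""
    filters = dict(context)  # constraints carried over from earlier turns
    for word in statement.lower().strip("?.! ").split():
        if word in KEYWORDS:
            column, value = KEYWORDS[word]
            filters[column] = value
    # Column names come only from KEYWORDS; values are bound as parameters.
    where = " AND ".join(f"{column} = ?" for column in filters)
    sql = "SELECT name FROM restaurants"
    if where:
        sql += " WHERE " + where
    sql += " ORDER BY name"
    return sql, list(filters.values()), filters


def make_answer(rows):
    """Generate a natural-language answer from the rows the query returned."""
    names = [row[0] for row in rows]
    if not names:
        return "I found nothing."
    return "I found " + ", ".join(names) + "."


def run_dialog(statements):
    """Answer each statement in turn, carrying context from one to the next."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE restaurants (name TEXT, cuisine TEXT, area TEXT)")
    conn.executemany(
        "INSERT INTO restaurants VALUES (?, ?, ?)",
        [
            ("Luigi's", "italian", "downtown"),
            ("Roma", "italian", "uptown"),
            ("Bangkok Cafe", "thai", "downtown"),
        ],
    )
    context, answers = {}, []
    for statement in statements:
        sql, params, context = build_query(statement, context)
        rows = conn.execute(sql, params).fetchall()
        answers.append(make_answer(rows))
    conn.close()
    return answers


answers = run_dialog(
    ["Which Italian restaurants are there?", "Which ones are downtown?"]
)
```

In this sketch the second statement names no cuisine, yet the second query is still restricted to Italian restaurants because that constraint is retained from the first statement.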
In another aspect of the invention, the second statement is a specific statement relating to the first statement. In addition, the context provided by the first information may comprise a specific phrase included in the first statement.
If desired, the method may also include the steps of generating third information based on a third statement in the natural language and based on a context provided by at least one of the first and second information, and incorporating content information generated based on the third information into the output to the user.
The present invention also provides a method of querying a database. The method includes the steps of receiving a first statement in a natural language, generating grammatical data, and generating at least a first query based on the first statement and the grammatical data. The method also includes the steps of generating first information based on the first query, generating a first answer in the natural language based on the first information, and receiving a second statement in the natural language. The method further includes the step of generating a second query based on the second statement and a context provided by at least one of the first query, the first information, and the first answer.
The method may also comprise the step of generating content for output to the user that includes the first answer. Further, the method may include the steps of generating second information based on the second query, and generating at least a second answer in the natural language based on the second information. It should also be noted that the step of generating the first query can include the step of fuzzy matching the first statement to a grammar represented by the grammatical data.
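As one possible illustration of the fuzzy matching step, the following sketch compares a recognized statement against a small list of phrases and returns the closest one. The specification does not say which matching technique is used; the similarity measure of the Python standard library module difflib is substituted here as an assumption, and the GRAMMAR phrases, the fuzzy_match function, and the 0.6 cutoff are hypothetical.

```python
import difflib

# Hypothetical grammar: the phrases the system is currently prepared to accept.
GRAMMAR = [
    "show flights to boston",
    "show flights to denver",
    "list hotels in boston",
]


def fuzzy_match(statement, grammar=GRAMMAR, cutoff=0.6):
    """Return the grammar phrase closest to the statement, or None.

    The cutoff is the minimum similarity (0.0 to 1.0) a phrase must reach.
    """
    matches = difflib.get_close_matches(
        statement.lower(), grammar, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


best = fuzzy_match("show flight to bostn")
unmatched = fuzzy_match("play some music")
```

A misrecognized statement such as "show flight to bostn" still resolves to a phrase in the grammar, while a statement unrelated to every phrase falls below the cutoff and yields no match.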
The present invention further provides a speech recognition system. The system includes an input device configured to receive a first statement in a natural language and a system state controller configured to provide grammatical data to the input device. The input device is further configured to generate a first query based on the first statement and the grammatical data. The system also includes a database configured to generate first information based on the first query, and an output device configured to generate a first answer in the natural language based on the first information. The input device is further configured to receive a second statement in the natural language and configured to generate a second query based on the second statement and a context provided by at least one of the first query, the first information, and the first answer.
The system may also include a memory bank configured to store the first query, the first information, and the first answer. The memory bank can be further configured to store at least one of an antecedent to a pronoun and a disambiguating homonym for the first statement. The system can also comprise a speech recognizer configured to receive the first statement and to convert the first statement into a plurality of phonemes, and a first language model configured to generate a plurality of parsing tokens based on the plurality of phonemes and the grammatical data. In addition, the system can include a query generator configured to generate the first query based on the plurality of parsing tokens.
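A memory bank of the kind described above might be sketched as follows, purely for illustration. The MemoryBank class, its remember and resolve methods, and the short pronoun list are hypothetical and are not drawn from the specification; an actual embodiment would likely determine antecedents from the parsing tokens rather than by direct text substitution.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, deliberately short list of pronouns to resolve.
PRONOUN_RE = re.compile(r"\b(it|he|she|they)\b", re.IGNORECASE)


@dataclass
class MemoryBank:
    """Stores each turn of the dialog plus the antecedent for later pronouns."""

    turns: list = field(default_factory=list)
    antecedent: Optional[str] = None

    def remember(self, query, information, answer, antecedent=None):
        """Store one turn; update the antecedent when a new one is supplied."""
        self.turns.append((query, information, answer))
        if antecedent is not None:
            self.antecedent = antecedent

    def resolve(self, statement):
        """Replace pronouns in the statement with the stored antecedent."""
        if self.antecedent is None:
            return statement
        # A function is used as the replacement so the antecedent is
        # inserted literally, even if it contains backslashes.
        return PRONOUN_RE.sub(lambda match: self.antecedent, statement)


bank = MemoryBank()
before = bank.resolve("When does it open?")
bank.remember(
    "SELECT name FROM restaurants WHERE cuisine = 'italian'",
    [("Luigi's",)],
    "I found Luigi's.",
    antecedent="Luigi's",
)
after = bank.resolve("When does it open?")
```

Before any turn is stored the statement passes through unchanged; once the first turn supplies an antecedent, the pronoun in the second statement is replaced by it, so the second query can be generated from a fully specified statement.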
The database can be further configured to generate second information based on the second query, and the output device can also be further configured to generate a second answer in the natural language based on the second information.
The system may also comprise a device controller configured to carry out a command from the system state controller. The system state controller can be further configured to generate the command based on at least one of the first information, the second information, the first answer and the second answer. The device controller may also be further configured to generate content for output to the user that includes at least one of the first answer and the second answer. The system can also include a synthesizer configured to convert the second answer to a voice message.