The present invention relates to the field of natural language speech recognition. More particularly, the present invention relates to the field of recognizing telephone numbers and other spoken information embedded in natural language voice messages stored in a voice messaging system.
Natural language is written or spoken language in a form that a person would use instinctively, as when communicating with another person. Natural language recognition systems are known which permit a user to interface with a computer system using natural language. The natural language recognition system receives spoken input from the user, interprets the input, and then translates it into a form that the computer system understands.
For example, a voice recognition application program is available from Dragon Systems, Inc., which enables a user to enter text into a written document by speaking the words to be entered into a microphone attached to the user's computer system. The application interprets the spoken words and translates them into typographical characters which then appear in the written document displayed on the user's computer screen.
Natural language recognition systems are known which provide a telephonic interface between a caller and a customer service application. For example, the caller may obtain information regarding flight availability and pricing for a particular airline and may purchase tickets utilizing natural spoken language and without requiring service from an airline reservations clerk.
Telephone voice messaging systems are known which enable a caller to leave a voice message for a system subscriber who is temporarily unavailable to take the call. Often, the voice message conveys specific and important information required by the subscriber along with other information. As a first example, a voice message may contain a personal greeting from the caller, the reason for the call, and a telephone number at which the caller can be reached. As a second example, a voice message may contain a personal greeting from the caller, a reminder that a meeting is to take place, an agenda for the meeting, a time of day for the meeting and a street address for the meeting. In each of these examples, the recipient is likely to immediately assimilate all of the information conveyed by the voice message, except that, in the first example, the recipient is likely to need to write down the telephone number of the caller and, in the second example, the recipient is likely to need to write down the time of day and the street address. Transcribing this information can be inconvenient and time-consuming for the recipient, especially if the recipient needs to rewind or replay the message repeatedly to accurately transcribe information from the voice message.
Therefore, what is needed is a technique for automatically identifying and interpreting or outputting telephone numbers and other information embedded in voice messages stored in a voice messaging system.
The invention is a method and apparatus for recognizing telephone numbers and other information embedded in voice messages stored in a telephone voice messaging system. A voice recognition system in accordance with the present invention is coupled to the telephone voice messaging system. A voice message stored in the voice messaging system is transferred from the voice messaging system to the voice recognition system. Alternately, the voice message can be provided to the voice recognition system while the message is being recorded. The voice recognition system identifies potential speech utterances in the voice message and segments the voice message into a plurality of such utterances. Alternately, the voice recognition system 100 acts on the message as a whole. The voice recognition system then searches the segments or the message as a whole for a predetermined speech reference model which is expected to contain information of importance to the recipient of the message. This reference model is called a grammar. In the preferred embodiment, the predetermined grammar is a numeric grammar which specifies a sequence of numbers occurring in the voice message which corresponds to a telephone number. In alternate embodiments, the grammar can specify a date, a time, an address, a person's name and so forth. According to an aspect of the present invention, the grammar can be modified or selected by the recipient of the voice message so that the voice recognition system searches for information of particular interest to the recipient.
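As an illustration only, the idea of a numeric grammar can be sketched in Python under a simplifying assumption: the message has already been reduced to a list of recognized words, whereas the system described here searches the speech itself against a reference model. The names DIGIT_WORDS and find_number_sequences and the min_digits threshold are hypothetical and do not come from the specification.

```python
# Hypothetical mapping from spoken digit words to digit characters.
# "oh" is treated as zero, which is an assumption that can misfire
# when "oh" is used as an interjection.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def find_number_sequences(words, min_digits=7):
    """Return (start, end, digits) for each run of spoken digits.

    A run is reported only if it holds at least min_digits digits;
    start and end are word indices, with end exclusive.
    """
    matches = []
    run_start = None
    digits = []
    # A trailing None acts as a sentinel so the final run is flushed.
    for i, word in enumerate(list(words) + [None]):
        digit = DIGIT_WORDS.get(word.lower()) if word is not None else None
        if digit is not None:
            if run_start is None:
                run_start = i
            digits.append(digit)
        else:
            if run_start is not None and len(digits) >= min_digits:
                matches.append((run_start, i, "".join(digits)))
            run_start, digits = None, []
    return matches


message = "hi this is pat call me at five five five one two three four thanks"
print(find_number_sequences(message.split()))
```

A grammar for a date, a time, or an address could be sketched the same way by swapping in a different word table and matching rule, which loosely mirrors the recipient selecting a different grammar.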
Once the predetermined grammar is identified, the voice recognition system outputs a portion of the stored voice message which includes the grammar. The output can be a display of the information contained in the grammar, such as a telephone number or an address. Alternately, the output can be an audible replay of just the portion of the stored voice message which includes the telephone number or address.
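The two output forms described above, a display of the information and an audible replay of only the relevant portion, might be sketched as follows. The sketch assumes that each recognized word carries start and end times within the stored message; the specification does not state how the portion is located, so the Word type and the output_for_match function are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical recognized word with its position in the recording."""
    text: str
    start: float  # seconds from the beginning of the message
    end: float    # seconds from the beginning of the message


def output_for_match(words, first, last, display_text):
    """Build both outputs for the matched words[first:last].

    'display' is the text to show to the recipient; 'replay' is the
    (start, end) time span of the stored message to play back.
    """
    return {
        "display": display_text,
        "replay": (words[first].start, words[last - 1].end),
    }


words = [
    Word("call", 0.0, 0.3), Word("me", 0.3, 0.5), Word("at", 0.5, 0.7),
    Word("five", 0.7, 1.0), Word("five", 1.0, 1.3), Word("five", 1.3, 1.6),
]
print(output_for_match(words, 3, 6, "555"))
```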
Important information included in the stored message may be accompanied by an indicator that important information is to follow or that important information was just given. For example, the voice message may contain a telephone number. In which case, the caller is likely to have preceded the telephone number, by saying, "you can reach me at", "my phone number is", "call me at", "my extension is", or the like. Accordingly, the indicator can be one of these phrases. As another example, the voice message may contain an address which is important. In which case, the caller is likely to have preceded a street number and street name by saying, "the address is". Alternately, the speaker may have followed a street number and street name by saying "street", "road", "avenue", "lane", or the like. Accordingly, the indicator can be one of these phrases.
In accordance with the present invention, the predetermined grammar can specify an indicator of important information along with the important information. In which case, the voice recognition system searches the voice message for the indicator in conjunction with the important information. Alternately, separate grammars are specified for the indicator and the important information. In which case, the voice recognition system first searches the voice message for the predetermined indicator. Assuming a predetermined indicator is identified, the voice recognition system then searches for the important information in the vicinity of the indicator. Thus, assuming the indicator is "my phone number is", then a numeric grammar which follows the indicator is expected to include a string of numbers, such as "555-1234". As another example, assuming the indicator is "road", then a grammar which preceded the indicator is expected to include a number and a street name, such as "1380 Willow". Finally, the recognized grammars are outputted in a form that the recipient of the message can readily utilize. For example, the telephone number, "555-1234" or the address "1380 Willow" can be displayed or audibly reproduced for the user. In addition, the indicator can be combined with the important information. Thus, the entire utterance, "my phone number is 555-1234", or "1380 Willow Road" can be provided to the recipient. Use of an indicator of the important information is expected to increase the recognition accuracy of the invention.
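The two-stage search with separate grammars, first for the indicator and then for the important information in its vicinity, can be sketched as below. This is a minimal sketch that assumes the message is available as a list of recognized words and handles only indicators that precede a telephone number. The indicator phrases are taken from the examples above, while the function names and the max_gap parameter, which defines "vicinity" as a small number of skipped words, are hypothetical.

```python
# Hypothetical mapping from spoken digit words to digit characters.
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

# Indicator phrases that are expected to precede a telephone number.
INDICATORS = [
    ("you", "can", "reach", "me", "at"),
    ("my", "phone", "number", "is"),
    ("call", "me", "at"),
    ("my", "extension", "is"),
]


def find_indicator(words):
    """Stage 1: return (start, end) of the first indicator, or None."""
    lowered = [w.lower() for w in words]
    for i in range(len(lowered)):
        for phrase in INDICATORS:
            if tuple(lowered[i:i + len(phrase)]) == phrase:
                return i, i + len(phrase)
    return None


def number_after_indicator(words, max_gap=2):
    """Stage 2: collect the digits that follow the indicator.

    Up to max_gap non-digit words (for example a hesitation) may sit
    between the indicator and the first digit.
    """
    hit = find_indicator(words)
    if hit is None:
        return None
    start, end = hit
    pos = end
    # Skip a few filler words between the indicator and the digits.
    while (pos < len(words) and pos - end < max_gap
           and words[pos].lower() not in DIGITS):
        pos += 1
    digits = []
    while pos < len(words) and words[pos].lower() in DIGITS:
        digits.append(DIGITS[words[pos].lower()])
        pos += 1
    if not digits:
        return None
    return {"indicator": " ".join(words[start:end]), "digits": "".join(digits)}


message = "hello my phone number is five five five one two three four bye"
print(number_after_indicator(message.split()))
```

Returning the indicator together with the digits allows the entire utterance to be provided to the recipient. A trailing indicator such as "road" would need a mirrored search over the words that precede it.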
The invention provides an advantage to users of voice mail and other audio messaging systems by recognizing telephone numbers and other information embedded in a stored voice message and providing this information in a form which is readily accessible to the user.