The present invention relates generally to speech recognition systems as applied to voice and electronic message mailing, and particularly to a system and method for converting speech to a text message suitable for sending as an e-mail message and for viewing on a text display device.
Conventional voice mail systems, for example as disclosed in U.S. Pat. No. 4,640,991, to Mathews et al., and Internet-based voice mail systems, such as OneBox.com, combine telecommunications and computer technologies to enable callers to conveniently create and store voice messages for later receipt by recipients. When a caller calls an intended recipient who is a subscriber to such a system, and the recipient does not answer the telephone, the caller is transferred automatically to the voice mail system. The voice mail system enables the caller to record a message for the subscriber in the caller's own voice, which the voice mail system stores in electronic, usually digital, form. Many voice mail systems give the caller the opportunity to review, then save, delete or replace the current message. When the recipient calls the voice mail system, the voice mail system notifies the recipient of any stored messages, and enables the recipient to listen to the stored messages. Many voice mail systems enable the recipient to replay, delete or archive messages.
Electronic mail systems, which typically operate on the Internet and other computer networks, provide similar functions, but applied to electronic text messages. To use an electronic mail system, a sender composes a text message, usually at a personal computer, computer terminal or "mailstation," then requests the electronic mail system to send the message to recipients at their electronic mail addresses. In addition to text, the message may include other forms of information, such as graphics, digitized images and voice recordings, either directly as part of the message or as attachments. The sender's system forwards the messages, with electronic mail addresses attached, to the recipients' electronic mail systems. Recipients, who may be subscribers to the same electronic mail system or others, connect to the electronic mail systems with personal computers, computer terminals, mailstations, personal digital assistants, wireless phones and other devices capable of viewing electronic mail messages. The electronic mail system notifies the recipient of any stored messages, and enables the recipient to view, delete or archive messages, forward messages to other recipients, or reply to the sender.
Multimedia mail systems also provide similar functions, but for both voice mail and electronic mail (see U.S. Pat. No. 4,972,462 to Shibata), for both voice mail and facsimiles (see U.S. Pat. No. 5,483,580 to Brandman et al., U.S. Pat. No. 5,675,507 to Bobo and U.S. Pat. No. 5,943,400 to Park), and for voice mail, electronic mail and facsimiles (for example, OneBox.com, eFax.com and jFax.com). Existing multimedia systems receive, process, store and provide access to multiple media, but handle each medium separately. These multimedia systems provide recipients with listings that include messages of all types, but do not convert one type of message to another. For example, the aforementioned multimedia systems do not convert voice mail messages or facsimiles to text messages.
U.S. Pat. No. 4,996,707 to O'Malley et al. describes a system that receives facsimiles, uses stored and text-to-speech voice messages to notify remote recipients over the telephone network about the availability of facsimiles, converts facsimile images to characters, and uses text-to-speech to convert those characters to spoken words. Another system, disclosed in U.S. Pat. No. 5,634,084 to Malsheen et al., uses text-to-speech to convert the text of electronic mail messages to spoken words, so the messages can be accessed over the telephone network without the need for additional devices.
In an information processing system disclosed in U.S. Pat. No. 5,479,491 to Garcia et al., speech recognition is used to interpret verbal commands spoken by a caller to access voice mail and other services.
Different media are advantageous in different circumstances. Voice mail messages and voice output from facsimiles and electronic mail messages are convenient because telephones are ubiquitous and inexpensive. Voice also conveys personality and emotion.
However, electronic mail messages can also be advantageous. Compared to over-the-telephone voice mail, electronic mail avoids long distance telephone charges, and compared to Internet-based voice mail, much less data is transmitted and stored. Furthermore, text messages can be displayed on simple, inexpensive devices such as personal digital assistants, mailstations, pagers, wireless phones and other Internet-connected devices. In addition, electronic mail systems can provide, at very low cost, a record of messages sent and received. Text messages can be searched easily for content, whereas voice messages cannot be searched as easily. Text messages can be read by deaf people and by people who have difficulty understanding the same language when spoken. Another advantage is that electronic mail systems provide message directories that can be organized and visually scanned, whereas voice mail systems typically require subscribers to listen to sequential lists.
The accuracy of speech recognition software has improved. Present (circa 2000) continuous speech recognition software offered by such vendors as Nuance, Philips and SpeechWorks accurately recognizes tens of thousands of words spoken over the telephone by almost any caller, as long as the caller speaks about a specific topic such as trading stocks or ordering airline tickets. Furthermore, continuous speech recognition software offered by such vendors as Dragon Systems, IBM, Lernout and Hauspie, and Philips accurately recognizes dictation about topics as broad as business, healthcare and law. This software works best when users have previously provided voice samples, and when the speech to be recognized is not distorted or mixed with noise. The speech recognition software works, with some degradation, for anyone who speaks clearly, even over telephone networks.
Therefore, there is a need for a system and method that uses speech recognition software to automatically convert voice messages into text messages suitable for sending as e-mail messages and for viewing on display devices. The system and method should provide sufficient accuracy when converting the voice messages, even when voice samples have not been provided.
An audio message from a caller for a recipient is received. An e-mail address for the recipient is determined. A text message file is generated from the audio message from the caller. The text message file is sent to the recipient at the recipient's e-mail address.
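By way of illustration only, the four steps summarized above (receive the audio message, determine the recipient's e-mail address, generate the text, send it) can be sketched in Python as follows. Every name in this sketch is hypothetical and is not taken from the disclosure: `TextMessage`, `handle_audio_message`, the `directory` lookup table, and the `recognize` and `send` callables, which stand in for a speech recognition engine and an electronic mail system respectively.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextMessage:
    """Hypothetical container for the generated text message."""
    recipient_address: str
    body: str


def handle_audio_message(
    audio: bytes,
    recipient_id: str,
    directory: Dict[str, str],
    recognize: Callable[[bytes], str],
    send: Callable[[TextMessage], None],
) -> TextMessage:
    """Run the four summarized steps in order for one received audio message."""
    # Step 2: determine the recipient's e-mail address (assumed to be a table lookup).
    address = directory[recipient_id]
    # Step 3: generate text from the audio (recognizer is injected, not implemented here).
    text = recognize(audio)
    message = TextMessage(recipient_address=address, body=text)
    # Step 4: send the text message to the recipient's address.
    send(message)
    return message


# Usage with stand-in components; a real system would plug in an actual
# speech recognizer and mail transport here.
outbox: List[TextMessage] = []
directory = {"5550100": "recipient@example.com"}
sent = handle_audio_message(
    audio=b"\x00\x01",
    recipient_id="5550100",
    directory=directory,
    recognize=lambda audio: "please call me back",
    send=outbox.append,
)
```

Passing the recognizer and the mail transport in as parameters keeps the sketch independent of any particular speech recognition product or mail system.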
In another embodiment, a voice-to-electronic mail computer system allows a caller to dictate a message, stores the dictated message as a voice message, and, while the caller is dictating the message, uses continuous speech recognition to convert the voice message to text. In one embodiment, the speech recognition software refers to a data structure that stores callers' speech characteristics. In another embodiment, the speech recognition software refers to a data structure that stores specialized vocabularies. In yet another embodiment, at the caller's option, the voice-to-electronic mail system uses text-to-speech conversion to read the text for verification. The caller may accept, replace, edit or discard the voice and text messages. Once accepted, the voice-to-electronic mail system uses the information stored about the message, namely, the caller's name, subject, where and when the caller can be reached, and the dictated text, to create a conventional electronic mail message, which the system forwards through use of an electronic mail system. In an alternate embodiment, the system also sends the caller's voice message as an attachment to the electronic mail message to allow the recipient to also listen to the original voice message. Using an ordinary electronic mail system and a simple, text display device, the recipient can select messages by sender and subject, and then display them. If the recipient's display device has audio capability, the recipient may also listen to the attached voice message to verify the text and to hear the caller's personality and emotion.
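As a non-limiting sketch of how the stored message information might be assembled into a conventional electronic mail message with the optional voice attachment, the following Python example uses the standard library `email.message.EmailMessage` class. The function name `build_email`, the gateway address, the subject and body layout, the WAV audio format and the attachment file name are all assumptions made for illustration and are not specified by the disclosure.

```python
from email.message import EmailMessage
from typing import Optional


def build_email(
    caller_name: str,
    subject: str,
    callback_info: str,
    dictated_text: str,
    recipient_address: str,
    voice_recording: Optional[bytes] = None,
) -> EmailMessage:
    """Assemble an e-mail message from the information stored about a dictated message."""
    msg = EmailMessage()
    msg["From"] = "voice-gateway@example.com"  # hypothetical system address
    msg["To"] = recipient_address
    msg["Subject"] = f"{subject} (voice message from {caller_name})"
    # Body layout is an assumption: caller details first, then the recognized text.
    msg.set_content(
        f"Caller: {caller_name}\n"
        f"Can be reached: {callback_info}\n\n"
        f"{dictated_text}\n"
    )
    # Alternate embodiment: attach the original recording so the recipient can listen to it.
    if voice_recording is not None:
        msg.add_attachment(
            voice_recording,
            maintype="audio",
            subtype="wav",  # assumed audio format
            filename="message.wav",
        )
    return msg


msg = build_email(
    caller_name="A. Caller",
    subject="Quarterly report",
    callback_info="555-0100 after 3 pm",
    dictated_text="Please call me about the report.",
    recipient_address="recipient@example.com",
    voice_recording=b"RIFF....WAVE",  # placeholder bytes, not a valid recording
)
```

The resulting message could then be handed to any ordinary mail transport, and a recipient whose device lacks audio capability would still see the caller, subject and dictated text.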
In this way, the present invention enables callers to dictate messages that recipients receive and read as text on simple text display devices. Recipients can organize and review voice messages by such categories as sender, subject and time rather than being limited to reviewing the messages in sequential order by time of receipt. Recipients can also readily access information such as time of receipt, and telephone numbers at which the recipient can reach the message senders. Because the voice messages are in text form, the voice messages can be searched for particular content. A record of voice and text messages created through use of an automated message service is provided, by sender, subject and time. In one embodiment, by sending text messages, rather than voice messages, the present invention reduces the amount of data that is transmitted and stored.