This invention relates to methods for rendering email messages as voice or speech sound. This invention further relates to the field of network-based computer systems that render email and/or Internet web page content as voice, such systems being embodied in various forms including, but not limited to, voice command platforms, internet-based virtual assistants and interactive voice response systems. The invention can also be embodied as an improvement to an email client application.
Email has emerged over the last decade or so as a convenient and extremely widely used medium for communication. Email messages can be created and displayed on computing devices having an appropriate email application, such as Microsoft Outlook. Email applications are available for desktop computing devices as well as for portable computing devices, e.g., personal digital assistants, so-called “hand-held computers”, laptop computers and web-equipped cellular telephones. Thanks to advances in satellite, RF, and other wireless communications technologies, it is now possible to both send and receive email messages on portable computing devices virtually anywhere in the continental United States.
As the capability for sending and receiving email messages has migrated onto smaller, lighter portable devices, such as cellular telephones, technology has emerged for rendering email content as voice. Such systems are described in the patent literature; see, for example, Cooper et al., U.S. Pat. No. 6,466,654. This patent describes a network-based server that functions as a “virtual assistant” system. The system includes a virtual assistant server, built on a Windows NT telephony server platform, with a human interface that may be a voice user interface. The virtual assistant server allows a user to use a voice interactive device, such as a telephone, to access and update information (including voice messages, email messages, and intranet or internet content), to perform scheduling tasks, and to carry out still other functions. The entire content of the '654 patent is incorporated by reference herein.
The virtual assistant in the '654 patent includes speech recognition software for recognizing speech input from the user and a text-to-speech converter for rendering text information (such as text from a web document or an email message) as speech, thereby allowing users to access their email and have it read to them instead of viewing it on a display.
Other patents of interest include U.S. Pat. No. 6,539,359, which is directed to a system that allows a user to access a network communication node that includes a voice response system having a text-to-speech converter and a speech recognition engine. The user accesses the communication node from a variety of communication devices, including telephones. U.S. Pat. No. 6,115,686 is directed to a text-to-speech (TTS) converter that converts documents in HyperText Markup Language (HTML) format to speech. The '686 patent recognizes that most of the electronic texts available from the World Wide Web are formatted according to the HTML standard. Unlike other electronic texts, HTML “source” documents, from which content text is displayed, contain embedded textual tags. Generating speech from electronic text originally intended for visual display presents certain challenges for designers of TTS converters. For example, information is conveyed not only by the content of the text itself but also by the manner in which the text is presented, i.e., by capitalization, bolding, italics, listing, etc. Formatting and typesetting codes normally cannot be pronounced. Punctuation marks, although not themselves spoken, provide information about the text. In addition, the pronunciation of text strings, i.e., sequences of one or more characters, depends on the context in which the text is used. The '686 patent attempts to solve these problems and provide a better user experience when rendering Web content as speech. Lee et al., U.S. Pat. No. 6,661,877, is directed to a system and method for providing access to a unified message store logically storing computer telephony messages, and is cited to further illustrate the current state of the art. The entire content of U.S. Pat. Nos. 6,661,877, 6,539,359 and 6,115,686 is incorporated by reference herein.
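The problem noted for HTML content, namely that markup tags carry layout information but must not themselves be pronounced, can be illustrated with a minimal sketch built on Python's standard `html.parser` module. This is not the method of the '686 patent. The sets of tags that end a spoken segment (`BREAK_TAGS`) or are skipped entirely (`SKIP_TAGS`), and the names `SpeakableTextExtractor` and `speakable_segments`, are illustrative assumptions.

```python
from html.parser import HTMLParser

# ASSUMPTION: block-level tags treated as boundaries between spoken segments.
BREAK_TAGS = {"p", "br", "li", "div", "tr", "h1", "h2", "h3"}
# ASSUMPTION: tags whose content is never meant to be read aloud.
SKIP_TAGS = {"script", "style"}


class SpeakableTextExtractor(HTMLParser):
    """Hypothetical helper: collects displayable text, dropping all tags."""

    def __init__(self) -> None:
        # convert_charrefs defaults to True, so "&amp;" arrives as "&".
        super().__init__()
        self.segments: list[str] = []
        self._current: list[str] = []
        self._skip_depth = 0

    def _flush(self) -> None:
        # Collapse runs of whitespace; keep only non-empty segments.
        text = " ".join("".join(self._current).split())
        if text:
            self.segments.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BREAK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BREAK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> or <i> are ignored; only their text is kept.
        if self._skip_depth == 0:
            self._current.append(data)

    def close(self) -> None:
        super().close()  # process any buffered input first
        self._flush()


def speakable_segments(html: str) -> list[str]:
    """Return the text a TTS engine could speak, one entry per block."""
    parser = SpeakableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments
```

For example, `speakable_segments("<p>Hello <b>world</b>!</p><ul><li>One</li></ul>")` yields the two segments `"Hello world!"` and `"One"`. The bolding of "world" is simply discarded, which is exactly the loss of presentational information described above.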
The present inventors have appreciated that, with prior art approaches, the user's experience of receiving or responding to email messages in voice form is less than optimal, particularly in comparison to the experience when the messages are displayed graphically as text. The present invention provides techniques and methods for improving the user experience through the use of tags (e.g., newly defined XML tags) or other analogous software devices that are inserted into the email content. The tags can be inserted by the client email application that generates the email message or, more preferably, by an email server that receives and stores the email. The tags are then detected by the system that renders the email content as speech. Numerous ways in which the tags can be used to improve the user experience are described below in the detailed description of presently preferred embodiments of the invention.
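The division of labor described above, with tags inserted into the email content (preferably by the email server) and later detected by the speech-rendering system, can be sketched in Python with the standard `xml.etree.ElementTree` and `xml.sax.saxutils` modules. This is a minimal sketch under stated assumptions and does not reproduce the tags or method of the present invention. The `<email>` wrapper, the `<hint type="...">` tag, the `"quoted"` hint type, and the functions `annotate`, `detect_hints` and `render_for_speech` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape, quoteattr

# HYPOTHETICAL tag names; not defined by the specification.
ROOT_TAG = "email"
HINT_TAG = "hint"


def annotate(body: str, start: int, end: int, hint_type: str) -> str:
    """Server side (hypothetical): wrap body[start:end] in a hint tag.

    Ordinary text is XML-escaped so characters such as '&' and '>'
    in the email cannot be mistaken for markup.
    """
    return (
        f"<{ROOT_TAG}>"
        + escape(body[:start])
        + f"<{HINT_TAG} type={quoteattr(hint_type)}>"
        + escape(body[start:end])
        + f"</{HINT_TAG}>"
        + escape(body[end:])
        + f"</{ROOT_TAG}>"
    )


def detect_hints(xml_body: str) -> List[Tuple[str, Optional[str]]]:
    """Renderer side (hypothetical): split into (text, hint_type) pairs."""
    root = ET.fromstring(xml_body)
    segments: List[Tuple[str, Optional[str]]] = []
    if root.text:
        segments.append((root.text, None))
    for child in root:
        if child.tag == HINT_TAG and child.text:
            segments.append((child.text, child.get("type")))
        if child.tail:
            segments.append((child.tail, None))
    return segments


def render_for_speech(xml_body: str) -> List[str]:
    """Hypothetical use of a detected tag: announce quoted text."""
    utterances = []
    for text, hint in detect_hints(xml_body):
        if hint == "quoted":
            # Drop leading '>' quote markers, which should not be spoken.
            utterances.append("Quoted text: " + text.lstrip("> ").strip())
        else:
            utterances.append(text.strip())
    return utterances
```

Escaping on the server and parsing on the renderer keep the two sides decoupled. The renderer needs to know only the tag vocabulary, not how the server decided where to place the tags.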