A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates generally to systems for processing information and conveying audio messages and more particularly to systems using speech and non-speech audio streams to produce audio messages.
2. Background
Technology is rapidly progressing to permit convenient access to an abundance of personalized information at any time and from any place. “Personalized information” is information that is targeted for or relevant to an individual or defined group rather than generally to the public at large. There is a plethora of sources for personalized information, such as the World Wide Web, telephones, personal organizers (PDAs), pagers, desktop computers, laptop computers and numerous wireless devices. Audio information systems may be used to convey this information to a user, i.e. the listener of the message, as a personalized information message.
At times a user may specifically request and retrieve the personalized information. Additionally, the system may proactively contact the user to deliver certain information, for example by sending the user an email message, a page, an SMS message to a cell phone, etc.
Previous information systems that provided such personalized information required that a user view the information and physically manipulate controls to interact with the system. Recently, an increasing number of information systems are no longer limited to visual displays, e.g. computer screens, and physical input devices, e.g. keyboards. Recent advances allow these systems to use audio to communicate information to and from a user of the system.
The audio enhanced systems are desirable because the user's hands may be free to perform other activities and the user's sight is undisturbed. Usually, the users of these information devices obtain personal information while “on-the-go” and/or while simultaneously performing other tasks. Given the current busy and mobile environment of many users, it is important for these devices to convey information in a quick and concise manner.
Heterogeneous information systems, e.g. unified messaging systems, deliver various types of content to a user. For example, this content may be a message from another person, e.g. an e-mail message, a telephone message, etc.; a calendar item; a news flash; a PIM functionality entry, e.g. a to-do item, a contact name, etc.; a stock, traffic or weather report; or any other communicated information. Because of the variety of information types being delivered, it is often desirable for these systems to inform the user of the context of the information in order for the user to clearly comprehend what is being communicated. There are many characteristics of the content that are useful for the user to understand, such as the information type, the urgency and/or relevance of the information, the originator of the information, and the like. In audio-only interfaces, preparing the user in this way is especially important. Without knowing what kind of content is being delivered, the user may become confused.
Visual user interfaces indicate information type through icons or through screen location. This is referred to as context indication, and the icon or screen location is referred to as the context identifier. However, if only audio is used to convey information, other context indicators must be used. Such audio cues may be in the form of speech, e.g. voice, or non-speech sounds. Some examples of non-speech audio are bells, tones, nature sounds, music, etc.
Some prior audio information systems denote the context of the information by playing a non-speech sound before conveying the content. The auditory cues provided by these sequential playing systems permit a user to listen to the content immediately or to wait until a later time. These systems are problematic in that they are inconvenient for the user and waste time. The user must first focus on the context cue and then listen for the information.
Moreover, many of these systems further extend the time in which the user must attend to the system by including a delay, e.g. a latency of 3 to 20 seconds, between delivering the notification and transmitting the content. In fact, some systems require the user to interact with the system after playing the preface in order to activate the playing of content. Thus, these interactive cueing systems distract the user from performing other tasks in parallel.
In general, people have the ability to discern more than one audio stream at a time and extract meaning from the various streams. For example, the “cocktail party effect” is the capacity of a person to simultaneously attend to more than one distinct stream of audio. Thus, a person is able to focus on one channel of speech and overhear and extract meaning from another channel of speech. See “The Cocktail Party Effect in Auditory Interfaces: A Study of Simultaneous Presentation,” Lisa J. Stifelman, MIT Media Laboratory Technical Report, September 1994. However, this capability has not yet been leveraged in prior information systems using speech and non-speech audio.
In general, the shortcomings of the currently available audio information systems include lengthy and inefficient conveying of cue signals and information. In particular, previous audio information systems do not minimize interaction times.