Users of existing voice mail/telephone answering machines and other equipment with voice messaging capabilities, e.g., cellular phones, are all too familiar with a classical user interface (UI) problem. The classical UI problem is associated with the fact that it is often difficult for a user to effectively extract key information, e.g., phone numbers, addresses, names, etc., from a voice message during playback. This may be due to a variety of reasons.
By way of one example, the provider or author of the message may have a tendency to rush through the portions of the message with which he is very familiar and which are repetitive for him, e.g., his name and phone number. Thus, the person listening to the message is not given enough time to write down the name and phone number of the caller during normal-speed playback. One solution is to replay the entire message, which is time-consuming and provides no guarantee that the listener will extract all the relevant information the second time or, for that matter, on any subsequent replay. Of course, the user could slow down a subsequent playback of the message if the playback equipment has the capability to do so. However, in existing systems with equipment that is only able to slow down the entire message, subsequent playback is even more time-consuming, not to mention quite frustrating. It also requires the user to perform several active physical steps to achieve such a slowed playback. Even if a portion of the played-back message can be slowed down, the user must still manually search the message record, by starting and stopping the playback, until he reaches the point that he wishes to slow down.
By way of another example, the person playing back a message may not be free to write down the key information in the message because he may be occupied with some concurrent activity, e.g., driving his vehicle, holding objects, etc.
Furthermore, it is to be understood that this classical UI problem is not limited to playback of voice messages. That is, such a UI problem also exists during a real-time (live) phone conversation. In such a case, the listener still has difficulty remembering and/or transcribing important information from an ongoing phone conversation, e.g., when participating in a cellular phone conversation while driving his car. In fact, the problem is made worse since the user does not have a recording of the conversation to which he may later refer in order to recover any missed information.
It is also to be appreciated that this classical UI problem extends beyond voice or speech signals. That is, the same difficulties exist when trying to extract key information from playback or rendering of multi-modal or multimedia-type information signals, e.g., signals including both audio and video information portions, or of text document-based or markup language-based signals, e.g., XML documents.
Thus, there is a need for information signal processing methods and apparatus that substantially reduce and/or eliminate this classical UI problem.