When playing back recorded speech or video signals, it is often desirable to speed up the playback for efficiency. However, not all messages can be sped up to the same degree while retaining intelligibility. Controls for speeding up the playback are generally set manually and must be adjusted for each message. Also, a listener sometimes needs to slow down a message in order to understand part or all of its contents. This typically occurs when the speaker has an accent as perceived by the listener, or when the listener requires additional time to understand the concepts being presented. If the message is to be played back to a number of listeners, it is highly unlikely that each listener will want or need to speed up or slow down the original recording by the same amount. For example, if a speaker from India records a message, a listener from China may not be able to understand everything the speaker says at the same playback speed as a listener from Spain. Each listener is typically required to manually set the playback speed according to his/her preferences and comprehension rate. The listener from China may manually set the recording to a first playback speed in order to understand the contents of the message. Conversely, the listener from Spain may manually set the recording to a second, different playback speed in order to understand the message.
Often, when two individuals from different regions of the world attempt to communicate, each party perceives the other party to have an accent. Sometimes the perceived accent does not substantially affect the communication between the two parties. However, if the accents do affect the communication, it is usually because one or both parties cannot understand what the other party is saying due to his or her perceived accent. This may result in awkward or annoying situations where one participant is continually asking the other participant to slow down or to repeat something that has already been said. On the other hand, one party may be too embarrassed to ask the other party to repeat himself/herself, even when the listening party did not understand everything the speaker said. In that case, the listener may never fully understand what the speaker is saying. This outcome could be especially problematic if the speaker is a superior of the listener and the listener has been given an order. If the listener is not able to fully comprehend the order and does not ask the speaker to repeat it, the order may go uncompleted.
A mechanism used to change the playback rate of recorded audio signals while maintaining intelligibility is known as Time-Scale Modification (TSM). In typical TSM applications, a listener can manually control the speaking rate during playback of a previously recorded message. This enables the listener to speed up or slow down the articulation rate and, thereby, the information delivery rate of the previously recorded message. As is well known to those of ordinary skill in the art, the use of the TSM method enables the sped-up or slowed-down audio signal to be presented intelligibly at the corresponding increased or decreased playback rate. Thus, for example, a listener can readily comprehend material through which he/she is fast-forwarding.
In typical TSM systems, input from the listener is usually required, in the form of key presses, mouse movements, or similar commands, in order to specify the playback speed. These commands enable a listener to manually adjust the information delivery rate of an audio signal to suit his/her interests and speed of comprehension.
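For illustration only, the following is a minimal sketch of one simple TSM technique, plain overlap-add (OLA), written in Python. It is not taken from any particular TSM system: the function names `hann` and `ola_time_stretch`, the `frame_len` parameter, and the half-frame synthesis hop are all assumptions made for this example. The `rate` argument stands in for the playback speed that a listener would otherwise set manually. Plain OLA tends to produce audible artifacts on strongly periodic signals such as voiced speech, so practical systems generally rely on refinements such as waveform-similarity overlap-add (WSOLA) or a phase vocoder.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(samples, rate, frame_len=512):
    """Time-scale `samples` by plain overlap-add (illustrative sketch).

    rate > 1.0 shortens the signal (faster playback);
    rate < 1.0 lengthens it (slower playback).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")

    synth_hop = frame_len // 2                   # frame spacing in the output
    ana_hop = max(1, round(synth_hop * rate))    # frame spacing in the input
    window = hann(frame_len)

    # Number of whole frames available in the input (at least one,
    # zero-padded if the input is shorter than a single frame).
    if len(samples) >= frame_len:
        n_frames = 1 + (len(samples) - frame_len) // ana_hop
    else:
        n_frames = 1

    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len

    for k in range(n_frames):
        a = k * ana_hop      # where this frame is read from the input
        s = k * synth_hop    # where this frame is written in the output
        for i in range(frame_len):
            src = a + i
            x = samples[src] if src < len(samples) else 0.0
            out[s + i] += window[i] * x
            norm[s + i] += window[i]

    # Divide by the summed window so overlapping frames do not change gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

In this sketch, a `rate` of 2.0 reads frames from the input twice as far apart as it writes them to the output, so the result is roughly half as long; a `rate` of 0.5 does the opposite. The key presses or mouse movements described above would simply change the value passed as `rate`.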
A drawback of current TSM systems is that manual input from the listener is required. While trying to understand what the speaker is saying, the listener must also concentrate on adjusting speed controls and giving commands to control the playback rate, and may therefore miss important information. Furthermore, these TSM systems are typically not used in real-time conversations. Rather, they are usually used when a listener is listening to a pre-recorded message.
There have been some attempts to correct these shortcomings; however, many of them are incomplete or still leave something to be desired. For example, some systems can monitor the listener's manual input to the playback speed controls. As the playback speed is manually adjusted, the system “learns” what sort of content the listener is interested in. Based on what the monitoring unit believes the listener is interested in, it attempts to identify similar content in subsequent parts of the message and slows down or speeds up those parts according to how that particular content was treated before. Unfortunately, this solution still requires the listener to manually speed up or slow down the message a number of times before the monitoring unit is able to “learn” how content should be treated in the future. Therefore, the monitoring unit is usually able to determine which portions of the message should be sped up or slowed down only after the user has supplied a substantial amount of manual feedback. If a listener is struggling to comprehend a particular message or portion of a message but does not try to manually speed up or slow down that portion, the monitoring unit will have a difficult time identifying content that should be sped up or slowed down in the future.
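The dependence of such a monitoring unit on prior manual feedback can be illustrated with a short Python sketch. This is a hypothetical simplification rather than a description of any actual prior-art system: the class name `PlaybackSpeedMonitor`, the use of a text label to identify "similar content", the averaging rule, and the `min_observations` threshold are all assumptions introduced for the example.

```python
from collections import defaultdict


class PlaybackSpeedMonitor:
    """Hypothetical monitor that learns preferred speeds from manual input."""

    def __init__(self, min_observations=3, default_speed=1.0):
        # Both parameters are illustrative assumptions.
        self.min_observations = min_observations
        self.default_speed = default_speed
        self._history = defaultdict(list)  # content label -> observed speeds

    def record_manual_adjustment(self, content_label, speed):
        """Store a speed that the listener chose by hand for this content."""
        self._history[content_label].append(speed)

    def suggest_speed(self, content_label):
        """Suggest a speed, or fall back to the default if too little is known."""
        observed = self._history.get(content_label, [])
        if len(observed) < self.min_observations:
            # Not enough manual feedback yet: nothing has been "learned".
            return self.default_speed
        return sum(observed) / len(observed)
```

Until the listener has adjusted the speed by hand at least `min_observations` times for a given kind of content, `suggest_speed` can only return the default. A listener who struggles silently, without touching the controls, never supplies the observations that this kind of monitor relies on.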
Another problem encountered in the prior art is that the accommodations available to telecommunication device users with cognitive disabilities or shortcomings are somewhat limited. The Federal government has attempted to regulate telecommunication providers and systems by requiring them to accommodate people with limited cognitive skills. Specifically, the Code of Federal Regulations states that telecommunication system providers should, “Provide at least one mode that minimizes the cognitive, memory, language, and learning skills required of the user.” (36 CFR 1193.41(i)). In order to comply with this provision, telecommunication providers have explored many techniques and systems. One strategy employed to assist users with limited cognitive skills is to provide the user with an “undo” or backup function. The backup function allows the user to easily correct any mistake that he/she makes while using the telecommunication device/system. For example, when a customer calls into a contact center, an Interactive Voice Response (IVR) unit will often prompt the customer to answer various questions. In many cases the customer is able to engage a backup key that undoes the last answer entered. Another technique employed by many systems is a menu that repeats itself continually. This way, if the customer cannot comprehend an entire message the first time it is played, he/she can wait until the message is replayed in order to understand more of it.
Unfortunately, the above-noted solutions still leave something to be desired. It can be frustrating for a user to hit the backup button every time he/she makes a mistake. Completing a task that involves an IVR or other prerecorded message can take much longer if the user has to keep cycling through choices because he/she selected the wrong one. It can be equally frustrating for a user to listen to the same message four, five, or even six times in order to understand the entire message. Therefore, the systems currently in place for users with cognitive disabilities still leave much to be desired.