Presently known methods for Time-Scale Modification (“TSM”) enable digitally recorded audio to be modified so that a perceived articulation rate of spoken passages, i.e., a speaking rate, can be modified dynamically during playback. Typical applications of such TSM methods include, but are not limited to, speed reading for the blind, talking books, digitally recorded lectures, slide shows, multimedia presentations and foreign language learning. In one such typical application, referred to herein as a Listener-Directed Time-Scale Modification application (“LD-TSM”), a listener can control the speaking rate during playback of a previously recorded speaker. This enables the listener to “speed-up” or “slow-down” the articulation rate and, thereby, the information delivery rate provided by the previously recorded speaker. As is well known to those of ordinary skill in the art, the use of a TSM method in the above-described LD-TSM application enables the sped-up or slowed-down speech or audio to be presented intelligibly at the increased or decreased playback rates. Thus, for example, a listener can readily comprehend material through which he/she is fast-forwarding.
In a typical LD-TSM system, input from the listener can be specified in a number of different ways. For example, input can be specified through the use of key presses (button pushes), mouse movements, or voice commands, all of which are referred to below as “keypresses.” As a result, one can readily appreciate that an LD-TSM system enables a listener to adjust the information delivery rate of a digital audio medium to suit his/her interests and speed of comprehension.
As one can readily appreciate from the above, in order to optimize the use of such an LD-TSM system, there is a need for determining how listeners interact with audio media that provide TSM. In particular, the actual information delivery rate selected by a listener depends on diverse factors such as intelligibility of a speaker, listener interest in the subject matter, listener familiarity with the subject matter, whether the listener is transcribing the content, and the general amount of time the listener has allotted for receiving the contents of the material.
Prior art methods for determining listener interest in portions of speech and/or audio are inherently inaccurate. Specifically, these methods involve detecting fast-forward and rewind patterns produced by button pushes on, for example, a cassette tape player. The use of such fast-forward or rewind patterns suffers from various drawbacks. For example, the listener often alternates between fast-forwarding and rewinding over a particular piece of audio material because the information is either not presented or is unintelligible while fast-forwarding or rewinding. In addition, advancing a playback location either interrupts playback while advancing through the audio material or presents unintelligible versions of the audio material (“chipmunk-like” sounds for speed-up, etc.). As such, current methods of determining listener interest are of little use for determining an optimal information delivery rate.
As one can readily appreciate from the above, a need exists in the art for a method and apparatus for determining listener interest in portions of speech, audio, and/or audio-visual works. In addition, a need exists in the art for a method and apparatus for replaying speech, audio and/or audio-visual works in accordance with the determination of listener interest to provide a listener-interest-filtered work (“LIF” work).