The present invention pertains to the field of speech, audio, and audio-visual works. In particular, the present invention pertains to method and apparatus for receiving listener input regarding desired speed of playback for portions of a speech, audio, and/or audio-visual work and for developing a "Speed Contour" or "Conceptual Speed Association" data structure which represents the listener input. The listener input serves as a proxy for the listener's interest in, and/or for the listener's ability to comprehend (and/or transcribe), the speech, audio, and/or audio-visual work and will be referred to herein as "listener interest." For example, the listener might want to slow down some portion of the speech, audio, and/or audio-visual work if the listener was interested in enjoying it more fully, or if the listener was having a hard time comprehending the portion, or if the listener was transcribing information contained in the portion. In further particular, the present invention pertains to method and apparatus for replaying the speech, audio and/or audio-visual work in accordance with the Speed Contour or Conceptual Speed Association data structure to produce a "listener-interest-filtered" work ("LIF" work). The LIF work is useful in a number of applications such as, for example, education, advertising, news delivery, entertainment, public safety announcements and the like.
Presently known methods for Time-Scale Modification ("TSM") enable digitally recorded audio to be modified so that a perceived articulation rate of spoken passages, i.e., a speaking rate, can be modified dynamically during playback. Typical applications of such TSM methods include, but are not limited to, speed reading for the blind, talking books, digitally recording lectures, slide shows, multimedia presentations and foreign language learning. In a typical such application, referred to herein as a Listener-Directed Time-Scale Modification application ("LD-TSM"), a listener can control the speaking rate during playback of a previously recorded speaker. This enables the listener to "speed-up" or "slow-down" the articulation rate and, thereby, the information delivery rate provided by the previously recorded speaker. As is well known to those of ordinary skill in the art, the use of the TSM method in the above-described LD-TSM application enables the sped-up or slowed-down speech or audio to be presented intelligibly at the increased or decreased playback rates. Thus, for example, a listener can readily comprehend material through which he/she is fast-forwarding.
In a typical LD-TSM system, input from the listener can be specified in a number of different ways. For example, input can be specified through the use of key presses (button pushes), mouse movements, or voice commands, all of which are referred to below as "keypresses." As a result, one can readily appreciate that an LD-TSM system enables a listener to adjust the information delivery rate of a digital audio medium to suit his/her interests and speed of comprehension.
As one can readily appreciate from the above, in order to optimize the use of such an LD-TSM system, there is a need for determining how listeners interact with audio media that provide TSM. In particular, the actual information delivery rate selected by a listener depends on diverse factors such as intelligibility of a speaker, listener interest in the subject matter, listener familiarity with the subject matter, whether the listener is transcribing the content, and the general amount of time the listener has allotted for receiving the contents of the material.
Prior art methods for determining listener interest in portions of speech and/or audio are inherently inaccurate. Specifically, these methods involve detecting fast-forward and rewind patterns of, for example, a cassette tape produced by button pushes. The use of such fast-forward or rewind patterns suffers from various drawbacks. For example, the listener often alternates between fast-forwarding and rewinding over a particular piece of audio material because the information is either not presented, or is unintelligible while fast-forwarding or rewinding. In addition, whenever a playback location is advanced, this either interrupts playback while advancing through the audio material or presents unintelligible versions of the audio material ("chipmunk-like" sounds for speed-up, etc.). As such, current methods of determining listener interest are of little use for determining an optimal information delivery rate.
As one can readily appreciate from the above, a need exists in the art for a method and apparatus for determining listener interest in portions of speech, audio, and/or audio-visual works. In addition, a need exists in the art for a method and apparatus for replaying speech, audio and/or audio-visual works in accordance with the determination of listener interest to provide a listener-interest-filtered work ("LIF" work).
One or more embodiments of the present invention advantageously satisfy one or more of the above-identified needs in the art. In particular, one embodiment of the present invention is a method for generating a listener-interest-filtered work for an audio or audio-visual work, which method comprises steps of: (a) generating one or more average speed contours for one or more audio or audio-visual works for one or more categories of users; (b) converting the one or more average speed contours to one or more conceptual speed association data structures; and (c) forming a listener-interest-filtered conceptual speed association data structure from the one or more conceptual speed association data structures.
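The three steps recited above can be illustrated with a minimal Python sketch. It is not the implementation of any embodiment; the text above does not fix a representation, so everything concrete here is an assumption made for illustration: a speed contour is modeled as a time-sorted list of (start time in seconds, playback speed) pairs, a conceptual speed association as a mapping from a segment label to a mean speed, and the filtering rule in step (c) as a simple speed threshold. The function names `speed_at`, `average_contour`, `to_concept_association`, and `filter_by_interest` are hypothetical.

```python
from bisect import bisect_right
from statistics import mean

# ASSUMPTION: a speed contour is a time-sorted list of (start_seconds, speed)
# pairs; each speed stays in effect until the next entry begins.

def speed_at(contour, t):
    """Return the playback speed in effect at time t (piecewise constant)."""
    starts = [start for start, _ in contour]
    i = bisect_right(starts, t) - 1
    return contour[max(i, 0)][1]  # before the first entry, use the first speed

def average_contour(contours, duration, step=1.0):
    """Step (a): average several listeners' contours on a fixed time grid."""
    n = int(duration / step)
    return [(k * step, mean(speed_at(c, k * step) for c in contours))
            for k in range(n)]

def to_concept_association(avg_contour, segments):
    """Step (b), ASSUMED form: map each labeled segment (label, start, end)
    to the mean of the averaged speeds sampled inside that segment."""
    association = {}
    for label, start, end in segments:
        speeds = [s for t, s in avg_contour if start <= t < end]
        if speeds:
            association[label] = mean(speeds)
    return association

def filter_by_interest(association, max_speed):
    """Step (c), HYPOTHETICAL rule: keep segments that listeners played at or
    below max_speed, treating slower playback as a sign of higher interest."""
    return {label: s for label, s in association.items() if s <= max_speed}

# Two listeners in one category; the first speeds up at t = 2 s.
listeners = [[(0.0, 1.0), (2.0, 2.0)], [(0.0, 1.0), (2.0, 1.0)]]
avg = average_contour(listeners, duration=4.0)
assoc = to_concept_association(avg, [("intro", 0.0, 2.0), ("details", 2.0, 4.0)])
lif = filter_by_interest(assoc, max_speed=1.2)
```

In this toy input both listeners play the first two seconds at normal speed, so only the "intro" segment survives the threshold. A real system would need a principled way to segment the work into concepts and to choose the filtering criterion, neither of which this sketch attempts.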