The present invention pertains to the field of speech, audio, and audio-visual works. In particular, the present invention pertains to methods and apparatus for receiving listener input regarding the desired playback speed for portions of a speech, audio, and/or audio-visual work and for developing a “Speed Contour” or “Conceptual Speed Association” data structure that represents the listener input. The listener input serves as a proxy for the listener's interest in, and/or the listener's ability to comprehend (and/or transcribe), the speech, audio, and/or audio-visual work, and is referred to herein as “listener interest.” For example, a listener might slow down a portion of the work to enjoy it more fully, because the portion is hard to comprehend, or because he/she is transcribing information contained in the portion. More particularly, the present invention pertains to methods and apparatus for replaying the speech, audio, and/or audio-visual work in accordance with the Speed Contour or Conceptual Speed Association data structure to produce a “listener-interest-filtered” work (“LIF” work). The LIF work is useful in a number of applications such as education, advertising, news delivery, entertainment, and public safety announcements.
Presently known methods for Time-Scale Modification (“TSM”) enable digitally recorded audio to be modified so that the perceived articulation rate of spoken passages, i.e., the speaking rate, can be changed dynamically during playback. Typical applications of such TSM methods include, but are not limited to, speed reading for the blind, talking books, digitally recorded lectures, slide shows, multimedia presentations, and foreign language learning. In one such application, referred to herein as a Listener-Directed Time-Scale Modification (“LD-TSM”) application, a listener can control the speaking rate during playback of a previously recorded speaker. This enables the listener to “speed up” or “slow down” the articulation rate and, thereby, the information delivery rate of the recorded speaker. As is well known to those of ordinary skill in the art, applying TSM in such an LD-TSM application keeps the sped-up or slowed-down speech or audio intelligible at the increased or decreased playback rates. Thus, for example, a listener can readily comprehend material through which he/she is fast-forwarding.
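TSM is described here only by its effect. As a hedged illustration of what a TSM method does, the following sketch uses plain overlap-add (OLA): Hann-windowed frames are read from the input every `synth_hop * rate` samples and written to the output every `synth_hop` samples, so duration changes by roughly a factor of 1/rate while each frame keeps its original sample rate (and hence pitch). This is not presented as the TSM method of the invention; practical systems typically use refinements such as SOLA or WSOLA to reduce phasing artifacts. The function and parameter names (`ola_time_scale`, `frame_len`, `synth_hop`) are hypothetical.

```python
import math


def ola_time_scale(samples, rate, frame_len=256, synth_hop=64):
    """Time-scale `samples` by `rate` (>1 speeds up, <1 slows down) with plain overlap-add.

    Hypothetical illustration only; output length is roughly len(samples) / rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Periodic Hann window; 75% overlap at the default hop gives smooth cross-fades.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    analysis_hop = synth_hop * rate  # input advance per frame
    num_frames = max(1, int((len(samples) - frame_len) / analysis_hop) + 1)
    out_len = (num_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # accumulated window weight per output sample
    for k in range(num_frames):
        start_in = int(round(k * analysis_hop))
        start_out = k * synth_hop
        for n in range(frame_len):
            idx = start_in + n
            x = samples[idx] if idx < len(samples) else 0.0  # zero-pad past the end
            out[start_out + n] += window[n] * x
            norm[start_out + n] += window[n]
    # Divide by the summed window weight so overlapping frames add back to unit gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

For example, `ola_time_scale(x, 1.5)` returns a signal about two thirds as long as `x`, which is the kind of speed-up an LD-TSM listener would request.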
In a typical LD-TSM system, input from the listener can be specified in a number of different ways. For example, input can be specified through the use of key presses (button pushes), mouse movements, or voice commands, all of which are referred to below as “keypresses.” As a result, one can readily appreciate that an LD-TSM system enables a listener to adjust the information delivery rate of a digital audio medium to suit his/her interests and speed of comprehension.
As one can readily appreciate from the above, in order to optimize the use of such an LD-TSM system, there is a need for determining how listeners interact with audio media that provide TSM. In particular, the actual information delivery rate selected by a listener depends on diverse factors such as intelligibility of a speaker, listener interest in the subject matter, listener familiarity with the subject matter, whether the listener is transcribing the content, and the general amount of time the listener has allotted for receiving the contents of the material.
Prior art methods for determining listener interest in portions of speech and/or audio are inherently inaccurate. Specifically, these methods involve detecting the fast-forward and rewind patterns produced by button pushes on, for example, a cassette tape player. The use of such fast-forward and rewind patterns suffers from various drawbacks. For example, the listener often alternates between fast-forwarding and rewinding over a particular piece of audio material because the information is either not presented or is unintelligible while fast-forwarding or rewinding. In addition, advancing the playback location either interrupts playback while the audio material is traversed or presents unintelligible versions of the audio material (“chipmunk-like” sounds for speed-up, etc.). As a result, current methods of determining listener interest are of little use for determining an optimal information delivery rate.
As one can readily appreciate from the above, a need exists in the art for a method and apparatus for determining listener interest in portions of speech, audio, and/or audio-visual works. In addition, a need exists in the art for a method and apparatus for replaying speech, audio, and/or audio-visual works in accordance with the determination of listener interest to provide a listener-interest-filtered work (“LIF” work).
Embodiments of the present invention advantageously satisfy the above-identified need in the art and provide methods and apparatus for determining listener interest in portions of speech, audio, and/or audio-visual works and for developing Speed Contours or Conceptual Speed Association data structures that represent measures of listener interest. In addition, further embodiments of the present invention provide methods and apparatus for using the Speed Contours or Conceptual Speed Association data structures to play speech, audio, and/or audio-visual works in accordance with those Speed Contours or Conceptual Speed Association data structures, thereby providing listener-interest-filtered works (“LIF” works).
An embodiment of the present invention is an apparatus for generating a Speed Contour that includes affinity information used to obtain a time-scale modification (TSM) rate and identifier information used to obtain an identifier of a portion of an audio or audio-visual work associated with the TSM rate, which apparatus comprises: (a) a user input apparatus that receives user information and directs input of a portion of the audio or audio-visual work; (b) a time-scale modification system, responsive to an identifier of the portion, the portion, and a TSM rate, that generates a time-scale modified portion; (c) a time-scale modification monitor, responsive to the user information, the identifier of the portion, and the portion, that generates the TSM rate and the identifier of the portion associated with the TSM rate; and (d) a speed contour generator, responsive to the TSM rate and the identifier of the associated portion, that generates the Speed Contour.
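The Speed Contour apparatus can be sketched as a monitor that turns listener keypresses into a current TSM rate, plus a generator that pairs each portion identifier with the rate in force for that portion. This is a minimal sketch under stated assumptions, not the patented implementation: portion identifiers are taken to be start times in seconds, the keypress names (`"faster"`, `"slower"`, `"normal"`), the 0.25 step, and the 0.5 to 3.0 clamp are invented for illustration, and `SpeedContour`, `TsmMonitor`, and `generate_speed_contour` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SpeedContour:
    # Hypothetical representation: (portion identifier, TSM rate) pairs in playback order.
    entries: List[Tuple[float, float]] = field(default_factory=list)


class TsmMonitor:
    """Hypothetical TSM monitor: maps listener keypresses to a current TSM rate."""

    def __init__(self, rate=1.0, step=0.25, min_rate=0.5, max_rate=3.0):
        self.rate = rate
        self.step = step
        self.min_rate = min_rate
        self.max_rate = max_rate

    def on_keypress(self, key: str) -> None:
        if key == "faster":
            self.rate = min(self.max_rate, self.rate + self.step)
        elif key == "slower":
            self.rate = max(self.min_rate, self.rate - self.step)
        elif key == "normal":
            self.rate = 1.0


def generate_speed_contour(
    portion_ids: List[float],
    keypresses_by_portion: Dict[float, List[str]],
    monitor: Optional[TsmMonitor] = None,
) -> SpeedContour:
    if monitor is None:
        monitor = TsmMonitor()
    contour = SpeedContour()
    for pid in portion_ids:
        # Assumption: keypresses logged for a portion are applied before its rate is recorded.
        for key in keypresses_by_portion.get(pid, []):
            monitor.on_keypress(key)
        contour.entries.append((pid, monitor.rate))
    return contour
```

With portions starting at 0, 5, 10, and 15 seconds, two "faster" presses at 5 s and three "slower" presses at 15 s yield the contour `[(0.0, 1.0), (5.0, 1.5), (10.0, 1.5), (15.0, 0.75)]`.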
Another embodiment of the present invention is an apparatus for generating a Conceptual Speed Association data structure that includes affinity information used to obtain a time-scale modification (TSM) rate and concept information used to obtain a concept identifier for a portion of an audio or audio-visual work associated with the TSM rate, which apparatus comprises: (a) a user input apparatus that receives user information and directs input of a portion of the audio or audio-visual work; (b) a time-scale modification system, responsive to an identifier of the portion, the portion, and a TSM rate, that generates a time-scale modified portion; (c) a concept decoder, responsive to the identifier of the portion and the portion, that generates a concept for the portion; (d) a time-scale modification concept monitor, responsive to the user information and the concept, that generates the TSM rate and a concept identifier associated with the TSM rate; and (e) a conceptual speed association data structure generator, responsive to the TSM rate and the associated concept identifier, that generates the Conceptual Speed Association data structure.
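A Conceptual Speed Association data structure keys TSM rates by concept rather than by position in the work. The sketch below assumes, for illustration only, that the concept decoder is a simple keyword lookup over a portion's transcript and that repeated observations of the same concept are combined by averaging; neither choice is specified above, and `keyword_concept_decoder` and `build_conceptual_speed_association` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def keyword_concept_decoder(transcript: str, vocabulary: Dict[str, str]) -> str:
    # Hypothetical stand-in for a concept decoder: the first keyword found wins.
    for word in transcript.lower().split():
        if word in vocabulary:
            return vocabulary[word]
    return "unknown"


def build_conceptual_speed_association(
    observations: List[Tuple[str, float]],
) -> Dict[str, float]:
    """observations: (concept identifier, TSM rate) pairs from a TSM concept monitor."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for concept, rate in observations:
        totals[concept] += rate
        counts[concept] += 1
    # Assumption: a concept seen several times is summarized by its mean TSM rate.
    return {concept: totals[concept] / counts[concept] for concept in totals}
```

Averaging is only one option; a median or a recency-weighted mean would fit the same interface if outlier keypresses are a concern.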
Another embodiment of the present invention is an apparatus that plays an audio or audio-visual work in conjunction with a Speed Contour that includes affinity information used to obtain a time-scale modification (TSM) rate and identifier information used to obtain an identifier of a portion of an audio or audio-visual work associated with the TSM rate, which apparatus comprises: (a) an input apparatus that directs input of a portion of the audio or audio-visual work; (b) a time-scale modification system, responsive to an identifier of the portion, the portion, and a TSM rate, that generates a time-scale modified portion; (c) a playback apparatus, responsive to the time-scale modified portion, that plays the time-scale modified portion; and (d) a time-scale modification rate determiner, responsive to the Speed Contour and the identifier of the portion, that generates the TSM rate.
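The TSM rate determiner can be viewed as a lookup into the Speed Contour. As a minimal sketch, assuming the contour is a list of (start time in seconds, TSM rate) pairs sorted by start time and treated as piecewise constant, a binary search finds the rate in force at any position; `tsm_rate_for` and `modified_duration` are hypothetical names, and the default rate of 1.0 for positions before the first entry is an assumption.

```python
import bisect
from typing import List, Tuple


def tsm_rate_for(contour: List[Tuple[float, float]], position: float, default: float = 1.0) -> float:
    """Rate of the last contour entry starting at or before `position` (contour sorted by start)."""
    starts = [start for start, _ in contour]
    i = bisect.bisect_right(starts, position)
    return contour[i - 1][1] if i > 0 else default


def modified_duration(portion_starts: List[float], portion_len: float,
                      contour: List[Tuple[float, float]]) -> float:
    # Each portion of length L played at rate r occupies L / r seconds of playback time.
    return sum(portion_len / tsm_rate_for(contour, s) for s in portion_starts)
```

For a contour `[(0.0, 1.0), (5.0, 1.5), (15.0, 0.75)]` and four 5-second portions starting at 0, 5, 10, and 15 seconds, playback takes 5 + 10/3 + 10/3 + 20/3, or about 18.33 seconds, instead of 20.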
Another embodiment of the present invention is an apparatus that plays an audio or audio-visual work in conjunction with a Conceptual Speed Association data structure that includes affinity information used to obtain a time-scale modification (TSM) rate and concept information used to obtain a concept identifier for a portion of an audio or audio-visual work associated with the TSM rate, which apparatus comprises: (a) an input apparatus that directs input of a portion of the audio or audio-visual work; (b) a time-scale modification system, responsive to an identifier of the portion, the portion, and a TSM rate, that generates a time-scale modified portion; (c) a playback apparatus, responsive to the time-scale modified portion, that plays the time-scale modified portion; (d) a concept decoder, responsive to the identifier of the portion and the portion, that generates a concept for the portion; and (e) a time-scale modification concept look-up, responsive to the concept and the Conceptual Speed Association data structure, that generates the TSM rate.
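During concept-driven playback, each portion is decoded to a concept and the concept is looked up in the Conceptual Speed Association data structure. The sketch below is illustrative only: it assumes a keyword-based decoder built from a hypothetical vocabulary, a dictionary from concept to TSM rate, and a fallback rate of 1.0 for concepts the data structure does not cover. `make_keyword_decoder`, `tsm_concept_lookup`, and `rates_for_portions` are hypothetical names.

```python
from typing import Callable, Dict, List


def make_keyword_decoder(vocabulary: Dict[str, str]) -> Callable[[str], str]:
    # Hypothetical concept decoder: returns the concept of the first recognized keyword.
    def decode(transcript: str) -> str:
        for word in transcript.lower().split():
            if word in vocabulary:
                return vocabulary[word]
        return "unknown"
    return decode


def tsm_concept_lookup(concept: str, association: Dict[str, float], default: float = 1.0) -> float:
    # Assumption: unseen concepts play at normal speed.
    return association.get(concept, default)


def rates_for_portions(transcripts: List[str], decoder: Callable[[str], str],
                       association: Dict[str, float]) -> List[float]:
    # One TSM rate per portion, in playback order, ready to feed a TSM system.
    return [tsm_concept_lookup(decoder(t), association) for t in transcripts]
```

Because the lookup is keyed by concept, the same association can drive playback of a work the listener has never heard, as long as the decoder recognizes its concepts.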
Another embodiment of the present invention is a method for generating a listener-interest-filtered work for an audio or audio-visual work, which method comprises the steps of: (a) generating one or more Average Speed Contours for one or more audio or audio-visual works for one or more categories of users; (b) converting the one or more Average Speed Contours to one or more Conceptual Speed Association data structures; and (c) forming a listener-interest-filtered Conceptual Speed Association data structure from the one or more Conceptual Speed Association data structures. Variants of this embodiment further include one or more of the following steps: using the listener-interest-filtered Conceptual Speed Association data structure to create a listener-interest-filtered audio or audio-visual work; converting the listener-interest-filtered Conceptual Speed Association data structure to a listener-interest-filtered Speed Contour; using the listener-interest-filtered Speed Contour to create a listener-interest-filtered audio or audio-visual work; or using any of the above to determine listener interest, preferred delivery rate, or listener familiarity with concepts or material in an audio or audio-visual work.
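The method steps can be sketched end to end with dictionaries. This is a hedged illustration under several assumptions not stated above: a Speed Contour is represented as a map from portion identifier to TSM rate, all listeners in a user category share the same portion identifiers, a known portion-to-concept map is available, and step (c) combines Conceptual Speed Association data structures by averaging each concept's rate across them. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

SpeedContour = Dict[float, float]  # hypothetical: portion identifier -> TSM rate
ConceptRates = Dict[str, float]    # hypothetical: concept identifier -> TSM rate


def average_speed_contour(contours: List[SpeedContour]) -> SpeedContour:
    # Step (a): average the rates different listeners chose for each portion.
    ids = set().union(*contours)
    return {pid: mean(c[pid] for c in contours if pid in c) for pid in ids}


def contour_to_csa(contour: SpeedContour, concept_of: Dict[float, str]) -> ConceptRates:
    # Step (b): regroup portion rates by the concept each portion expresses.
    by_concept: Dict[str, List[float]] = defaultdict(list)
    for pid, rate in contour.items():
        by_concept[concept_of[pid]].append(rate)
    return {concept: mean(rates) for concept, rates in by_concept.items()}


def form_lif_csa(csas: List[ConceptRates]) -> ConceptRates:
    # Step (c), assumed here to be a per-concept mean across the input structures.
    concepts = set().union(*csas)
    return {c: mean(d[c] for d in csas if c in d) for c in concepts}


def csa_to_contour(csa: ConceptRates, concept_of: Dict[float, str], default: float = 1.0) -> SpeedContour:
    # Optional step: map concept rates back onto the portions of a (possibly new) work.
    return {pid: csa.get(concept, default) for pid, concept in concept_of.items()}
```

For instance, two listeners who played an "intro" portion at 1.0 and 0.5 produce an average rate of 0.75 for it; combining that with another category's 1.25 for "intro" gives a listener-interest-filtered rate of 1.0.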