1. Field of Invention
This invention relates to audio-video data storage and retrieval and in particular to techniques for the selective retrieval of data stored on audio-video media, such as video tape and video disks.
2. Description of Prior Art
Audio-video recording devices such as Video Cassette Recorders (VCR's) have been automated with regard to scheduling recording sessions. The user may specify date, time, duration and TV channel either explicitly or implicitly through published codes such as VCR-Plus. Advances in the automation of recording do not address the great inconvenience of identifying and positioning specific media for playback. Bronson U.S. Pat. No. 5,136,655 (1992) and Ely U.S. Pat. No. 5,600,756 (1997) address this problem by processing the video and/or audio data through a speech recognition engine to create text-based indices for subsequent matching. Bareis U.S. Pat. No. 5,617,407 (1997) includes templates for speech recognition on the storage medium. None of these techniques can be practically applied to allow a consumer to catalog the television programs recorded at home. The consumer's state of the art is to manually write the contents of a recording session on a paper label.
Another difficulty lies in identifying which media are available for safe reuse in future recording sessions. Without a strictly observed manual protocol, the user cannot easily determine which tapes contain programs that have been viewed. If no unviewed tapes are available, the user cannot easily determine which tapes have been recorded less recently than others and are therefore candidates to be reused.
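The reuse decision described above can be illustrated with a short sketch. It is only one possible reading of the ordering: tapes whose programs have been viewed are ranked first, and within each group the least recently recorded tape comes first. The `Tape` record, its field names, and the `reuse_candidates` function are hypothetical and are not taken from any cited device.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tape:
    # Hypothetical record for one piece of media; field names are assumptions.
    label: str
    recorded_on: date
    viewed: bool


def reuse_candidates(tapes):
    """Rank tapes for reuse: viewed tapes first, then oldest recording first."""
    # `not t.viewed` is False for viewed tapes, and False sorts before True.
    return sorted(tapes, key=lambda t: (not t.viewed, t.recorded_on))


tapes = [
    Tape("A", date(1998, 3, 1), viewed=False),
    Tape("B", date(1998, 5, 1), viewed=True),
    Tape("C", date(1998, 1, 15), viewed=True),
]
ranked = reuse_candidates(tapes)
print([t.label for t in ranked])
```

Without some record of viewing status and recording date, this ordering has to be maintained by hand, which is the manual protocol the paragraph above refers to.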
A database of media contents could be maintained in the recording device or a personal computer. This would facilitate the organization of the user's recording collection, and could automate the positioning of a VCR tape for playback. The drawback to such a scheme is the alphabetic nature of the program identification. The user does not have a convenient device for entering alphanumeric data, although provision of a physical keyboard or a video emulation of a keyboard is possible. More detrimental to such a scheme is the time required to enter such data. Typing in program names, with correct spelling, would be onerous for most consumers.
Speech recognition by computers has been applied to command and dictation applications. In both applications, the speech recognition engine compares an utterance by the user to a number of possible reference utterances and returns one or more potential matches, together with a confidence score for each. Command applications process discrete utterances that may be a word or phrase that is clearly delimited by silence. Dictation demands the more difficult task of parsing input utterances from a continuous stream. In a speaker-dependent model, the current user has previously provided the reference utterances. This is not the case in a speaker-independent model, where accurate matching is more difficult.
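The matching step described above, in which an utterance is compared against reference utterances and potential matches are returned with confidence scores, can be sketched as follows. This is a toy illustration only: a real engine compares acoustic features, whereas here plain text strings stand in for utterances and the string similarity ratio from Python's standard `difflib` module stands in for an acoustic confidence score. The function name `match_utterance` and the parameters `max_results` and `min_confidence` are hypothetical.

```python
from difflib import SequenceMatcher


def match_utterance(utterance, references, max_results=3, min_confidence=0.5):
    """Return up to max_results (reference, confidence) pairs, best match first."""
    scored = []
    for ref in references:
        # Assumption: string similarity substitutes for an acoustic comparison.
        score = SequenceMatcher(None, utterance.lower(), ref.lower()).ratio()
        if score >= min_confidence:
            scored.append((ref, score))
    # Highest confidence first, as a recognition engine would report matches.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]


references = ["Evening News", "Morning News", "Nature Hour"]
matches = match_utterance("evening news", references)
for ref, confidence in matches:
    print(f"{ref}: {confidence:.2f}")
```

In a speaker-dependent model the `references` list would hold utterances previously supplied by the current user, which is why matching is easier than in the speaker-independent case.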
Advances in speech recognition algorithms and the increased ratio of performance to cost of microprocessor technology make feasible the introduction of speech recognition to consumer devices such as VCR's. While speech recognition could be applied to VCR's in a command context, its appeal to consumers would be limited because it would cost more but provide no real benefit over a touch-based remote control.