1. Field of the Invention
The present invention relates generally to archiving audio records or audio/video records. The invention employs automatic speech recognition to transcribe the audio into at least one layer of recognized text. The system then uses automatic and semi-automatic techniques to search through the resulting plural archive layers.
2. Description of the Related Art
Many applications exist where audio and video content is maintained in an archive for later recall. For example, television networks generally record video tape copies of all broadcasts along with a volume of related material. This quickly accumulates into a considerable archive. At a later date, when a user needs to identify those tapes of interest, the task is formidable.
To ease the research task, such archives are generally augmented with a manually created index of topics and key words. However, such indexes can be incomplete and lead to unreliable search results. In particular, the topics and key words often change, which complicates searching through earlier records.
To provide a complete, searchable index of the archive, a full text transcription of each record can be generated. These transcripts can then be searched for relevant terms. However, manual transcription is a labor-intensive operation that can incur considerable expense.
In another archiving application, many service bureaus store all telephone conversations and phone mail from customers in an audio archive. An explicit summary of each record is manually generated and stored in the archive. To identify records of interest, a user can search the explicit summaries using a keyword search. However, since the summaries are manually created, many relevant records are likely to be missed due to incorrect or incomplete summaries.
In an effort to automate the transcription process for these applications, automated speech recognition (ASR) techniques have been employed. Unfortunately, given the wide range of speech pronunciation variability and audio degradation from background noise, phone lines and the like, present-day ASR systems typically provide a transcription which includes many errors. Thus, the ASR transcription often requires manual correction, which is a labor-intensive operation.
In currently available ASR systems, each spoken word maps to several possible candidate words with varying probabilities of a match. Current ASR systems use linguistic context and a statistical language model (based on language statistics for words, word pairs, and word triplets) to select among these candidate words. Nevertheless, these ambiguities are often resolved imperfectly.
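The candidate selection described above can be illustrated with a minimal sketch. It is not the method of any particular ASR system: it assumes a word-pair (bigram) model only, omits word triplets for brevity, and uses hypothetical names (`best_word_sequence`, `candidates`, `bigram_prob`, `floor`) and made-up probabilities.

```python
import math

def best_word_sequence(candidates, bigram_prob, floor=1e-6):
    """Pick the most probable word sequence by dynamic programming.

    candidates:  one dict per spoken word, mapping candidate word -> match probability.
    bigram_prob: dict mapping (previous_word, word) -> language-model probability.
    floor:       assumed probability for word pairs missing from the model.
    """
    # best[word] = (log score of the best path ending in word, that path)
    best = {"<s>": (0.0, [])}
    for slot in candidates:
        nxt = {}
        for word, acoustic in slot.items():
            options = []
            for prev, (score, path) in best.items():
                lm = bigram_prob.get((prev, word), floor)
                options.append((score + math.log(acoustic) + math.log(lm),
                                path + [word]))
            nxt[word] = max(options, key=lambda o: o[0])
        best = nxt
    return max(best.values(), key=lambda o: o[0])[1]

# Illustrative, invented numbers: "wreck" is the better acoustic match for the
# first word, but the language model favors "recognize speech".
candidates = [{"wreck": 0.55, "recognize": 0.45},
              {"speech": 0.5, "beach": 0.5}]
bigram_prob = {("<s>", "recognize"): 0.3, ("<s>", "wreck"): 0.1,
               ("recognize", "speech"): 0.5, ("wreck", "beach"): 0.05}
result = best_word_sequence(candidates, bigram_prob)
print(result)  # ['recognize', 'speech']
```

With different statistics the same procedure could just as easily select the wrong words, which is the imperfect resolution noted above.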
An "utterance" is a segment of spoken speech which is acoustically and linguistically largely self-contained. For example, a clause or short sentence may form one utterance. Thus spoken speech is divided into "spoken utterances", and the corresponding recognized text is divided into "recognized utterances". Using current ASR, each spoken utterance typically translates to many tentative "recognized utterances", each with an estimated probability of matching the spoken utterance. Unfortunately, the recognized utterance with the largest estimated probability often is not an exact transcription of the spoken utterance.
Therefore, there remains a need for an improved audio/video archive system and method which provides automated indexing and searching of records having an audio component in a manner that overcomes the problems associated with the prior art.
In accordance with one form of the present invention, an archiving structure for records having an audio component includes an original audio archive layer, a compressed audio archive layer corresponding to the original audio archive layer, and at least one layer of recognized text corresponding to the original audio archive layer. The recognized text layers serve as indexes and guides for searching the original audio archive layer.
In accordance with another form of the present invention, an archiving system for records having an audio component includes: means for generating and accessing an original audio archive layer; means for generating and accessing a compressed audio archive layer corresponding to the original audio archive layer; and means for generating and accessing at least one index archive layer corresponding to said original audio archive layer.
A further embodiment of the present invention also includes means for automatically searching the index archive layer and means for refining this search by survey and selection of audio regenerated from the compressed audio layer.
In accordance with a method of the present invention, archiving of records including an audio component starts by storing the original audio in a first archive layer. From the original audio, a compressed audio archive layer is then created. Automatic speech recognition is used to create at least one layer of recognized text. In a preferred embodiment, these layers include: a global word index layer; a recognized word-bag layer containing all candidate recognized words for each spoken utterance; and a recognized word-lattices layer summarizing all candidate recognized words, how they are ordered and correlated for each spoken utterance. The archive may further include a layer of recognized words and a layer of recognized utterances.
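As a rough illustration of how the recognized-text layers might be derived, the sketch below builds a global word index, a word-bag per utterance, and a simplified word lattice from candidate recognized utterances. The function name `build_text_layers`, the record identifiers, and the input format are assumptions for illustration only. In particular, the lattice here is reduced to a set of adjacent word pairs, which is far simpler than a full lattice with probabilities and timing.

```python
from collections import defaultdict

def build_text_layers(records):
    """Build three recognized-text layers from candidate recognized utterances.

    records: {record_id: [utterance, ...]}, where each utterance is a list of
             (hypothesis_words, probability) pairs from a hypothetical recognizer.
    Returns (global_index, word_bags, lattices).
    """
    global_index = defaultdict(set)  # word -> ids of records containing it
    word_bags = {}                   # (record_id, utt_no) -> all candidate words
    lattices = {}                    # (record_id, utt_no) -> (word, next_word) edges
    for record_id, utterances in records.items():
        for utt_no, hypotheses in enumerate(utterances):
            bag, edges = set(), set()
            for words, _prob in hypotheses:
                bag.update(words)                    # every candidate word
                edges.update(zip(words, words[1:]))  # ordering of adjacent words
            word_bags[(record_id, utt_no)] = bag
            lattices[(record_id, utt_no)] = edges
            for word in bag:
                global_index[word].add(record_id)
    return dict(global_index), word_bags, lattices

# Invented sample data: one broadcast record with two competing hypotheses,
# and one telephone record with a single hypothesis.
records = {
    "tape-001": [[(["stock", "prices", "fell"], 0.6),
                  (["stock", "prizes", "fell"], 0.4)]],
    "call-017": [[(["please", "reset", "my", "password"], 0.9)]],
}
global_index, word_bags, lattices = build_text_layers(records)
```

Because the word-bag keeps every candidate word rather than only the top-scoring hypothesis, a search for "prices" still finds the utterance even if the recognizer had ranked "prizes" first.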
In accordance with an archive search method of the present invention, a search is conducted in several steps. Preferably these steps include: a search of the global word index layer; then a search through the layer of recognized word-bags; and then a search through the layer of recognized word-lattices. The subset of records located by one search stage is refined by the next search stage. After these automatic search stages, the resulting subset of records is then used for a manual survey of the compressed audio archive layer. These successive refinements identify a small set of relevant records which are selectively retrieved from the original audio archive layer.
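The automatic stages of this search can be sketched as successive filters. The example below is a simplified illustration under stated assumptions: the layers are small hand-written dictionaries, the lattice is represented only as adjacent word pairs, and the function name `staged_search` is hypothetical. The manual survey of the compressed audio layer is not modeled.

```python
def staged_search(query, global_index, word_bags, lattices):
    """Return the (record_id, utt_no) keys that survive all three automatic stages.

    query: list of search words in the order they are expected to be spoken.
    """
    # Stage 1: global word index -> records containing every query word
    record_sets = [global_index.get(w, set()) for w in query]
    records = set.intersection(*record_sets) if record_sets else set()
    # Stage 2: word-bags -> utterances whose candidate words cover the query
    utterances = [key for key, bag in word_bags.items()
                  if key[0] in records and set(query) <= bag]
    # Stage 3: word-lattices -> utterances where the query words are adjacent
    pairs = set(zip(query, query[1:]))
    return sorted(key for key in utterances if pairs <= lattices[key])

# Invented sample layers for two records.
global_index = {"stock": {"tape-001", "tape-002"},
                "prices": {"tape-001", "tape-002"},
                "fell": {"tape-001"}}
word_bags = {("tape-001", 0): {"stock", "prices", "prizes", "fell"},
             ("tape-002", 0): {"prices", "rose"},
             ("tape-002", 1): {"prices", "of", "stock", "up"}}
lattices = {("tape-001", 0): {("stock", "prices"), ("stock", "prizes"),
                              ("prices", "fell"), ("prizes", "fell")},
            ("tape-002", 0): {("prices", "rose")},
            ("tape-002", 1): {("prices", "of"), ("of", "stock"), ("stock", "up")}}

hits = staged_search(["stock", "prices"], global_index, word_bags, lattices)
print(hits)  # [('tape-001', 0)]
```

Each stage is cheaper than the one after it and only examines what the previous stage kept, so the more detailed lattice comparison runs on a small fraction of the archive.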
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.