1. Field of the Invention
The present invention relates to retrieval of spoken documents and, more specifically, to a system and method of performing a lattice-based search for retrieval of a spoken utterance.
2. Introduction
Automatic systems for indexing, archiving, searching and browsing through large amounts of spoken communications have become a reality in the last decade. Most such systems use an automatic speech recognition (ASR) component to convert speech to text, which is then used as input to a standard text-based information retrieval (IR) component. This strategy works reasonably well when the speech recognition output is mostly correct or when the documents are long enough that at least some occurrences of the query terms are recognized correctly. Most of the research in this area has concentrated on retrieval of Broadcast News-type spoken documents, where the speech is relatively clean and the documents are relatively long. In addition, for such documents it is possible to find large amounts of text with similar content in order to build better language models and to enhance retrieval through the use of similar documents.
However, in contexts where spoken document retrieval is desirable but the benefits of clean speech are unavailable, information retrieval becomes more difficult. For example, if one were to record a teleconference and then search for or retrieve particular portions of the conference, the task would be harder. This is because the teleconference likely consists of a plurality of short audio segments that may contain many word errors and exhibit low redundancy. Further, as opposed to news broadcasts, there may be many speakers in the teleconference, each providing small snippets of speech that contribute to the overall spoken document.
Therefore, the same approach used for broadcast news will not provide satisfactory results if one's task is to retrieve a short snippet of speech in a domain where word error rates (WERs) can be as high as 50%. This is the situation with teleconference speech, where one's task is to determine whether and when a participant uttered a certain phrase.
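By way of illustration only, the word error rate referred to above is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that conventional calculation. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the present disclosure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper only; the name and interface are assumptions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical short utterance: one substitution and one deletion
# out of four reference words.
reference = "send the budget report"
hypothesis = "spend the budget"
print(word_error_rate(reference, hypothesis))  # 0.5
```

In this hypothetical example, a query for the phrase "budget report" would find no match in the single best recognizer output, even though the phrase was spoken, and a short utterance offers no second occurrence to fall back on.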
What is needed in the art are techniques that improve spoken document retrieval for spoken documents generated from telephone conversations, teleconferences and the like.