1. Technical Field
The present invention generally relates to computer systems and in particular to voice recognition usage within computer systems.
2. Description of the Related Art
Recently, databases that support spoken file retrieval, or queries by voice, have come into wider use. From databases managed at the corporate level to databases operated in children's toys, computer-based queries by voice are rapidly becoming a daily practice. Existing voice recognition systems allow a user to search various kinds of databases that contain documents, video, audio, and other files. Existing systems are entirely text based: when a user speaks an item's name, text results are returned for use in selecting items from within the database. However, there are often very large numbers of text strings to compare against, and such text searches are not efficient.
Systems have been proposed that compare the text or phonetic transcription of the user's voice query with the phoneme (or text) annotation data in a database. The technique for matching the sequences of characters or phonemes first defines a set of features in the query, each feature being an overlapping, fixed-size fragment of either the text or the phoneme string. Then the frequencies with which these character or phoneme fragments occur are identified in both the query and the annotation. Finally, a measure of the similarity between the query and the annotation is determined utilizing a cosine measure of these frequencies of occurrence. Although this approach is workable, it is efficient only for a small database of files.
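By way of illustration only, the matching technique described above may be sketched as follows. The function names `ngram_counts` and `cosine_similarity`, and the fragment size `n=3`, are assumptions made for this sketch and do not come from any particular prior system; the sequences may be strings of characters or lists of phoneme symbols.

```python
from collections import Counter
from math import sqrt


def ngram_counts(symbols, n=3):
    """Count overlapping, fixed-size fragments (n-grams) of a sequence.

    `symbols` may be a string of characters or a list of phoneme labels.
    The fragment size n=3 is an assumed value for illustration.
    """
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def cosine_similarity(query, annotation, n=3):
    """Cosine measure between the fragment-frequency vectors of two sequences."""
    q = ngram_counts(query, n)
    a = ngram_counts(annotation, n)
    # Dot product over the fragments of the query; fragments absent
    # from the annotation contribute zero.
    dot = sum(count * a[gram] for gram, count in q.items())
    norm_q = sqrt(sum(c * c for c in q.values()))
    norm_a = sqrt(sum(c * c for c in a.values()))
    if norm_q == 0 or norm_a == 0:
        return 0.0  # a sequence shorter than n yields no fragments
    return dot / (norm_q * norm_a)


if __name__ == "__main__":
    # A query containing one misrecognized character still shares
    # several fragments with the annotation.
    print(cosine_similarity("yesterdey", "yesterday"))
```

Each comparison of this kind requires the fragment frequencies of both sequences to be built and combined, so the cost of a query grows with the number of annotations it must be compared against.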
There are approximately 43 phonemes and roughly as many characters (letters and symbols) in the English language, so any given phoneme or character may occur tens of thousands of times within a database. Typically, phoneme recognition is only 60% to 70% accurate, which increases the difficulty of retrieving data when the phonetic query has been misrecognized. If a database is large, the retrieval method described above is slow and inefficient. Searching through multiple files for a single document, song information text, or video (by identification name, for example) may be tedious and extremely time consuming. Current systems typically perform a linear search through the potential matches within a database.
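A linear search of the kind described above may be sketched as follows, for illustration only. The function name `linear_search` is hypothetical, and the standard-library `difflib.SequenceMatcher` ratio is used here merely as a stand-in scoring function; it is not the similarity measure of any particular existing system.

```python
from difflib import SequenceMatcher


def linear_search(query, annotations):
    """Score the query against every annotation in turn.

    Returns (index, score) of the best match, or (-1, 0.0) if nothing
    scores above zero. The scoring function is a stand-in chosen only
    for this sketch.
    """
    best_index, best_score = -1, 0.0
    for index, annotation in enumerate(annotations):
        # One full comparison per database entry: the work grows in
        # direct proportion to the size of the database.
        score = SequenceMatcher(None, query, annotation).ratio()
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


if __name__ == "__main__":
    titles = ["let it be", "yesterday", "help"]
    print(linear_search("yesterdy", titles))
```

Because every entry is visited once per query, a database holding many thousands of annotations incurs many thousands of comparisons for each spoken query, which is the inefficiency noted above.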