Speech-to-text (STT) mechanisms may automatically transcribe audio or speech data into text data. Text data is often preferred over audio data because it is searchable and relatively easy to review. Furthermore, text data is exact, whereas human speech samples are often ambiguous because of inconsistencies in the human voice. Accordingly, text data may be preferred for controlling automated mechanisms predictably and exactly.
STT mechanisms typically use a dictionary or database of words, phonemes and phrases to convert speech data to text data. Some STT mechanisms are rigid and require transcribed speech data to exactly match an entry in the dictionary. Such an STT mechanism may only transcribe words already contained within the dictionary, and may mistranscribe, or return incorrect search results for, any words outside the dictionary, for example, slang words, esoteric words specific to a professional field, or words mumbled or spoken with an accent in an audio recording.
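The limitation of a rigid, exact-match dictionary can be illustrated with a minimal sketch. The toy dictionary, the phoneme labels and the function name `transcribe_exact` are hypothetical and chosen only for illustration; they do not describe any particular STT mechanism.

```python
# Hypothetical toy dictionary mapping phoneme sequences to words.
DICTIONARY = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}


def transcribe_exact(phonemes):
    """Return the dictionary word whose phonemes match exactly, else None."""
    return DICTIONARY.get(tuple(phonemes))


# A sequence found in the dictionary is transcribed.
in_vocabulary = transcribe_exact(["HH", "AH", "L", "OW"])

# An accented variant differing in one phoneme has no exact entry,
# so the rigid lookup produces no transcription at all.
accented = transcribe_exact(["HH", "EH", "L", "OW"])
```

In this sketch a single differing phoneme is enough to make the lookup fail, which is the behavior described above for words spoken with an accent or absent from the dictionary.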
To address this problem, some STT mechanisms support searching for “out of vocabulary” (OOV) words and phrases, which are not contained in the dictionary. Such mechanisms may search for OOV words by expanding the STT dictionary. However, expanding the STT dictionary may increase memory usage and make the dictionary cumbersome to search, for example, by increasing search time.
Other STT mechanisms may only require search terms to approximately (not exactly) match dictionary entries. For example, a positive search result may be generated when only a subset of the phonemes in a search word matches a dictionary entry. By lowering the standard for defining a match, such mechanisms often generate incorrect search results.
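The trade-off of approximate matching can be sketched as follows. This is only an illustration under stated assumptions: the toy dictionary, the position-wise overlap score and the names `phoneme_overlap` and `search_approx` are hypothetical, and real mechanisms may score matches differently.

```python
# Hypothetical toy dictionary mapping words to phoneme sequences.
DICTIONARY = {
    "cat": ("K", "AE", "T"),
    "cap": ("K", "AE", "P"),
    "cut": ("K", "AH", "T"),
    "dog": ("D", "AO", "G"),
}


def phoneme_overlap(query, entry):
    """Fraction of positions at which the two phoneme sequences agree."""
    matches = sum(1 for q, e in zip(query, entry) if q == e)
    return matches / max(len(query), len(entry))


def search_approx(query, dictionary, threshold):
    """Return every word whose overlap with the query meets the threshold."""
    return [
        word
        for word, phonemes in dictionary.items()
        if phoneme_overlap(query, phonemes) >= threshold
    ]


query = ("K", "AE", "T")  # phonemes of the search word "cat"

# Requiring every phoneme to match returns only the intended word.
strict = search_approx(query, DICTIONARY, threshold=1.0)

# Accepting a subset of matching phonemes also returns "cap" and "cut",
# which are incorrect results for this search word.
loose = search_approx(query, DICTIONARY, threshold=0.6)
```

With the threshold at 1.0 the search behaves like an exact match, while at 0.6 two of three matching phonemes suffice, so similar-sounding but incorrect entries are returned along with the intended one.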
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.