A speech recognition module/system typically requires a preset word set and acoustic models, from which a search network is constructed before the system runs, so that the speech recognition module/system can reference the search network and the acoustic models during recognition. This preparation is therefore completed before the speech recognition module and/or system is put into operation.
In actual applications, when a preset word is not the word the user prefers, or when the preset language or accent of the speech recognition module and/or system differs from that of the user, the recognition results may be very poor. For example, a speech-recognition-enabled device for operating household appliances may include a Mandarin acoustic model and a word set for Taiwanese users, such as “turn on the light”, “turn on the air conditioner”, and so on. However, for a user who prefers to speak the Ho-Lo language, or who habitually says “light on” instead of the preset phrase “turn on the light”, recognition does not work well. As a result, the user may become unwilling to use the speech recognition function. Some customization or adjustment of the speech-recognition-enabled device is therefore required to suit the user's preferences.
One technique uses a graphical interface to add a new word: a spelling corresponding to the new word is obtained by comparison against a spelling database, and the new word is then added to the speech recognition dictionary. Another word-addition technique uses an ambiguity detection engine to detect whether ambiguity exists between the input word and existing words; when no ambiguity exists, feedback is returned to the user and the engine asks the user whether the new word should be added. Another technique uses a phonetic structure to perform, on the input acoustic data, word-element-specific-mode acoustic phonetic recording, classification of word element modes, and phoneme-to-grapheme conversion for the word-element-specific modes, thereby generating word elements. Another technique first detects whether a substantial match exists for the input word; if so, at least one synonym replaces the word, and the user who intends to use the word is prompted for speech input in order to add the synonym. Yet another technique uses a microphone to add a word character by character and provides an operating interface for adding words.
Existing products with speech recognition capability are constrained by locality: because languages and accents differ, a customized design may be needed for each region, and such a design can take a long time. For example, a large amount of speech data may need to be collected from a region to cover all possible accents and demographic groups, the quality of that data must be examined, and an acoustic model suited to the region must then be trained.
In existing speech recognition techniques with word-generation capability, customization is required to adapt to different regional accents, and the word set and acoustic models must be preset so that the speech recognition module/system can reference the search network and the acoustic models during recognition. In actual applications, further customization or adjustment may also be required to suit the user's preferences. Therefore, it is imperative to provide a speech recognition module and/or system that can be adjusted according to user demands, so that the user can operate it conveniently and the cost to the solution provider is reduced.