Demand for voice-activated, user-customizable applications such as voice-based speed-dialing is increasing rapidly. Templates used for recognition may be either speaker-dependent or speaker-independent. Speaker-dependent templates are acoustic models derived from the speaker's own utterances. Typically, speaker-dependent templates employ "garbage" models against which user-defined phrases are scored to provide out-of-vocabulary rejection. Speaker-dependent templates are problematic in that they generally require a large amount of memory, which grows with each phrase and each user added.
Speaker-independent templates use fixed acoustic models and may require only a few hundred bytes of storage for user-defined phrases. As a result, speaker-independent templates may accommodate a large number of users and user-defined phrases with very little additional memory. A problem with speaker-independent templates, however, is that garbage models do not function well in the speaker-independent environment. The result is poor out-of-vocabulary rejection, which can lead to costly errors such as dialing a wrong phone number.
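
As a rough illustration of the garbage-model rejection discussed above, the following Python sketch accepts the best-scoring user-defined phrase only when it outscores the garbage model by some margin. The text does not specify how scores are computed or compared; the use of log-likelihood scores, the function name recognize, and the margin parameter are all assumptions made for this example.

```python
def recognize(phrase_scores, garbage_score, margin=0.0):
    """Return the best-matching phrase label, or None if the utterance is
    rejected as out-of-vocabulary.

    phrase_scores: dict mapping a phrase label to the (assumed) log-likelihood
                   of the utterance under that phrase's template.
    garbage_score: (assumed) log-likelihood of the same utterance under the
                   garbage model.
    margin:        hypothetical tuning threshold; a larger value rejects more.
    """
    if not phrase_scores:
        return None  # no enrolled phrases, so nothing can be accepted

    # Pick the user-defined phrase whose template scores highest.
    best_label = max(phrase_scores, key=phrase_scores.get)

    # Accept only if the best phrase beats the garbage model by the margin.
    if phrase_scores[best_label] - garbage_score > margin:
        return best_label
    return None


# Hypothetical scores for a two-entry speed-dial list.
scores = {"call home": -120.0, "call office": -150.0}
accepted = recognize(scores, garbage_score=-140.0)
rejected = recognize({"call home": -160.0}, garbage_score=-140.0)
```

In this sketch, a garbage model that scores poorly calibrated values would let out-of-vocabulary utterances pass the comparison, which corresponds to the wrong-number dialing risk described above.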