In automated systems, user interfaces employing speech recognition often have a limited vocabulary of recognizable words. These speech recognition interfaces are, in many cases, designed under the assumption that a user will pronounce a specific word or set of words (keywords) which are expected to be used to trigger an action, e.g., move to the next prompt, provide a specific service, or transfer to an operator. To increase the likelihood that the user will respond to a prompt using one of the expected words, keywords are sometimes presented to the user in the hope that the user will respond by saying one of the keywords. Known current interface practices include: (1) providing the user with an initial list of recognizable response keywords at the end of an initial prompt, (2) waiting for the user to fail to respond to an initial prompt for a specific interval of time and then providing a list of recognizable response keywords, or (3) waiting for the user to make an error, e.g., pronounce a word not included in the set of anticipated recognizable response keywords, and then providing a list of recognizable response keywords. When approach (1) is employed, the automated speech interface normally presents the keywords slowly, clearly, and distinctly. In such a case, a user may become annoyed while impatiently waiting for the keyword message to end. With approach (2), the user interface, by waiting for the user to fail to respond before proceeding, is waiting for the user to become confused, again resulting in an unsatisfied user. Approach (3) waits for the user to make an error and then responds, at which point the user may be frustrated.
All of these known methods have a tendency to cause user agitation and often result in dissatisfaction with automated speech recognition user interfaces. Dissatisfaction may result in a user hanging up on the automated system or being in a general state of agitation when executing a transaction and/or when ultimately coupled to a human operator. A resulting intentional disconnection by an annoyed user may result in the loss of business or the loss of a customer to a competitor. Placing the user in a general state of agitation may make the user less likely to be persuaded to make a purchase or sign up for a contract or service. For systems using voice recognition for customer service, starting off by agitating the customer will generally result in a more argumentative customer and make it more difficult for the customer service representative to reach a reasonable settlement with the customer.
While dissatisfaction may occur when speech interfaces are used, the cost savings made possible by such systems remain a motivating factor. Speech recognition user interfaces may result in significant cost savings over direct human operators. Therefore, companies have an incentive to maximize usage of automated voice user interfaces wherever possible. One of the limiting factors on the use of automated voice interfaces is the negative effect on business resulting from the minor annoyances (as previously described) inherent in existing interactive automated voice interfaces. Based upon the above discussion, there is significant room for, and a need for, improvements in the usability of existing automated speech recognition user interfaces.
Several empirical findings in human perceptual research relevant to the invention shall now be discussed.
(1) In a widely observed type of forgetting, known as the “tip-of-the-tongue” phenomenon (the nagging feeling of knowing a name or word but not being able to retrieve it), phonological cues are known to aid retrieval.
(2) A kind of selective attention, known as the “cocktail party” effect, exists and has been observed in natural and laboratory settings. This “cocktail party” effect describes the human ability to focus one's listening attention on a single talker among a cacophony of conversations and background noise.
(3) Research has demonstrated that the probability of a listener correctly hearing a word varies with the context in which the word occurs. For example, after hearing the word “bread”, the subsequent occurrence of “butter” or “knife” is more likely than “eraser” or “carburetor”.
(4) Perceptual research has demonstrated that even very rapidly presented (subthreshold) stimuli, e.g., a 28-msec visual presentation of a word such as “canary”, can accelerate and facilitate subsequent perception of related stimuli, e.g., more rapid perception of the word “parrot”.
In view of the above problems with existing speech user interfaces, it can be appreciated that there is a need for improved speech interfaces. For increased levels of user satisfaction, any improved interface should address at least some of the problems with existing interface techniques and, optionally, take advantage of one or more of the characteristics of human perception discussed above.