In many automated systems it is necessary to provide spoken numbers under automated control. For example, in an interactive voice response (IVR) system, the automated system must from time to time “speak” numbers, as in “your balance is 5 dollars and 38 cents.” Usually the response is a number sequence made up of individual strings. For example, “your account number is 38 4041 256” is a sequence of three strings: the first string has a length of 2, the second a length of 4, and the third a length of 3.
Current IVR systems have the ten digits (0-9) prerecorded. To speak a group of numbers, the prerecorded digits are concatenated in the proper order. This was acceptable in situations where the user (listener) entered numbers using mechanical touch-tones, because in such systems any voice response was expected to sound mechanical. However, as systems have migrated toward speech recognition, users have come to want the “speech” coming from an automated system to be more conversational, so that the message sounds the way a real person would speak it.
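The digit-concatenation approach described above can be sketched as follows. This is a minimal illustration, not any particular IVR product: the clip file names (`digit_5.wav` and so on) and the function `concatenate_digits` are hypothetical. The point is that each digit always maps to the same recording, whatever its position, and the grouping of the number is discarded.

```python
# Hypothetical naming scheme: one prerecorded clip per digit, e.g. "digit_5.wav".
DIGIT_CLIPS = {str(d): f"digit_{d}.wav" for d in range(10)}


def concatenate_digits(number: str) -> list[str]:
    """Return the playlist a digit-concatenation IVR would play for `number`.

    Each digit maps to the same clip regardless of where it falls, and
    separators (spaces, hyphens) are simply skipped, so the output carries
    no grouping or positional inflection.
    """
    return [DIGIT_CLIPS[ch] for ch in number if ch.isdigit()]


print(concatenate_digits("38 4041 256"))
```

Because the three strings of “38 4041 256” collapse into one flat run of nine identical-sounding clips, the result is the mechanical delivery that touch-tone users tolerated.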
When a real person says a number string, such as the phone number 972-454-8316, pauses are inserted between groups, and each digit has an inflection that depends on where in the string it falls. Concatenated number strings played to a user do not have the proper inflections, and thus such systems are becoming unacceptable.
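To make the contrast concrete, the following is a minimal sketch of how a number sequence could be split into its strings, with each digit labeled by its position within its string and a pause placed between strings. The three-way initial/medial/final scheme, the clip names such as `digit_5_final.wav`, the `<pause>` marker, and the functions `position_label` and `inflected_playlist` are all hypothetical assumptions for illustration. They are not taken from the source and are not presented as a specific system's method.

```python
import re

PAUSE = "<pause>"  # hypothetical marker for a short silence between strings


def position_label(index: int, length: int) -> str:
    """Classify a digit's place within its string (hypothetical 3-way scheme)."""
    if length == 1 or index == length - 1:
        return "final"
    if index == 0:
        return "initial"
    return "medial"


def inflected_playlist(sequence: str) -> list[str]:
    """Split `sequence` into digit strings and pick a position-specific clip per digit.

    Assumes hypothetical clips named like "digit_5_final.wav"; a pause marker is
    placed between consecutive strings, mimicking how a person groups a number.
    """
    strings = re.findall(r"\d+", sequence)
    playlist: list[str] = []
    for s_idx, digits in enumerate(strings):
        if s_idx > 0:
            playlist.append(PAUSE)
        for i, d in enumerate(digits):
            playlist.append(f"digit_{d}_{position_label(i, len(digits))}.wav")
    return playlist


print([len(s) for s in re.findall(r"\d+", "38 4041 256")])  # string lengths: [2, 4, 3]
print(inflected_playlist("972-454-8316"))
```

For 972-454-8316, the “2” ending the first group and the “6” ending the number both receive a “final” clip, while the “1” in 8316 receives a “medial” one. Plain concatenation cannot express that distinction.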