One common example of a system that provides voice services is an Interactive Voice Response (IVR) system. In prior art systems, a user would typically use a telephone to call in to a central computer system which provides voice services via an IVR system. The IVR system deployed on the central computer system would then launch voice services, for instance by playing an audio clip containing a menu of choices to the user via the telephone line connection. The user could then make a selection by speaking a response. The spoken response would be received at the central computer system via the telephone line connection, and the central computer system would interpret the spoken response using speech recognition techniques. Based on the user's response, the IVR system would then continue to perform application logic to take further action. The further action could involve playing another menu of choices to the user over the telephone line, obtaining and playing information to the user, connecting the user to a third party or a live operator, or any of a wide range of other actions.
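The prompt/response cycle described above can be sketched as a simple menu loop. This is a minimal illustration under stated assumptions, not any particular vendor's IVR: the menu keywords, the action names, and the `get_utterance` callback (standing in for "play this prompt over the telephone line and return the caller's recognized speech") are all hypothetical, as is the choice to fall back to a live operator after three failed attempts.

```python
from typing import Callable, Dict

# Hypothetical menu: maps a recognized keyword to the name of the next action.
MENU: Dict[str, str] = {
    "balance": "play_account_balance",
    "operator": "transfer_to_operator",
    "repeat": "replay_menu",
}

MENU_PROMPT = "Say balance, operator, or repeat."


def run_menu(get_utterance: Callable[[str], str], max_attempts: int = 3) -> str:
    """Play the menu prompt, interpret the reply, and return the next action.

    get_utterance is a hypothetical stand-in for playing the prompt over the
    phone line and returning the recognizer's transcript of the caller's reply.
    """
    for _ in range(max_attempts):
        utterance = get_utterance(MENU_PROMPT).strip().lower()
        action = MENU.get(utterance)
        if action is not None and action != "replay_menu":
            return action
        # Unrecognized reply or an explicit "repeat": play the menu again.
    # Assumed fallback after max_attempts: hand the caller to a live operator.
    return "transfer_to_operator"


# Usage: a scripted caller who mumbles once, then asks for the balance.
replies = iter(["uh", "balance"])
next_action = run_menu(lambda prompt: next(replies))
```

Passing the recognizer in as a callback keeps the application logic (which menu entry leads where) separate from the telephony and speech-recognition plumbing, so the loop can be exercised without a phone line.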
In interactive voice response (IVR) systems as discussed above, once the caller is connected to a system, the system plays the user an audio prompt. Typically, when the system is playing an audio prompt that asks the user to make a selection, issue a command, or provide input, the user is able to begin speaking his response at any time. In other words, the user need not wait until the entire audio prompt has played to begin providing his spoken input. In the past, if the user began speaking (to provide his input) before the audio prompt had finished playing, the system could either halt the playing of the audio prompt or continue playing the prompt. Regardless of which approach was taken, the system would immediately attempt to interpret the user's spoken input. Either approach can be problematic.
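The two prior-art behaviors when a caller speaks over a prompt can be captured in a small timing model. This is a simplified sketch, assuming the prompt has a fixed length in seconds and that speech onset is a single instant; the names `BargeInPolicy`, `PromptOutcome`, and `handle_prompt` are hypothetical. It makes explicit the point above: the policies differ only in how much of the prompt the caller hears, while both start interpreting speech at the moment the caller begins talking.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BargeInPolicy(Enum):
    HALT = "halt"          # stop the prompt as soon as speech is detected
    CONTINUE = "continue"  # keep playing the prompt to the end


@dataclass
class PromptOutcome:
    played_seconds: float               # how much of the prompt the caller heard
    recognition_start: Optional[float]  # when interpretation began (None: no speech)


def handle_prompt(prompt_seconds: float, speech_onset: Optional[float],
                  policy: BargeInPolicy) -> PromptOutcome:
    """Model the halt-or-continue barge-in behaviors of prior-art IVR systems.

    speech_onset is the time, in seconds from the start of the prompt, at which
    the caller starts speaking, or None if the caller never speaks.
    """
    if speech_onset is None:
        return PromptOutcome(prompt_seconds, None)
    if speech_onset >= prompt_seconds:
        # No barge-in: the full prompt played before the caller spoke.
        return PromptOutcome(prompt_seconds, speech_onset)
    played = speech_onset if policy is BargeInPolicy.HALT else prompt_seconds
    # Under either policy, interpretation begins immediately at speech onset.
    return PromptOutcome(played, speech_onset)
```

For a 10-second prompt interrupted at 3 seconds, the HALT policy yields `PromptOutcome(3.0, 3.0)` and the CONTINUE policy yields `PromptOutcome(10.0, 3.0)`; in neither case does the model check whether the speech was actually directed at the system, which is the root of the problems discussed next.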
In the case where the audio prompt is halted as soon as the user begins speaking, there is a danger that the user was speaking for some purpose other than to provide input to the system. For instance, the user might be speaking to another person in the room. If the user's utterance was not intended as input in response to the audio prompt, then halting the playing of the audio prompt inevitably delays or impedes the process. In this instance, the system would try to interpret the user's utterance, and even if the system could successfully interpret it, the words would not be useful as a response to the audio prompt. As a result, the system would then have to notify the user that the spoken response it received was not understood, and the system would have to begin playing the audio prompt over again from the beginning.
Problems can also arise if the audio prompt simply continues to play as the user begins to speak. If the user was actually providing spoken input in response to the audio prompt, the fact that the audio prompt continues to play can be confusing and disruptive to the user. The user might assume that the system cannot hear him, because the audio prompt did not halt. As a result, the user might stop partway through providing his spoken input. And because the system does not receive all of the intended user input, the system will have to indicate that the response was not understood, and it will then play the same audio prompt over again from the beginning. All of this can be very frustrating to the user. In fact, it is likely that the second time the audio prompt begins to play, the user will wait for the prompt to finish playing before speaking his response, to avoid the problems experienced the first time around. This unnecessarily delays the process of receiving the user's input.
In another instance, the user might speak a response to the audio prompt while the audio prompt continues to play, but the response will be unrecognizable to the system. If the system is unable to interpret the response, the system will assume that the utterance was not intended as a response to the audio prompt, but was likely a statement made to a third party while the audio prompt was playing. Based on this assumption, the system will continue to play the audio prompt to completion, and it will then wait to receive the user's spoken input. The user, thinking that he has already given his response, will also wait for the system to take the next action. After a certain period of time has elapsed, the system will indicate that no input has been received, and it will then play the same audio prompt over again from the beginning. All of this can be extremely frustrating to the user.
In addition to the above-discussed drawbacks of existing IVR systems, the ability to provide any type of voice services has been quite limited by the nature of the systems that provide such services. In known systems that provide voice services using relatively complex speech recognition processing, the voice applications are performed on high end computing devices located at a central location. Voice application processing requires a high end centralized computer system because these systems are provisioned to support many simultaneous users.
Because complex voice application processing must be provided using a high end computer system at a central location, and because users are almost never co-located with the high end computer system, a user is almost always connected to the central computer system via a telephone call. The call could be made using a typical telephone or cell phone over the PSTN, or the call might be placed via a VoIP-type (Skype, SIP) connection. Regardless, the user must establish a dedicated, persistent voice connection to the central computer system to access the voice services.
In a typical prior art architecture for a centralized voice services platform, the speech recognition functions are performed at a central computer system. A user telephone is used to place a telephone call to the central voice services platform via a telephone network. The telephone network could be a traditional PSTN or a VoIP-based system. Either way, the user would have to establish the telephone call to the central voice services platform via a telephone carrier.
The prior art centralized voice services platforms, which depend on a telephony infrastructure for connection to users, are highly inflexible from a deployment standpoint. The configurations of hardware and software are all concentrated on a small number of high end servers. These configurations are technically complex and hard to monitor, manage, and change as business conditions dictate. Furthermore, the deployment of existing IVR system architectures, and the subsequent provisioning of users and voice applications to them, requires extensive configuration management that is often performed manually. Also, changes in the configuration or deployment of IVR services within extant IVR architectures often require a full or partial suspension of service during any reconfiguration or deployment effort.
Further, the cost structures and provisioning algorithms used to provision the capabilities of such a centralized voice services platform make it virtually impossible to ensure that a caller can always access the system when the system is under heavy usage. If the system were configured with enough telephone line ports that every potential caller could always connect, even when callers seeking different types of voice services, with different and overlapping peak utilization hours, all call at once, the cost of maintaining all the hardware and software elements would be prohibitive. Instead, such centralized voice services platforms are configured with a reasonable number of telephone ports that results in a cost-effective operating structure. The operator of the system must accept that callers may sometimes be refused access, and system users must accept that they will not receive an "always on" service.
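The trade-off between port count and refused callers can be quantified with the Erlang B formula, a standard teletraffic model that the text above does not itself invoke. The sketch below assumes Poisson call arrivals, a fixed pool of identical ports, and that a caller who finds every port busy is simply refused (blocked calls cleared). The function names are hypothetical. The recursion used is the usual numerically stable form B(E, 0) = 1, B(E, k) = E·B(E, k−1) / (k + E·B(E, k−1)), where E is the offered load in erlangs.

```python
def erlang_b(offered_erlangs: float, ports: int) -> float:
    """Probability that an arriving call finds all ports busy (Erlang B)."""
    if offered_erlangs < 0 or ports < 0:
        raise ValueError("offered load and port count must be non-negative")
    blocking = 1.0  # B(E, 0): with no ports, every call is refused
    for k in range(1, ports + 1):
        blocking = offered_erlangs * blocking / (k + offered_erlangs * blocking)
    return blocking


def ports_needed(offered_erlangs: float, max_blocking: float) -> int:
    """Smallest port count whose blocking probability is at most max_blocking."""
    if not 0 < max_blocking <= 1:
        raise ValueError("max_blocking must be in (0, 1]")
    ports, blocking = 0, 1.0
    while blocking > max_blocking:
        ports += 1
        blocking = offered_erlangs * blocking / (ports + offered_erlangs * blocking)
    return ports
```

Because blocking falls off only gradually as ports are added, driving the refusal probability toward zero for the combined peak load of many overlapping services requires a steadily growing number of mostly idle ports. This is one way to see why operators provision for an acceptable, nonzero refusal rate rather than for guaranteed access.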
Prior art centralized voice services platforms also tend to be "operator-centric." In other words, multiple different service providers provide call-in voice services platforms, but each service provider usually maintains its own separate platform. If the user has called in to a first company's voice services platform, he would be unable to access the voice services of a second company's platform. In order to access the second company's voice services platform, the user must terminate his call to the first company and then place a new call to the second company's platform. Thus, obtaining access to multiple different IVR systems offered by different companies is not convenient.
In addition to the above-described drawbacks of the current architecture, the shared nature of the servers in a centralized voice services platform limits the ability of the system to provide personalized voice applications to individual users. Similarly, the architecture of prior art IVR systems limits personalization even for groups of users. Because of these factors, prior art systems are limited in their ability to dynamically account for individual user preferences or to personalize voice applications on the fly. This is so because it is very hard for a centralized system to correlate a user with his access devices and environment, and thereby optimize a voice application that is tuned specifically for that individual user. Further, most centralized systems simply lack user-specific data.
With prior art voice services platforms, it was difficult to develop efficient mechanisms for billing the users. Typically, the telephone carrier employed by the user would bill the user for calls made to the voice services platform. The amount of the charges could be determined in many different ways. For instance, the telephone carrier could simply bill the user a flat rate for each call to the voice services platform. Alternatively, the telephone carrier could bill the user a per-minute charge for being connected to the voice services platform. In still other methods, the voice services platform could calculate user charges and then inform the carrier how much to bill the user. Regardless of how the charges are calculated, it would still be necessary for the telephone carrier to perform the billing, collect the money, and then pay some amount to the voice services platform.
Prior art voice services platforms also had security issues. In many instances, it was difficult to verify the identity of a caller. If the voice services platform was configured to give the user confidential information, or the ability to transfer or spend money, security became an important consideration.
Typically, when a call is received at the voice services platform, the only information the voice services platform has about the call is a caller ID number. Unfortunately, the caller ID number can be falsified. Thus, even that small amount of information could not be used as a reliable means of identifying the caller. For these reasons, callers attempting to access sensitive information or services were usually asked to provide identifying data that could be compared to a database of security information. While this helps, it still does not guarantee that the caller is the intended user, since the identifying data could be provided by anybody.
Some prior art voice services platforms were used to send audio messages to users via their telephones. The central voice services platform would have a pre-recorded audio message that needed to be played to multiple users. The platform would call each of the users and, once connected to a user, would play the audio message. However, when it was necessary to contact large numbers of users, it could take a considerable amount of time to place all the calls. The number of simultaneous calls that can be placed by the centralized voice services platform is obviously limited by the number of telephone ports it has. Further, in some instances, the PSTN was incapable of simultaneously connecting calls on all the available line ports connected to the voice services platform. In other words, the operators found that when they were trying to make a large number of outgoing calls on substantially all of their outgoing lines, the PSTN sometimes could not simultaneously connect all of the calls to the called parties. Further, when a voice services platform is delivering audio messages in this fashion, it ties up all of its telephone port capacity, which prevents users from calling in to use the service.
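A back-of-the-envelope calculation shows how the port limit stretches an outbound broadcast. This sketch assumes every call connects on the first attempt and lasts exactly `minutes_per_call`, and that ports are reused back to back, so calls go out in waves; real campaigns would take longer because of no-answers, retries, and the PSTN connection failures noted above. The function name and the `reserved_for_inbound` parameter, which models holding some ports back for callers dialing in, are hypothetical.

```python
import math


def broadcast_duration_minutes(recipients: int, total_ports: int,
                               minutes_per_call: float,
                               reserved_for_inbound: int = 0) -> float:
    """Optimistic lower bound on the time to play a message to every recipient.

    Ports reserved for inbound callers are unavailable for outbound dialing,
    so the broadcast proceeds in waves of (total_ports - reserved_for_inbound).
    """
    outbound_ports = total_ports - reserved_for_inbound
    if outbound_ports <= 0:
        raise ValueError("need at least one port for outbound calls")
    waves = math.ceil(recipients / outbound_ports)
    return waves * minutes_per_call


# 10,000 recipients, 96 ports, 1.5-minute calls:
all_ports = broadcast_duration_minutes(10_000, 96, 1.5)
half_reserved = broadcast_duration_minutes(10_000, 96, 1.5, reserved_for_inbound=48)
```

With all 96 ports dialing out, the broadcast needs 105 waves, or 157.5 minutes, during which no inbound caller can connect. Reserving half the ports for inbound service keeps the platform reachable but roughly doubles the broadcast to 209 waves, or 313.5 minutes.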