1. Field
The invention relates to systems and methods that utilize speech recognition techniques to interact with a user to allow the user to obtain information and to perform various functions.
2. Background
There are various existing computer and telephony systems that provide voice services to users. These voice services can be speech recognition and touchtone enabled. Examples of such services include voice mail, voice activated dialing, customer care services, and the provision of access to Internet content via telephone.
One common example of a system that provides voice services is an Interactive Voice Response (IVR) system. In prior art systems, a user would typically use a telephone to call in to a central computer system which provides voice services via an IVR system. The IVR system deployed on the central computer system would then launch voice services, for instance by playing an audio clip containing a menu of choices to the user via the telephone line connection. The user could then make a selection by speaking a response. The spoken response would be received at the central computer system via the telephone line connection, and the central computer system would interpret the spoken response using speech recognition techniques. Based on the user's response, the IVR system would then continue to perform application logic to take further action. The further action could involve playing another menu of choices to the user over the telephone line, obtaining and playing information to the user, connecting the user to a third party or a live operator, or any of a wide range of other actions.
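By way of illustration only, the menu-driven interaction described above can be sketched as a simple loop. This is a minimal sketch and not a description of any particular IVR product: the menu table `MENUS`, the function names `run_ivr` and `recognize`, and the exact-match recognizer are all hypothetical stand-ins, and a real system would play audio over the telephone line and use an actual speech recognition engine.

```python
# Hypothetical menu table: each menu has a prompt and a mapping from a
# recognized response to the name of the next menu. A menu with no
# choices is a terminal action (e.g. play information, connect operator).
MENUS = {
    "main": {
        "prompt": "Say 'weather' or 'operator'.",
        "choices": {"weather": "weather", "operator": "operator"},
    },
    "weather": {"prompt": "Playing the weather report.", "choices": {}},
    "operator": {"prompt": "Connecting you to an operator.", "choices": {}},
}


def recognize(utterance, allowed):
    """Stand-in for speech recognition: case-insensitive exact match."""
    text = utterance.strip().lower()
    return text if text in allowed else None


def run_ivr(menus, start, spoken_responses):
    """Walk the menus using the caller's responses; return prompts played."""
    played = []
    current = start
    responses = iter(spoken_responses)
    while True:
        menu = menus[current]
        played.append(menu["prompt"])      # "play" the audio clip
        if not menu["choices"]:
            break                          # terminal action reached
        utterance = next(responses, None)
        if utterance is None:
            break                          # caller hung up
        choice = recognize(utterance, menu["choices"])
        if choice is not None:
            current = menu["choices"][choice]
        # An unrecognized response leaves `current` unchanged, so the
        # same menu is replayed on the next pass.
    return played
```

For example, `run_ivr(MENUS, "main", ["uh", "Weather"])` replays the main menu once after the unrecognized response and then proceeds to the weather prompt.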
The ability to provide voice services has been quite limited by the nature of the systems that provide such services. In the known systems that provide voice services using relatively complex speech recognition processing, the voice applications are performed on high end computing devices located at a central location. Voice application processing requires a high end centralized computer system because these systems are provisioned to support many simultaneous users. To get economies of scale, it is imperative for these systems to share telecom and computing resources across users. Such high end computing systems share, across multiple users, many phone lines, many IVR servers that connect to the phone lines, multiple speech recognition servers, one or more text-to-speech servers, and a farm of application servers to process application logic during the course of a user interaction. Often, other equipment like switches and media gateways is also present in the centralized computer system. Management, integration and provisioning of these systems to support usage has been very complicated and expensive. Examples of such high end speech recognition systems are described in U.S. Pat. Nos. 6,229,880 and 6,741,677 to Reformato et al.; U.S. Pat. No. 6,891,932 and Patent Publication No. 2005/0053201 to Bhargava et al.; U.S. Pat. No. 6,477,240 to Lim et al.; and U.S. Patent Publication No. 2006/0015556 to Pounds et al., the respective disclosures of which are all hereby incorporated by reference.
Because complex voice application processing must be provided using a high end computer system at a central location, and because users are almost never co-located with the high end computer system, a user is almost always connected to the central computer system via a telephone call. The call could be made using a typical telephone or cell phone over the PSTN, or the call might be placed via a VoIP-type (Skype, SIP) connection. Regardless, the user must establish a dedicated, persistent voice connection to the central computer system to access the voice services.
FIG. 1 depicts a typical prior art architecture for a centralized voice services platform. In this type of system, the speech recognition functions are performed at a central computer system. As shown in FIG. 1, a user telephone 1010 is used to place a telephone call to a central voice services platform 1060 via a telephone network 1040. The telephone network 1040 could be a traditional PSTN, or a VoIP based system. Either way, the user would have to establish the telephone call to the central voice service platform 1060 via a telephone carrier.
As mentioned earlier, the central voice services platform must be capable of handling a large number of simultaneous telephone calls, especially during peak hours. Providing and maintaining the hardware capability to maintain multiple simultaneous separate voice telephone calls is quite expensive. For instance, the average cost of providing a single IVR telecom port presently ranges from $1,500 to $3,000 per telephone line of service.
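The scale of this expense can be illustrated with simple arithmetic. The following sketch merely multiplies the per-line cost range stated above by a number of simultaneous calls; the function name `port_cost_range` and the example figure of 500 simultaneous calls are hypothetical and are not taken from any actual deployment.

```python
def port_cost_range(simultaneous_calls, low_per_port=1500, high_per_port=3000):
    """Return the (low, high) cost in dollars of providing one IVR
    telecom port per simultaneous call, using the per-line range
    stated in the text as the default."""
    return simultaneous_calls * low_per_port, simultaneous_calls * high_per_port


# Hypothetical platform sized for 500 simultaneous calls.
low, high = port_cost_range(500)
print(f"${low:,} to ${high:,}")
```

Under these assumptions, supporting 500 simultaneous calls would cost between $750,000 and $1,500,000 for the telecom ports alone, before any redundancy is added.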
Merely paying for the connect time on a large number of telephone lines can be rather expensive. A public telephony based IVR system service provider often must commit to a minimum volume of minutes with a telephony carrier vendor, leading to a fixed minimum telecom related expense. This creates a situation where the service provider needs to quickly ramp up the volume of business in order to recover the telecom expense per user, and thus increase the profit margin per user.
Also, as discussed, the central voice services platform is complicated and expensive to begin with. These traditional IVR system deployments are also highly vulnerable to the failure of one or more components. Overcoming this vulnerability and providing reliable service requires extensive redundant hardware and software systems. And because the hardware and software being used is expensive to begin with, providing redundant capabilities is very expensive.
Also, the prior art centralized voice services platforms, which depend on a telephony infrastructure for connection to users, are highly inflexible from a deployment standpoint. The configurations of hardware and software are all concentrated on a small number of high end servers. These configurations are technically complex and hard to monitor, manage, and change as business conditions dictate. Furthermore, the deployment of existing IVR system architectures, and the subsequent provisioning of users and voice applications to them, requires extensive configuration management that is often performed manually. Also, changes in the configuration or deployment of IVR services within extant IVR architectures often require a full or partial suspension of service during any reconfiguration or deployment effort.
The provisioning of a typical high end centralized computer system has also been complicated by the type of voice services provided by such systems and the usage pattern associated with such voice services. For instance, a Voice Mail service system may have different provisioning requirements than an outbound notification system. In this regard, the service provider using a high end centralized computer system would have to manage a very high level of complexity if it had to simultaneously provide contrasting voice services. The types of voice services drive the traffic pattern of calls, driving the number of phone lines needed, and the need for speech recognition servers and associated application processing servers. These issues lead to many specialized voice services providers.
Further, cost structures and provisioning algorithms that provision the capabilities of such a centralized voice services platform make it virtually impossible to ensure that a caller can always access the system. If the system were configured with such a large number of telephone line ports that all potential callers would always be connected to access contrasting types of voice services, with different and overlapping peak utilization hours, the cost of maintaining all the hardware and software elements would be prohibitive. Instead, such centralized voice services platforms are configured with a reasonable number of telephone ports that result in a cost-effective operating structure. The operator of the system must accept that callers may sometimes be refused access. Also, system users must accept that they will not receive an “always on” service.
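The trade-off between port count and refused callers can be illustrated with the Erlang B formula, a standard teletraffic model. The text above does not state which provisioning algorithm any particular platform uses, so the choice of Erlang B here is an assumption made purely for illustration, as are the function names `erlang_b` and `ports_needed`.

```python
def erlang_b(ports, offered_erlangs):
    """Probability that an arriving call finds all ports busy, under
    the Erlang B model (blocked calls are dropped, not queued).
    Uses the recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1))."""
    blocking = 1.0
    for n in range(1, ports + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def ports_needed(offered_erlangs, max_blocking):
    """Smallest number of ports whose blocking probability does not
    exceed max_blocking (which must be greater than zero)."""
    ports = 0
    while erlang_b(ports, offered_erlangs) > max_blocking:
        ports += 1
    return ports
```

In this model the blocking probability approaches zero as ports are added but never reaches it for any finite port count, which is consistent with the observation above that an operator choosing a cost-effective number of ports must accept that some callers will occasionally be refused.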
Prior art centralized voice services platforms also tend to be “operator-centric.” In other words, multiple different service providers provide call-in voice services platforms, but each service provider usually maintains its own separate platform. Even when several service providers are all using a common set of hardware and software, each company usually maintains its own separate call-in telephone number. If the user has called in to a first company's voice services platform, he would be unable to access the voice services of a second company's platform. In order to access the second company's voice services platform, the user must terminate his call to the first company, and then place a new call to the second company's platform. Thus, obtaining access to multiple different IVR systems offered by different companies is not convenient.
To address the problem of switching to a different voice services platform, some IVR systems attempted to develop the ability to switch a caller off to a different voice services platform, or to a live operator, without forcing the user to hang up and place a new call. However, because a user is connected to the first voice services platform via a dedicated telephone line connection, passing the caller off to a live operator or to a third party's voice services platform can be difficult and expensive. In some instances, it may be possible for the central computer of the first voice services platform to communicate with the PSTN to instruct the PSTN to re-connect the existing call to a third party number. But often the local PSTN carrying the call lacks the ability to make such a switch. Even where it is possible, it is difficult to develop communications switching code that will work with all PSTN equipment. More often, the central computer system is forced to make a call to the live operator or third party voice services platform using another dedicated phone line, and then bridge the original caller to the newly placed call to the operator/third party. The end result is that the caller is now using two dedicated phone ports of the first voice services platform, and the user is no longer even making use of the first voice services platform. The operator of the first voice services platform must pay for the connect time on two dedicated lines, and the two dedicated lines cannot be used by the system to service other users.
In addition to the above-described drawbacks of the current architecture, the shared nature of the servers in a centralized voice services platform limits the ability of the system to provide personalized voice applications to individual users. Similarly, the architecture of prior art IVR systems limits personalization even for groups of users. Because of these factors, the prior art systems have limitations on their ability to dynamically account for individual user preferences or dynamically personalize actual voice applications on the fly. This is so because it becomes very hard for a centralized system to correlate the user with their access devices and environment, to thereby optimize a voice application that is tuned specifically for an individual user. Further, most centralized systems simply lack user-specific data.
The prior art systems, because they are so tied to the telephone network to provide user access, have trouble rapidly deploying new applications. It becomes necessary to manage and re-route call traffic during any maintenance activities. This can be particularly difficult with multiple contrasting voice services being offered on the same system.
Some prior art voice services platforms were used to send audio messages to users via their telephones. The central voice services platform would have a pre-recorded audio message that needed to be played to multiple users. The platform would call each of the users, and once connected to a user, would play the audio message. However, when it was necessary to contact large numbers of users, it could take a considerable amount of time to place all the calls. The number of simultaneous calls that can be placed by the centralized voice services platform is obviously limited by the number of telephone ports it has. Further, in some instances, the PSTN was incapable of simultaneously connecting calls on all the available line ports connected to the voice services platform. In other words, the operators found that when they were trying to make a large number of outgoing calls on substantially all of their outgoing lines, the PSTN sometimes could not simultaneously connect all of the calls to the called parties. Further, when a voice services platform is delivering audio messages in this fashion, it ties up all the telephone port capacity, which prevents users from calling in to use the service.
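The effect of the port limit on delivery time can be illustrated with a rough estimate. The sketch below assumes, for simplicity, that every call lasts the same amount of time, that every port is kept busy, and that no call needs to be retried; the function name `campaign_seconds` and the example figures are hypothetical.

```python
import math


def campaign_seconds(recipients, ports, seconds_per_call):
    """Estimate the time to deliver one message to every recipient when
    at most `ports` calls can be in progress at once. Calls are treated
    as equal-length batches, so this is a best-case figure."""
    batches = math.ceil(recipients / ports)
    return batches * seconds_per_call


# Hypothetical campaign: 100,000 recipients, 200 ports, 60 seconds per call.
total = campaign_seconds(100_000, 200, 60)
print(f"{total / 3600:.1f} hours")
```

Under these assumptions the campaign occupies every port for more than eight hours, during which no caller could dial in to the platform.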
With the prior art voice services platforms, it was difficult to develop efficient mechanisms for billing the users. Typically, the telephone carrier employed by the user would bill the user for calls made to the voice services platform. The amount of the charges could be determined in many different ways. For instance, the telephone carrier could simply bill the user a flat rate for each call to the voice services platform. Alternatively, the telephone carrier could bill the user a per-minute charge for being connected to the voice services platform. In still other methods, the voice services platform could calculate user charges and then inform the carrier about how much to bill the user. Regardless of how the charges are calculated, it would still be necessary for the telephony carrier to perform the billing, collect the money, and then pay some amount to the voice service platform.
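The first two charging methods mentioned above can be sketched as follows. The rates, the function names `flat_rate_charge` and `per_minute_charge`, and the convention of rounding each call up to the next whole minute are all assumptions made for illustration; the text does not specify how any carrier actually computed its charges.

```python
from decimal import Decimal


def flat_rate_charge(number_of_calls, rate_per_call):
    """Flat rate: the same charge for every call, regardless of length."""
    return number_of_calls * rate_per_call


def per_minute_charge(call_durations_seconds, rate_per_minute):
    """Per-minute rate: each call is rounded up to a whole number of
    minutes (an assumed convention) and billed at the given rate."""
    billed_minutes = sum(-(-seconds // 60) for seconds in call_durations_seconds)
    return billed_minutes * rate_per_minute


# Hypothetical usage: two calls lasting 61 and 120 seconds.
print(flat_rate_charge(2, Decimal("0.50")))
print(per_minute_charge([61, 120], Decimal("0.10")))
```

In either case the carrier, rather than the voice services platform, holds the call records needed to compute the charge, which is one reason the carrier had to perform the billing.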
To begin with, these prior art billing mechanisms were cumbersome at best. Both the telephony carrier and the voice services platform had to create relatively complex accounting systems to track the user's charges, and to ensure that everybody received adequate payment for the services delivered to the users.
Also, a voice services platform might offer a variety of different services, all of which are accessible once a caller has been connected to the voice services platform. Some premium services might cost more to deliver to the user than simple standard services. Ideally, the user should pay for the services that he uses. But in order to operate in this fashion, it was necessary for the voice services platform to track charges on an individual, per-user basis, and to then inform the carrier of what to charge the user. This involves the cumbersome transfer of billing data, all of which had to be verified.
For all the above reasons, billing for services delivered to users of central voice services platforms is cumbersome, expensive, and difficult to tailor to actual services usage.
Prior art voice services platforms also had security issues. In many instances, it was difficult to verify the identity of a caller. If the voice services platform was configured to give the user confidential information, or the ability to transfer or spend money, security became an important consideration.
Typically, when a call is received at the voice services platform, the only information the voice services platform has about the call is a caller ID number. Unfortunately, the caller ID number can be falsified. Thus, even that small amount of information could not be used as a reliable means of identifying the caller. For these reasons, callers attempting to access sensitive information or services were usually asked to provide identifying data that could be compared to a database of security information. While this helps, it still does not guarantee that the caller is the intended user, since the identifying data could be provided by anybody.
The above references are incorporated by reference herein where appropriate for appropriate teachings of additional or alternative details, features and/or technical background.