1. Field of the Invention
This invention relates generally to the construction and use of distributed interactive voice and speech processing systems, including interactive voice response (IVR) systems and voice messaging (VM) systems. More particularly, the invention relates to form based publishing of voice information and the use of universally accessible personal profiles for authentication of the user by voice signatures and generating context sensitive active vocabularies to improve speaker dependent speech recognition. The invention also relates to the use of the user attributes and preferences stored in universally accessible personal profiles to improve the efficiency of navigation and search as well as efficacy of search results pertaining to user queries.
2. Description of the Related Art
Conventional interactive voice response (IVR) systems allow a user to place a telephone call into a system, navigate (generally using touch tone input) through a hierarchy of options in response to voice prompts and retrieve information stored in a computer database. Airlines, banks, credit companies and many other service organizations are just a few examples of the types of businesses using IVR systems to allow a customer (or prospective customer) to retrieve desired information. These conventional systems are generally organization-specific in that they offer access to a single database or set of databases related to the goods, services or other aspects of the organization maintaining the IVR system. Thus, conventional IVR technology is used to offer access to information specific to a single organization (e.g., a specific airline, bank or credit company). For example, airlines typically use IVR to allow callers to access flight arrival and departure information or to select reservation options for that particular airline only.
It is desirable to provide an IVR system that enables access to an aggregation of databases and services rather than a single database and service. One barrier to the provision of aggregated services in an IVR system is that conventional IVR systems do not have a distributed information publishing means. Conventional IVR systems do not have a mechanism for service/information providers to readily access the IVR system and add updated or entirely new information for publication on the IVR system.
Further, conventional IVR systems are generally configured for uniform access by any caller admitted to the IVR system. Each caller is handled by the system in the same manner and offered an identical set of options. One reason that IVR systems use uniform user interfaces for each caller rather than caller-specific configurations is that conventional IVR systems operate in "closed" computer environments hosting the particular IVR system. Thus, when a caller accesses a conventional IVR system, the only caller-specific information which the system has at its disposal is information previously provided by the caller which the system has maintained, or information that is provided by the caller during the IVR session (e.g., when a user enters an account number using touch tone telephone input). However, because collecting and storing caller-specific information with conventional technology is cumbersome and time consuming, most IVR systems do not offer caller-specific (caller customized) features.
There are numerous applications in which it is desirable for an IVR system to use caller-specific information in handling a call. Caller-specific information in the form of user preferences can aid in minimizing the size of a command tree which the user must navigate to access desired information. Additionally, caller-specific information could also be used to authenticate the identity of a user in cases where security is an issue (e.g., in bank and credit contexts). Further, caller-specific speech training profiles could be used to implement speaker dependent speech recognition to allow for a caller to use voice commands in place of touch-tone commands. Still further, an IVR system having access to caller-specific data could be used to apply IVR technology in new application areas such as personal productivity.
Thus, there is a need for an improved voice and speech processing system that provides universal access to caller-specific information to provide user-customized IVR systems. Further, there is a need to provide universal access to voice and speech files in order to allow widespread use of such files for caller authentication and for performing speaker dependent speech recognition in IVR systems.
The system and method of the present invention extends World Wide Web (referred to herein as "www" or the "web") and Internet technology to provide universally accessible caller-specific profiles that are accessed by one or more IVR systems. The invention features a set of web pages containing information (components) formatted using MIME and hypertext markup language (HTML) standards with extensions for voice information access and navigation. These web pages are linked using HTML hyper-links that are accessible to users via voice commands and touch-tone inputs. These web pages and the components in them are addressable using HTML anchors and links embedding HTML universal (uniform) resource locators (URLs), rendering them universally accessible over the Internet. This collection of connected web pages is referred to herein as the "voice web" and the individual pages are referred to herein as "voice web pages". Each web page in the voice web contains a specially tagged set of key words and touch tone sequences that are associated with embedded anchors and links used for navigation within the web.
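By way of illustration only, the association between tagged key words, touch tone sequences and links within a voice web page might be sketched as follows. The dictionary layout, the function name resolve_link and the example.com URLs are assumptions made for this sketch; the specification does not prescribe a particular data format.

```python
# Hypothetical in-memory form of one voice web page. Each link carries the
# tagged key word and touch tone sequence that select it. The field names
# and URLs are illustrative assumptions, not part of the specification.
HOME_PAGE = {
    "url": "http://voiceweb.example.com/home.html",
    "links": [
        {"key_word": "calendar", "touch_tone": "1",
         "url": "http://voiceweb.example.com/calendar.html"},
        {"key_word": "address book", "touch_tone": "2",
         "url": "http://voiceweb.example.com/addressbook.html"},
    ],
}


def resolve_link(page, user_input):
    """Return the URL selected by a recognized key word or a touch tone
    sequence, or None when the input matches no link on the page."""
    token = user_input.strip().lower()
    for link in page["links"]:
        if token in (link["key_word"], link["touch_tone"]):
            return link["url"]
    return None


print(resolve_link(HOME_PAGE, "Calendar"))
print(resolve_link(HOME_PAGE, "2"))
```

In this sketch a spoken key word (after recognition) and a keyed touch tone sequence resolve through the same lookup, so either input mode leads to the same linked page.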
In addition, the invention features a set of linked HTML pages representing the user's "personal profile". The personal profile contains the user's attributes and preferences. Attributes include the user's name, address, phone number, personal identification code, voice imprints for authentication, speech training profile and other information. Preferences include configuration preferences such as personal greetings and gender and language selection, selection preferences such as bookmarks and favorite places, and presentation preferences such as priority ordering, default overrides and preferred vocabulary.
The personal profile is designed for component access within web pages, allowing easy extraction of context sensitive profile information. In particular, speech training profiles (included as a user attribute and which contain word patterns representing speaker dependent training information) are partitioned into sets of related words likely to occur in combination within corresponding voice web pages. A set of command and control words such as "play, pause, continue, previous, next, home, reload, help, etc." is stored in a top level component set, enabling user dependent but context independent navigation and control. Other component sets are designed to match the key word sets in corresponding voice web pages, such as a calendar page or an address book page, enabling user and context dependent navigation and control.
When a user calls into the distributed voice and speech processing system associated with the voice web, the system first identifies the user utilizing a unique account number (such as phone number or social security number). Next, it accesses the user's personal profile using the corresponding URL and retrieves the user attributes and preferences related to authentication and security. Using this personal profile information, the voice web system authenticates the identity of the user using a combination of personal identification code based password checking and voice imprint matching. The voice imprint is any sufficiently long utterance or phrase that the user has previously entered into his/her profile. Each user's voice imprint is analyzed and stored in the profile for quick matching on demand with a real-time provided user sample. The combination of every individual's unique vocal characteristics stored in the voice imprint coupled with the random choice of the password phrase ensures a high degree of security and authentication.
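The two-factor check described above might be sketched as follows. This is a minimal illustration under stated assumptions: the voice imprint is represented as a short list of numeric features, matching is done by cosine similarity against a fixed threshold, and the personal identification code is stored as a SHA-256 hash. The specification does not state how imprints are analyzed or compared, so these choices, together with the names authenticate, code_hash and voice_imprint, are hypothetical.

```python
import hashlib
import hmac
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(profile, entered_code, sample_features, threshold=0.9):
    """Accept the caller only if BOTH the identification code and the
    voice sample match the stored profile (assumed matching rule)."""
    code_hash = hashlib.sha256(entered_code.encode()).hexdigest()
    # Constant-time comparison of the stored and computed code hashes.
    code_ok = hmac.compare_digest(code_hash, profile["code_hash"])
    voice_ok = cosine_similarity(profile["voice_imprint"], sample_features) >= threshold
    return code_ok and voice_ok


# Hypothetical profile fragment as it might be retrieved from the
# personal profile URL; the values are made up for illustration.
profile = {
    "code_hash": hashlib.sha256(b"4321").hexdigest(),
    "voice_imprint": [0.2, 0.7, 0.1, 0.5],
}

print(authenticate(profile, "4321", [0.2, 0.7, 0.1, 0.5]))
print(authenticate(profile, "4321", [0.9, 0.0, 0.8, 0.0]))
```

Requiring both factors means that a correct code spoken by a different voice, or a matching voice with the wrong code, is rejected.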
Once authenticated, the user is allowed to navigate and access more information from the voice web using voice commands. In order to effectively accomplish this task, the voice web system retrieves the context independent command and control key word set from the user's speech profile.
The voice web system then presents a top level voice web personal home page for the user's perusal. At the same time, it retrieves the set of word recognition patterns associated with the key words in the presented page from the user's speech profile. Thus, the system is able to match the active vocabulary and associated speaker dependent word patterns dynamically in a context sensitive manner. The process continues as the user navigates from page to page. The voice web system dynamically retrieves the suitable subset of training word patterns from the user's speech profile matching the voice navigation key words in the page being presented to the user.
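The context sensitive assembly of the active vocabulary might be sketched as follows. The speech profile is modeled here as a dictionary of component sets, each mapping a word to a placeholder for its speaker dependent pattern; the names active_vocabulary, command_control, component_set and key_words are assumptions of this sketch rather than terms defined by the specification.

```python
# Hypothetical speech profile: one top level command and control set plus
# page-specific component sets. The pattern values are placeholders for
# whatever speaker dependent training data a recognizer would store.
SPEECH_PROFILE = {
    "command_control": {"play": "p-play", "pause": "p-pause", "home": "p-home"},
    "calendar": {"today": "p-today", "tomorrow": "p-tomorrow", "week": "p-week"},
    "address_book": {"find": "p-find", "add": "p-add"},
}


def active_vocabulary(speech_profile, page):
    """Build the active vocabulary for one page: the context independent
    command words plus only those page key words that have a trained
    pattern in the page's component set."""
    active = dict(speech_profile["command_control"])
    component = speech_profile.get(page["component_set"], {})
    for word in page["key_words"]:
        if word in component:
            active[word] = component[word]
    return active


calendar_page = {"component_set": "calendar", "key_words": ["today", "tomorrow"]}
print(sorted(active_vocabulary(SPEECH_PROFILE, calendar_page)))
```

Only the command set and the patterns for the key words on the presented page are loaded, so the recognizer works against a small vocabulary at each step instead of the full training profile.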
The process described above greatly reduces the size of the training information that needs to be retrieved at any time while significantly enhancing accuracy of speech recognition using speaker dependent training profiles. Since the speech profile is constructed using HTML pages and components, it is universally accessible using its URL. This enables the user to call into any compatible Internet connected voice web system in the user's proximity from anywhere in the world, identify himself/herself to the system and then enable the system to dynamically retrieve suitable information that enhances his/her navigation and access of the information stored in the voice web using voice commands and input.
In addition to the user attribute information discussed above, the personal profile contains user preferences relative to configuration, presentation and information selection. These preferences are components within the personal profile pages and are easily available to the voice web system for dynamic retrieval. For example, if the user requests his/her stock portfolio from the voice web, the system first retrieves the user's preferred portfolio of companies from his/her profile and applies this list to limit the search on stock quotes from all companies. The user gets exactly the information relevant to his/her interest in exactly the order of priority he/she prefers.
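The stock portfolio example might be sketched as follows. The function name portfolio_quotes, the ticker symbols and the prices are invented for illustration, and the preferred portfolio is assumed to be stored in the profile as a list already arranged in the user's order of priority.

```python
def portfolio_quotes(preferred_symbols, all_quotes):
    """Limit the quote search to the user's preferred companies and
    return the results in the user's own priority order. Symbols with
    no available quote are skipped."""
    return [(symbol, all_quotes[symbol])
            for symbol in preferred_symbols
            if symbol in all_quotes]


# Made-up quote data standing in for the full set of companies.
ALL_QUOTES = {"AAA": 10.5, "BBB": 22.0, "CCC": 7.25, "DDD": 48.1}

# Hypothetical preference component retrieved from the personal profile.
preferred = ["CCC", "AAA", "ZZZ"]

print(portfolio_quotes(preferred, ALL_QUOTES))
```

Iterating over the preference list rather than over the full quote set yields both the filtering and the priority ordering in a single pass.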