The present invention relates generally to voice portals. More particularly, the present invention relates to methods, systems, and computer program products for generating and providing efficient access to end-user-definable voice portals.
With the proliferation of the Internet and intranets, there has been an increasing demand for the creation of voice portals. As used herein, the term "voice portal" refers to an audio interface that allows an end user to search and access information using primarily spoken commands. The information accessed through a voice portal may be delivered to the user in a variety of formats, including audio format.
One problem with voice portals is that unlike traditional Internet or Web portals, voice portals present information to the user in a serial, one-dimensional format. There is no ability to scan ahead. Thus, organizing information into topics with specific sources and content therein becomes important.
Scanning a Web page visually is a two-dimensional phenomenon in that the surface of a computer monitor or similar display device is two-dimensional and can be visually scanned in both the horizontal and vertical directions with great efficiency and comprehension. A user can use visual perception to look for underlined text or text of a different color. The user can access this text by clicking on it using an input device while perceptually discarding all the other information on the Web page. In contrast, there is no analogous mechanism for delivering HTML or similar information to a user in audible format only. When the user is listening to information, the user does not know whether a "link" is coming up or whether the upcoming information is worth hearing. Thus, a problem with conventional voice portals is the lack of an efficient way to allow users to create such portals and to navigate audible information using them.
One particular problem encountered in serially oriented content delivery, such as the audible delivery from a voice portal, is that of searching. If the user submits the term "cardiology" arbitrarily to a search engine driving a voice portal that utilizes the entire Internet as a data source, the number of "hits" would preclude the efficient and practical navigation of the numerous results in an audible manner. Thus, there exists a need for the simplification of full-text searching in the context of voice portal audible navigation and delivery.
Furthermore, voice portals today, by their method of implementation, are limited to offering a finite or fixed vocabulary or grammar, meaning that a typical voice portal may have a finite number of words, e.g., 50, 100, or 500 words, that the voice portal understands and that the entire user base shares. These vocabularies are not extensible; i.e., the user cannot define arbitrary words or grammar. In addition, the user cannot define arbitrary links to the source of information that is to be associated with the user-defined words. Accordingly, another problem with conventional voice portals is the lack of a solution for creating extensible voice portals that can be rapidly created and that are unique from one user to the next.
The need for rapid creation of user-defined grammar and source data associations can be thought of as analogous to the need for bookmarks in a visual Web environment. A bookmark is an association between user-defined text and a Web page that can be rapidly created using a conventional Web browser. Each user typically has his or her own set of bookmarks. A corresponding need exists in the voice portal environment whereby a user can define a word or utterance and rapidly associate this word or utterance with information accessible via a voice portal. The action of speaking the word would be analogous to the Web action of selecting a bookmark.
While voice portals are a relatively new phenomenon, there are currently three solutions emerging in the marketplace to the problem of creating an operational voice portal. Each of these solutions is briefly discussed below with examples. The discussion of each solution is divided into two parts: what the voice portal solution offers and how one goes about implementing it.
1. The first solutions are those that merely offer predefined keyword choices and predefined categories of services (restaurants, sports, stocks, etc.) to the end user. An example of one such solution can be accessed through www.tellme.com or 1-800-555-TELL. These voice portal solutions are provided by service-oriented companies (as opposed to product-oriented companies). The service these companies offer is a voice portal implemented typically as a toll-free phone number which the user dials, and then is requested to choose between predetermined categories of information such as stock reports, weather reports, etc. The user then utters various predefined commands to access the standard services. For example, the user might say "weather" and then receive, in audible format via a telephone handset, the forecast for the user's geographic area. The user might say "C" "S" "C" "O" to receive the current stock quote for the company Cisco Systems, whose symbol is "CSCO." It must be emphasized that the company or service provider chooses the words that will be available to the user, such as "weather", "stocks", or the English alphabet "A" through "Z", for the purpose of inputting specific stocks. At no time is the user allowed to define his or her own words such as "cardiology" or the data with which a word or series of words is associated.
2. System integration companies that can add or integrate a voice portal solution for a client company for a fee represent the second category of solutions. These companies leverage off-the-shelf software and hardware together with their own systems integration experts, who have the "know-how" to put it together. A good example of a system integration company offering the ability to implement a voice portal for a client is Nortel Networks. The difference between this type of solution and the first solution (1) described above is that this solution gathers the client's customer requirements in the form of the words that the client desires to be part of the voice portal system vocabulary, builds a system to this specification, and then hosts the voice portal solution for a monthly fee. In this solution, the client company can add a word like "cardiology" to the existing vocabulary by making a formal request to Nortel Networks. Nortel Networks then uses its experts to add this word to the voice portal system vocabulary. After the word has been added, a user can access the voice portal via a telephone, speak the new word, such as "cardiology", and access the information provisioned for that word. However, like solution (1) above, there is no easy way for end users to define their own words or data associations once the system is up and running. Rather, the addition of new words to the voice portal vocabulary is done through an expensive and time-consuming process in which the client company relies on the expertise of the system integrator under contract to implement, enhance, and host the client company's voice portal. Expertise in VXML and database languages would be essential.
3. A third group of voice portal solutions is provided by system integration companies that go into a client company and add or integrate a voice portal solution for that company for a fee. These companies also have commercial tools they have developed to greatly reduce the time needed to implement voice portals. These companies differ from the vendors in (2) above in that these tools allow rapid customization at the time of implementation by leveraging a custom tool suite. However, once the customization is done, it is "cast in stone," so to speak, in that whatever the system is at the time of implementation is all of the functionality the users will get when they use the system. In other words, like solutions (1) and (2) above, at no time are the users themselves allowed to define their own words and/or data associations. The difference between solution (3) and solutions (1) and (2) is that the system integration company responsible for implementing the voice portal in solution (3) can more quickly add additional words, but even in the best case this might take days.
The best that can be offered by the above three solutions is to receive information from the user and to provide predefined categories of information in response to predefined keywords. For example, a conventional voice portal system may receive geographic information from the user and provide restaurant choices, or weather for the city in which the user lives. Another example of a predefined service is the ability to obtain stock quotes using predefined vocabulary words for each company. Some customization is possible. For example, the user may be able to access the stocks in his or her portfolio using a predefined "favorites" keyword provided by the system. However, it is very important to understand that these systems only allow the user to customize within the fixed, finite, static and unchanging vocabulary of words programmed into the voice portal at the time it was implemented. For example, in order to provide access to stock quotes, most voice portal systems have twenty-six "words" that the system understands, which are in fact the letters of the English alphabet "A" through "Z." This allows the user to articulate symbols for all stock market stocks by speaking the individual letters. The user can even set up a portfolio of stocks to monitor using these predefined letters. However, the user can never add his or her own word, such as "airplane" or "cardiology." Conventional voice portal systems only allow end user vocabulary or data customization using a limited set of predefined choices. For example, if a user wants to continuously check medical news from a predetermined source, such as the New England Journal of Medicine, within a predetermined time interval (e.g., the last three months), there is no way for the end user to "tell" or "program" conventional voice portal systems to do this. Either the capability was put into the voice portal system by the systems integrator when the voice portal was implemented or last updated, or it was not. Thus, there exists a need for voice portal systems that facilitate end user generation and modification of the voice portal after the voice portal has been established.
Another problem that exists with conventional voice portals is non-homogeneity. For example, a user may utilize a conventional voice portal, such as the voice portal offered by www.tellme.com, to access his or her stocks using the predefined vocabulary word "stock quotes" from the grammar recognized by that voice portal. On a different voice portal, such as the voice portal offered by www.heyAnita.com, the user may access stock quotes using a different predefined vocabulary word, such as "stocks", from the grammar recognized by that voice portal. Such non-homogeneity can make it difficult to remember which predefined grammar set to use when accessing different voice portals. Accordingly, there exists a need for methods and systems for providing homogeneity among voice portal grammar sets.
The present invention overcomes these limitations and difficulties associated with conventional voice portals. Specifically, the present invention overcomes two major problems of conventional solutions that prevent voice portals from being efficient and extensible in a manner analogous to Web portals:
(a) Conventional voice portal solutions have finite vocabularies and data sets that are not user-definable and/or user-extensible; and
(b) Adding vocabulary to conventional voice portal solutions is typically done by system integrators at great time and expense because of the complex coding and database interaction required.
A conventional voice portal solution is simply a fixed, finite, static and unchanging vocabulary of words programmed into the voice portal at the time it was implemented with corresponding fixed, finite, static and unchanging links to sources of information. The links remain static. The information is the only part of the system that changes. The present invention solves these problems of conventional voice portals by providing a voice portal with the following features:
1. The ability to bridge the gap between a conventional web browser and the strictly audio world;
2. The ability to allow precise searching; and
3. The ability for the user to refer to the most complex search with a single spoken word, or audio macro, with extremely high accuracy.
The present invention differs in operation from a conventional voice portal by a unique organization of programs and data. Specifically, the present invention provides a unique data structure, which will be referred to herein as a "topic template", which includes, but is not limited to, a user-defined word or "topic", user-defined associated source information or "topic coupling", user-defined temporal information or "topic temporal information", and a user-defined action invoked via a user-specified audio macro. The present invention allows the creation of this topic template in real-time. The present invention then enables access to the topic template via a graphic/voice user interface (G/VUI) to the topic radio tuner, where the audio macros (the grammar for that session) are interpreted into their corresponding topic actions, yielding:
1. A system and method for the real-time end user creation of a voice portal;
2. A system and method for the real-time creation of a unique voice portal per user in a multi-user environment; and
3. A system and method whereby the construct of a "topic" is used to enable a more efficient voice portal navigation paradigm as compared to conventional voice portal implementations.
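The topic template described above can be pictured as a small record. The following Python sketch is only an illustration under stated assumptions: the class name, field names, and the optional `search_terms` field are hypothetical and are not taken from the specification, which names only the four kinds of user-defined information (topic, topic coupling, topic temporal information, and audio macro).

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class TopicTemplate:
    """One user-defined topic; field names are hypothetical."""
    topic: str                            # user-defined word or phrase
    coupling: str                         # topic coupling: the associated source
    temporal_window: Optional[timedelta]  # topic temporal information
    audio_macro: str                      # spoken trigger for the topic action
    search_terms: str = ""                # assumed: optional full-text query


# Example based on the medical-news scenario from the background discussion.
cardiology = TopicTemplate(
    topic="cardiology",
    coupling="New England Journal of Medicine",
    temporal_window=timedelta(days=90),   # roughly "the last three months"
    audio_macro="cardiology",
    search_terms="cardiology",
)
```

Keeping all four parts in one record reflects the idea that speaking a single audio macro is enough to identify the source, the time window, and the action together.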
The present invention achieves these goals via a two-pass approach whose base construct is user-defined topics. The first pass is the creation of user-definable voice portals. The second pass is access to information using the user-defined voice portals and real-time end user modification of the end user's personal voice portal.
According to another aspect, the present invention includes a software interface, which will be referred to herein as a personal broadcast editor, that enables end user creation and modification of voice portals. There are three primary areas where the functionality of the personal broadcast editor differs greatly from conventional voice portals:
1. The ability for the user to define in real-time an arbitrary vocabulary word or phrase, which is referred to herein as an audio macro. The set of audio macros for a particular user defines the grammar that must be interpreted by the speech recognition engine when providing a voice portal to that user.
2. The ability to couple the vocabulary word or audio macro to an arbitrary content source in real-time via Web technologies.
3. The ability to define in real-time the enunciation of the topic (e.g., one user may say "tom-A-to" while another user may say "tom-ah-to").
Personal Broadcasting Suite
The personal broadcasting suite provides a common interface for generating a personal voice portal for each listener (user). The resulting voice portal provides a common yet customized presentation to listeners. The suite comprises a login module, a personal broadcast editor, a universal interface (G/VUI: Graphical/Voice User Interface), and a topic radio tuner.
The personal broadcasting suite is a two-pass system, but may include more than two passes. The first pass sets up general registration, keywords, audio macros, and a template. The second pass, in which a listener accesses his or her unique portal through the topic radio tuner, delivers the requested personalized information in a device-independent manner (phone, hands-free mobile phone, telematics, etc.).
Audio Macro
Audio macros are the words the listener assigns to access either a specific source or piece of information or multiple sources or pieces of information, with due regard to the associated reference source, full-text search, and temporal information. Audio macros are similar to bookmarks in a 2D environment. They are not finite and static as in other systems, but may be dynamic and are not limited in number. The audio macros create a unique grammar set for that listener and the template in use. Here, grammar is the set of vocabulary words or audio macros that speech recognition hardware and software must recognize for a particular user. The system offers dynamic grammar loading, which refers to the loading of all of the audio macros for a particular user for a particular session, so that the task of speech recognition is much easier than with fixed grammar sets. It is easier for the user to remember his or her own macros, and it is easier for the speech recognition technology to work with a smaller, user-specific grammar.
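Dynamic grammar loading, as described in this section, amounts to giving the recognizer only one user's audio macros for the duration of a session. The Python sketch below is a minimal illustration, not the implementation of the invention: the in-memory store, the user names, and the text-matching `recognize` function (a stand-in for a real speech recognition engine) are all assumptions.

```python
from typing import Dict, Optional, Set

# Hypothetical per-user store of audio macros (user id -> set of macros).
USER_MACROS: Dict[str, Set[str]] = {
    "alice": {"cardiology", "gizmo"},
    "bob": {"airplane", "weather"},
}


def load_session_grammar(user_id: str) -> Set[str]:
    """Return only the macros this user defined, for use in one session."""
    return set(USER_MACROS.get(user_id, set()))


def recognize(utterance: str, grammar: Set[str]) -> Optional[str]:
    """Stand-in for a speech recognizer: match text against the session grammar."""
    normalized = utterance.strip().lower()
    return normalized if normalized in grammar else None


grammar = load_session_grammar("alice")
print(recognize("Gizmo", grammar))     # matches: "gizmo" is in alice's grammar
print(recognize("airplane", grammar))  # no match: that macro belongs to bob
```

Because the candidate set holds only the words the current user chose, the matching step has far fewer alternatives to distinguish than it would with a shared, system-wide vocabulary.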
Login Module
The login module has a registration component for first-time listeners to provide basic set-up information. Accessed via any suitable user interface, such as the Internet for sophisticated computer users or an 800 number for those without computer or Internet familiarity, the module acts as a conduit for collecting data such as name, address, phone number, PIN, voice authentication, fingerprint or other ID number, email addresses (home, work, groups), fax number, GPS, and other basic information. Once registered, the user need only log in.
Personal Broadcast Editor
The personal broadcast editor is a template-based approach to allow the listener to dynamically define in real-time the following:
Define/Edit targets (sources) of information;
Define/Edit actions including full text searching in the temporal domain;
Establish a link between the targets and actions; and
Define/Edit audio macros and pronunciation if required.
When accessed via an 800 number, a trained operator simply executes the personal broadcast editor to reflect the user's requests. The output of the personal broadcast editor is a custom voice portal unique to that user.
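The four editing steps listed above (targets, actions, links between them, and audio macros with optional pronunciation) can be sketched as a small builder whose output is a per-user portal. This is an illustrative Python sketch under stated assumptions: the class, its method names, and the dictionary layout of the resulting portal are hypothetical and are not described in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PersonalBroadcastEditor:
    """Hypothetical editor; method names are illustrative only."""
    targets: Dict[str, str] = field(default_factory=dict)   # target name -> source
    actions: Dict[str, dict] = field(default_factory=dict)  # action name -> search spec
    links: Dict[str, str] = field(default_factory=dict)     # action name -> target name
    macros: Dict[str, dict] = field(default_factory=dict)   # audio macro -> settings

    def define_target(self, name: str, source: str) -> None:
        self.targets[name] = source

    def define_action(self, name: str, query: str, days_back: int) -> None:
        # Full-text search restricted to a time window (the temporal domain).
        self.actions[name] = {"query": query, "days_back": days_back}

    def link(self, action: str, target: str) -> None:
        if action not in self.actions or target not in self.targets:
            raise KeyError("define the action and the target before linking them")
        self.links[action] = target

    def define_macro(self, macro: str, action: str, pronunciation: str = "") -> None:
        if action not in self.links:
            raise KeyError("link the action to a target before assigning a macro")
        self.macros[macro.lower()] = {"action": action, "pronunciation": pronunciation}

    def build_portal(self) -> Dict[str, dict]:
        """Flatten the definitions into one entry per audio macro."""
        portal = {}
        for macro, settings in self.macros.items():
            action = settings["action"]
            portal[macro] = {
                "source": self.targets[self.links[action]],
                "pronunciation": settings["pronunciation"],
                **self.actions[action],
            }
        return portal


editor = PersonalBroadcastEditor()
editor.define_target("nejm", "New England Journal of Medicine")
editor.define_action("recent_cardiology", query="cardiology", days_back=90)
editor.link("recent_cardiology", "nejm")
editor.define_macro("cardiology", "recent_cardiology")
portal = editor.build_portal()
```

Whether the listener uses the editor directly or an operator does so on the listener's behalf, the result in this sketch is the same kind of object: a mapping from each of that user's audio macros to a source and a search specification.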
Topic Radio Tuner
The topic radio tuner is the engine that begins to operate when the user calls in to access his or her personal voice portal. It is the navigational structure the user utilizes to retrieve topics via the audio macros the user has previously assigned. The topic radio tuner facilitates one of the personal broadcasting suite's most valuable and unique attributes: unlike conventional systems, where keywords are generic for all listeners, a specific topic radio audio macro is unique to that listener and to the information set retrieved.
For example, in other systems, all the keywords and topics are known in advance and finite. In those systems, the listener has no ability to add the word "gizmo" to a profile or template and then speak it later. In the present system the listener can do so, and can do so in multiple languages (English, Spanish, etc.).
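The per-listener behavior of the topic radio tuner can be illustrated with a lookup that is scoped to one user's portal. The Python sketch below makes several assumptions: the portal contents, the user names, the Spanish macro "artilugio", and the `fetch` callback that stands in for actual content retrieval are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical per-user portals: user id -> {audio macro -> source description}.
PORTALS: Dict[str, Dict[str, str]] = {
    "alice": {"gizmo": "product news feed"},
    "bob": {"artilugio": "product news feed"},  # same source, Spanish macro
}


def tune(user_id: str, spoken_macro: str, fetch: Callable[[str], str]) -> str:
    """Resolve a spoken macro within one user's portal and retrieve its content."""
    portal = PORTALS.get(user_id, {})
    source = portal.get(spoken_macro.strip().lower())
    if source is None:
        return f"Sorry, '{spoken_macro}' is not one of your topics."
    return fetch(source)


# A trivial fetch function stands in for retrieval from the coupled source.
print(tune("alice", "gizmo", lambda source: f"Reading from {source}"))
print(tune("alice", "artilugio", lambda source: f"Reading from {source}"))
```

The second call fails for this user even though the same word succeeds for another user, which is the sense in which each macro is unique to the listener who defined it.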
G/VUI Universal Interface
The G/VUI universal interface is a graphic user interface (GUI) coupled with a voice user interface (VUI) and provides a means to access the topic radio tuner and begin accessing the user's information. The term "universal" is used because the interface can be used through any suitable user interface device, such as a computer, a mobile phone, a landline phone, interactive television, or a VoIP device, and provides a standardized methodology for using new and evolving devices. The G/VUI can also provide a method for a user to define a user-specified homogeneous audio macro for accessing other voice portals with different grammar sets.
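A user-specified homogeneous audio macro can be thought of as one spoken phrase that is translated into whatever keyword each external portal expects. The Python sketch below is illustrative only: the two keywords ("stock quotes" and "stocks") come from the background discussion of non-homogeneity, while the table layout, the service label `stock_quotes`, the macro "my stocks", and the function name are assumptions.

```python
from typing import Dict

# Keyword each external portal expects for a given service (from the
# background example); the table structure itself is hypothetical.
PORTAL_KEYWORDS: Dict[str, Dict[str, str]] = {
    "tellme": {"stock_quotes": "stock quotes"},
    "heyanita": {"stock_quotes": "stocks"},
}

# Hypothetical user-defined homogeneous macros: macro -> service label.
HOMOGENEOUS_MACROS: Dict[str, str] = {"my stocks": "stock_quotes"}


def translate(macro: str, portal: str) -> str:
    """Map the user's single macro to the keyword a given portal recognizes."""
    service = HOMOGENEOUS_MACROS[macro.strip().lower()]
    return PORTAL_KEYWORDS[portal][service]


print(translate("my stocks", "tellme"))    # keyword for the first portal
print(translate("my stocks", "heyanita"))  # keyword for the second portal
```

With such a mapping, the user remembers one phrase of his or her own choosing instead of a different predefined keyword for each portal.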
Accordingly, it is an object of the invention to overcome at least some of the difficulties associated with conventional voice portals and voice portal creation techniques.
An object of the invention having been stated hereinabove, other objects will become evident as the description proceeds when taken in connection with the accompanying drawings as best described hereinbelow.