The present invention relates to a voice browser and a method for interpreting and responding to Dual Tone Multi-Frequency (DTMF) tones received from a telecommunications network, which DTMF tones are transmitted by a user for controlling the operation of the voice browser when information published on a data packet switched network, such as the Internet, is accessed.
The World Wide Web (WWW or Web for short) is today the most utilised Internet application. The Web consists of millions of Web pages and the number of accessible Web pages is continuously growing. An Internet user accesses a Web page using a browser. A conventional browser provides a textual and graphical user interface, which aids the user in requesting and displaying information from the Web. A conventional browser is typically a software program running on a computer, for example a personal computer. Thus, a user needs some sort of computer hardware on which browser software can be executed in order to retrieve and display information published as Web pages.
More and more companies use the Web as an information channel to their customers and/or as a way to provide services to their customers. Such companies are, for example, consumer product companies, insurance companies, banks, employment agencies etc., but also public authorities, which publish information and services relating to shopping, news, employment, education, and so on. A problem with all these Web pages provided by different companies and authorities is that they are accessible only to people who have a computer on which a graphical browser can be executed. Even a user who has access to such a computer must also be connected to the Internet. In addition, people with poor reading skills or with vision problems will have difficulties in reading text-based Web pages.
For the above reasons, the research community has developed browsers for non-visual access to Web pages, or WWW content, for users who wish to access the information or services through a telephone. The non-visual browsers, or voice browsers, present audio output to a user by converting the text of Web pages, such as HTML pages, to speech and by playing pre-recorded audio files from the Web. A voice browser furthermore implements the functionality needed to allow a user to navigate between Web pages, i.e. to follow hypertext links, as well as to navigate within Web pages, i.e. to step backward and forward within the page. Other functions that can be provided to the user include pausing and resuming the audio output, going to a start page and choosing from a number of predefined bookmarks, or favourites. Some voice browsers are implemented on PCs or workstations and allow the user to access the browser functions using commands entered on a computer keyboard, while others are accessed using a telephone. When a voice browser is accessed with a telephone, the user can send one or several browser commands by means of DTMF signals, which are generated with one or several keystrokes on the keypad of the telephone.
Another way to give a telephone user access to a database or the like is to provide an Interactive Voice Response (IVR) system. Conventional IVR systems usually allow a user to interact directly with the application by transmitting DTMF signals to the system and the application. For example, the most common way of enabling a user to select between a number of choices in an IVR system is to have a menu read to the user and to allow the user to select a certain item from the menu by producing a corresponding DTMF signal. In a similar way, certain applications that are accessed on the Internet using a voice browser need to be able to receive commands directly from a user without any interference from the browser. Such direct access to keys on a keyboard is sometimes referred to in the literature as "access keys". With the access key notation of HTML (the accesskey attribute), an application can assign a key that is bound directly to the application. The action to be performed in response to a signal from such a key is then defined by the application.
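As a rough illustration of how access keys appear in a page, the sketch below collects the `accesskey` attributes declared in an HTML fragment using Python's standard `html.parser` module, mapping each declared key to the link target of its element. The class name `AccessKeyCollector` and the sample page are hypothetical; a real voice browser would also need to handle elements without an `href` and duplicate key assignments.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class AccessKeyCollector(HTMLParser):
    """Collect elements that declare an accesskey attribute.

    Maps each declared key (e.g. "1" or "#") to the element's href,
    or to None when the element has no href.
    """

    def __init__(self) -> None:
        super().__init__()
        self.access_keys: Dict[str, Optional[str]] = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        key = attributes.get("accesskey")
        if key:  # the application has bound this key directly to itself
            self.access_keys[key] = attributes.get("href")


page = '<a href="/order" accesskey="1">Order</a> <a href="/help" accesskey="0">Help</a>'
collector = AccessKeyCollector()
collector.feed(page)
print(collector.access_keys)  # {'1': '/order', '0': '/help'}
```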
Thus, conventional techniques use DTMF tones either for controlling the browser functionality only, which is the case for known voice browsers, or for application control only, which is the case for known IVR systems. The problem is how to design a voice browser that can efficiently and simultaneously handle DTMF tones relating to control of the browser functionality as well as tones relating to control of the currently accessed application, especially since a telephone keypad is generally limited to 12 keys.
Another problem with voice browser systems is how to design a voice browser in which a currently accessed part of an HTML page is in synchronism with a set of current and relevant operations, or voice browser functions, that are possible to perform in response to received and interpreted DTMF tones.
An object of the present invention is to provide a voice browser, controlled through a DTMF tone interface, that gives simultaneous access to commands for controlling the voice browser and to commands for controlling an application that is separate from the voice browser and accessed from the voice browser through a data packet switched network.
Another object of the invention is to provide a voice browser system with a user-friendly interface that enables a user to access the most important functions supported by the voice browser, and by an application accessed by the voice browser, with a single keystroke.
Yet another object of the invention is to provide a voice browser having a mechanism that ensures that the process of accessing a certain part of an HTML page is in synchronism with a corresponding set of potential operations, or voice browser functions, that are possible to perform, for the particular HTML page part, in response to received and interpreted DTMF tones.
According to the present invention, these objects are achieved by an arrangement and a method having the features as defined in the appended claims.
According to a first aspect of the invention, there is provided a voice browser in a voice browser system, said voice browser being arranged at a server connected to the Internet and responsive to Dual Tone Multi-Frequency (DTMF) tones received from a telecommunications network, wherein said voice browser includes: an object model comprising elements defined in a retrieved HTML page and defining navigation positions within said HTML page; audio means for playing an audio stream derived from an element of said HTML page; a voice browser controller for controlling the operation of said voice browser; a dialogue state structure, having a plurality of states and transitions between states, storing text and audio objects to be output to said audio means; and a dialogue controller arranged to control a dialogue with a user based on said dialogue state structure and to respond to an interpreted DTMF tone with an event to said voice browser controller, wherein said voice browser controller, in response to an event including an interpreted DTMF tone of a first predetermined set of interpreted DTMF tones, is arranged to control a voice browser function associated with said interpreted DTMF tone and to control from which state in said dialogue state structure, or in a second dialogue state structure associated with a second retrieved HTML page, said dialogue should resume after an execution of said function; said voice browser controller, in response to an event including an interpreted DTMF tone of a second predetermined set of interpreted DTMF tones, is arranged to direct said interpreted DTMF tone to an application of said retrieved HTML page; each of said states is associated with a corresponding position in said object model; and said voice browser further includes synchronisation means for synchronising said dialogue, with respect to a current state, with a position in said object model.
According to a second aspect of the invention, there is provided a method at a voice browser in a voice browser system, said voice browser being arranged at a server connected to the Internet and responsive to Dual Tone Multi-Frequency (DTMF) tones received from a telecommunications network, said method comprising the steps of: retrieving an HTML page in response to a DTMF tone interpretation; creating an object model comprising the elements defined in said HTML page; deriving a number of states, each of said states including a reference to a position in said object model and at least one input and/or at least one output; creating a dialogue state structure associated with said object model in which structure each state from said deriving step is incorporated; executing a dialogue with a user based on said dialogue state structure; responding to an interpreted DTMF tone received in a state in said dialogue state structure with an event to a voice browser controller; controlling, at said voice browser controller in response to said event, if the event includes an interpreted DTMF tone of a first predetermined set of interpreted DTMF tones, a voice browser function associated with said interpreted DTMF tone and from which state in said dialogue state structure, or in a second dialogue state structure associated with a second retrieved HTML page, said dialogue should resume after an execution of said function; directing, from said voice browser controller in response to said event, if the event includes an interpreted DTMF tone of a second predetermined set of interpreted DTMF tones, the interpreted DTMF tone to an application of said HTML page; and synchronising said dialogue state structure, with respect to a current state, with a new position in said object model.
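The first steps of the method (creating an object model from a retrieved page, deriving states that each reference a position in that model, and assembling them into a dialogue state structure) can be sketched as follows. This is a minimal illustration under several assumptions: the object model is flattened into an ordered list rather than a full parse tree, the set `BROWSABLE_TAGS` and the command names in `inputs` are hypothetical, and each state simply transitions to the next one in document order.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical choice of which HTML elements become navigation positions.
BROWSABLE_TAGS = {"h1", "h2", "h3", "p", "li", "a"}


@dataclass
class Node:
    """One element of the (flattened) object model."""
    position: int
    tag: str
    text: str = ""
    href: Optional[str] = None


class ObjectModelBuilder(HTMLParser):
    """Turn an HTML page into an ordered list of browsable elements."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[Node] = []
        self._open: List[Node] = []  # stack of browsable elements currently open

    def handle_starttag(self, tag, attrs):
        if tag in BROWSABLE_TAGS:
            node = Node(len(self.nodes), tag, href=dict(attrs).get("href"))
            self.nodes.append(node)
            self._open.append(node)

    def handle_endtag(self, tag):
        if self._open and self._open[-1].tag == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open:  # text belongs to the innermost open element
            self._open[-1].text += data


@dataclass
class State:
    """A dialogue state: a reference into the object model plus input/output."""
    name: str
    node_position: int                               # reference to the object model
    output: str                                      # text to speak in this state
    inputs: List[str] = field(default_factory=list)  # interpretations accepted here
    next_state: Optional[str] = None                 # default transition


def create_dialogue_structure(nodes: List[Node]) -> Dict[str, State]:
    """Derive one state per node and chain the states in document order."""
    structure: Dict[str, State] = {}
    for node in nodes:
        inputs = ["next", "previous", "pause"]  # hypothetical browser commands
        if node.href:
            inputs.append("follow_link")
        structure[f"s{node.position}"] = State(
            name=f"s{node.position}",
            node_position=node.position,
            output=" ".join(node.text.split()),  # normalise whitespace
            inputs=inputs,
            next_state=f"s{node.position + 1}" if node.position + 1 < len(nodes) else None,
        )
    return structure


builder = ObjectModelBuilder()
builder.feed("<h1>Bank</h1><p>Welcome. <a href='/rates'>Rates</a></p>")
dialogue = create_dialogue_structure(builder.nodes)
for state in dialogue.values():
    print(state.name, state.node_position, repr(state.output), state.next_state)
```

Because every `State` stores `node_position`, any tone received in that state can later be related back to the exact element of the page that was being presented.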
The voice browser according to the present invention is part of a voice browser system, which system also comprises at least one telephone connected to the voice browser via a telecommunications network. The voice browser is arranged to access information published as HyperText Markup Language (HTML) files, i.e. as HTML pages, or as files in any other markup language, on the Internet or on any other data packet switched network. An end user controls the functionality supported by the voice browser with a telephone, by transmitting DTMF tones over the telecommunications network during a dialogue between the user and the voice browser.
The telecommunications network is any kind of network over which voice communication and DTMF tones can be transferred, such as a fixed circuit switched network, a mobile communications network or a packet switched network. As the latter case implies, the network could very well be the Internet, in which case the voice browser is accessed using Internet telephony, or by means of Internet access via a mobile station and the General Packet Radio Service (GPRS) of a GSM network. Of course, the kind of telephone equipment used will depend on the kind of telecommunications network chosen for accessing the voice browser; in every case, however, the telephone equipment needs to have a keypad and to be able to generate DTMF signals.
According to the invention, it is possible to control both the voice browser functionality and an application of an HTML page simultaneously from a telephone keypad, using a first set of DTMF tones and a second set of DTMF tones, respectively. Each DTMF tone of these sets is generated by a user with a single keystroke on the keypad and is interpreted as a certain key by the voice browser. The interpretation is transferred in an event from the dialogue controller to the voice browser controller, which performs the necessary operations relating to browser functionality or application control, the operations depending on which DTMF interpretation was received in the event.
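A minimal sketch of how a voice browser controller might split the keypad into the two predetermined sets is shown below. The particular key-to-function assignment in `BROWSER_KEYS` and `APPLICATION_KEYS` is purely hypothetical; the description only requires that the two sets be distinct, so that each single keystroke has one unambiguous meaning. The `else` branch covers tones outside both sets, such as the A to D tones that DTMF defines but that ordinary 12-key telephones lack.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical key assignment: the source only says there are two sets.
BROWSER_KEYS = {"1": "previous", "2": "pause_resume", "3": "next",
                "4": "go_back_page", "5": "follow_link", "0": "start_page"}
APPLICATION_KEYS = {"6", "7", "8", "9", "*", "#"}

# The two sets must not overlap, otherwise a keystroke would be ambiguous.
assert BROWSER_KEYS.keys().isdisjoint(APPLICATION_KEYS)


@dataclass
class KeyEvent:
    key: str            # interpreted DTMF tone, e.g. "5"
    node_position: int  # reference stored by the state that received the tone


class VoiceBrowserController:
    def __init__(self, application: Callable[[str, int], None]) -> None:
        self.application = application
        self.log: List[str] = []

    def on_event(self, event: KeyEvent) -> None:
        if event.key in BROWSER_KEYS:
            # First set: execute a browser function on the referenced node.
            self.log.append(f"browser:{BROWSER_KEYS[event.key]}@{event.node_position}")
        elif event.key in APPLICATION_KEYS:
            # Second set: pass the key, untouched, to the page's application.
            self.application(event.key, event.node_position)
        else:
            self.log.append(f"ignored:{event.key}")


received = []
controller = VoiceBrowserController(lambda key, pos: received.append((key, pos)))
controller.on_event(KeyEvent("3", 4))
controller.on_event(KeyEvent("7", 4))
print(controller.log, received)  # ['browser:next@4'] [('7', 4)]
```

Keeping the assignment in two lookup tables makes it easy to rebalance how many of the 12 keys go to the browser and how many are left to the application.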
Preferably, each state of the dialogue state structure includes a reference to a corresponding position, or node, in a parse tree that constitutes the object model. When a DTMF tone is received and interpreted in a certain state, the key interpretation of the DTMF tone and the reference stored by that state are transferred in an event, or call-back, to the voice browser controller. Thus, the object model will always be synchronised with the dialogue state structure, which means that the voice browser controller will always perform the operations associated with a specific key on the relevant part of the HTML page, in accordance with the reference to the object model.
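The call-back described above can be sketched as follows: the dialogue controller does not decide what a key means, it only forwards the key together with the node reference of the current state. In this illustration the names `DialogueController`, `browser_controller` and the meaning of key "5" ("follow the link at the referenced node") are all hypothetical assumptions. Pressing the same key in two different states produces different results because each event carries a different reference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    text: str
    href: Optional[str] = None


@dataclass
class State:
    name: str
    node_ref: int  # reference to a node in the object model


class DialogueController:
    """Forwards each interpreted tone with the current state's node reference."""

    def __init__(self, states: Dict[str, State], start: str,
                 callback: Callable[[str, int], None]) -> None:
        self.states = states
        self.current = start
        self.callback = callback

    def on_dtmf(self, key: str) -> None:
        state = self.states[self.current]
        self.callback(key, state.node_ref)  # the event carries key + reference


object_model: List[Node] = [Node("Welcome"), Node("Rates", href="/rates")]
requests: List[str] = []


def browser_controller(key: str, node_ref: int) -> None:
    # Hypothetical: key "5" means "follow the link at the referenced node".
    node = object_model[node_ref]
    if key == "5" and node.href:
        requests.append(node.href)


dialogue = DialogueController({"s0": State("s0", 0), "s1": State("s1", 1)}, "s0",
                              browser_controller)
dialogue.on_dtmf("5")    # state s0 references a plain paragraph: nothing to follow
dialogue.current = "s1"  # the dialogue has moved on to the link
dialogue.on_dtmf("5")
print(requests)  # ['/rates']
```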
Similarly, certain positions, or nodes, of the object model are associated with references to states in the dialogue state structure in order to synchronise the dialogue structure with the object model. This synchronisation is preferably achieved by means of a look-up table, but could be accomplished using any kind of database means. A specific position of the object model has a corresponding entry in the look-up table, and each entry in the look-up table stores a reference to an appropriate state in the dialogue state structure. Thus, after the voice browser controller has processed a received event, the resulting position, which could be the same or a new position depending on what operation the event triggered, will, by means of the look-up table, refer to a corresponding state in the dialogue structure. This referred state indicates the state in the dialogue state structure from which the dialogue with the user should be resumed. Hence, the following operation on the dialogue state structure, due to a received DTMF tone, will be synchronised with the current position in the object model and, thus, with the currently browsed part of the HTML page.
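The reverse direction of the synchronisation, from object-model position back to dialogue state, can be sketched with a plain dictionary acting as the look-up table. The positions, state names and operations below are hypothetical and serve only to show the flow: an event moves (or keeps) the object-model position, and the look-up table then yields the state from which the dialogue resumes.

```python
from typing import Dict, List

# Object-model positions (indices) of the browsable elements of a page.
object_model: List[str] = ["heading", "paragraph 1", "link: Rates", "paragraph 2"]

# Look-up table: object-model position -> dialogue state to resume from.
# Hypothetical: positions 0 and 1 share one state ("intro") for illustration.
lookup: Dict[int, str] = {0: "intro", 1: "intro", 2: "link_rates", 3: "body_2"}


def process_event(position: int, operation: str) -> int:
    """Hypothetical browser operations that move (or keep) the position."""
    if operation == "next":
        return min(position + 1, len(object_model) - 1)
    if operation == "previous":
        return max(position - 1, 0)
    return position  # e.g. pause: the position is unchanged


def resume_state(position: int, operation: str) -> str:
    """Process an event, then synchronise the dialogue via the look-up table."""
    new_position = process_event(position, operation)
    return lookup[new_position]


print(resume_state(1, "next"))      # link_rates
print(resume_state(1, "previous"))  # intro
print(resume_state(3, "next"))      # body_2
```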
Thus, the voice browser according to the invention ensures that synchronism is always maintained between the layout of the original HTML page retrieved by the browser and all the possible control functions offered to a user via a DTMF interface.
Preferably, an event transferred from the dialogue state structure to the voice browser controller also includes a time stamp derived from the standard clock function of the server in which the voice browser is executing. The voice browser controller uses this time stamp when performing certain operations relating to the navigation within an HTML page. These operations include those that control the browser functionality regarding moving back and forward in the object model created from the HTML page, and, thus, the audio output to the telecommunications network.
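The description does not say exactly how the time stamp is used, so the following is only one hypothetical possibility. Comparing the stamp with the time at which playback of the current element started lets a "previous" command behave differently depending on timing: pressed within a short window, it jumps to the preceding element; pressed later, it restarts the current element. The threshold `RESTART_WINDOW_S` and all names are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    key: str          # interpreted DTMF tone
    position: int     # object-model position of the element being played
    timestamp: float  # server clock time at which the tone was interpreted


# Hypothetical threshold; the source does not specify how the stamp is used.
RESTART_WINDOW_S = 2.0


def previous_target(event: Event, playback_started_at: float) -> int:
    """Return the object-model position to play after a 'previous' command.

    If the key arrives shortly after the current element started playing, the
    user most likely wants the element before it; otherwise the current
    element is restarted from the beginning.
    """
    elapsed = event.timestamp - playback_started_at
    if elapsed < RESTART_WINDOW_S:
        return max(event.position - 1, 0)
    return event.position


started = time.time()
print(previous_target(Event("1", 5, started + 0.5), started))  # 4
print(previous_target(Event("1", 5, started + 7.0), started))  # 5
```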
In the context of the present invention, the elements of an HTML page, or file, include browsable text paragraphs of the page, hypertext links, audio files referenced by the page, and other items suitable for audio output, either directly or after text-to-speech conversion.
The above-mentioned and further aspects, features and advantages of the present invention will be more fully understood from the following description of an exemplifying embodiment thereof, with reference to the accompanying drawings.