The tremendous growth of the Internet over the years demonstrates that users value the convenience of being able to access the wealth of information available online, particularly on that portion of the Internet comprising the World Wide Web (WWW). The Internet has proven to be an easy and effective way to deliver services such as banking to multitudes of computer users. Accordingly, Internet content and the number of services provided thereon have increased dramatically and are projected to continue to do so for many years. As the Internet becomes increasingly prevalent throughout the world, more and more people are coming to rely on the medium as a necessary part of their daily lives. Presently, most people access the Internet with a personal computer using a browser such as Netscape Navigator™ or Microsoft Internet Explorer™. One disadvantage of this paradigm is that the desktop user is typically physically “wired” to the Internet, thereby rendering the user's experience stationary.
Another area experiencing rapid growth is mobile telephony. The number of mobile users is expected to grow substantially and, by many estimates, will soon outnumber the users of the traditional Internet, if it does not already. The large number of current and projected mobile subscribers has created a desire to bring the benefits of the Internet to the mobile world. Such benefits include access to the content now readily available on the Internet as well as to a multitude of services such as banking, placing stock trades, making airline reservations, and shopping. A further impetus is that the attraction of providing such services is not lost on mobile operators, since significant potential revenues may be gained from the introduction of a whole host of new value-added services.
Operating in a wireless environment imposes a number of constraints on bringing services to mobile subscribers as compared to the desktop experience. For example, mobile clients typically operate in low-bandwidth environments where only limited spectral resources are available for data transmission. As used herein, the term mobile clients may include portable devices such as mobile phones, handheld devices such as personal digital assistants (PDAs), and communicator devices such as the Nokia 9110 and its successors. The low-bandwidth constraint makes traditional Internet browsing far too data-intensive for use with mobile clients, and alternative access solutions have therefore been proposed.
One proposed solution for making the Internet seamlessly viewable and usable on mobile clients is the Wireless Application Protocol (WAP). WAP is an open standard for mobile clients that, although similar in operation to well-known Internet technology, is optimized to meet the constraints of the wireless environment. This is achieved, among other things, by using the Wireless Markup Language (WML) and WMLScript, which are transmitted in a compact binary form suited to long latency and low bandwidth. WML and WMLScript are optimized for producing and viewing WAP content on hand-held mobile clients and are analogous to the Hypertext Markup Language (HTML) and JavaScript used for producing and displaying content on the WWW.
FIG. 1 shows the basic architecture of a typical WAP service model, which allows content to be hosted on WWW origin servers or WAP servers and made available for wireless retrieval by the client. For example, a WAP compliant client 100 containing a relatively simple built-in micro-browser is able to access the Internet via a WAP gateway 120 installed in, e.g., a mobile phone network. To access content from the WWW, the WAP client 100 may make a wireless WML request 110 to the WAP gateway 120 specifying a uniform resource locator (URL) that identifies content on an Internet origin server 140 reachable via transmission link 130. A URL uniquely identifies a resource, e.g., a Web page or a document on an Internet server, that can be retrieved using standard Internet protocols. The WAP gateway 120 then retrieves the content from the server 140 via transmission 150. The content is preferably prepared in WML format, which is optimized for use with WAP clients. If the content is only available in HTML format, the WAP gateway 120 may attempt to translate it into WML, which is then sent on to the WAP client 100 via wireless transmission 160 in such a way that it is independent of the mobile operating standard. For a more complete description of WAP architecture and the WAP environment, the interested reader may refer to “Wireless Application Protocol Architecture Specification”, WAP Forum, Apr. 30, 1998, URL: http://www.wapforum.org/what/technical.htm, and “Wireless Application Environment Overview”, WAP-195-WAEOverview, Version Mar. 29, 2000, WAP Forum.
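The gateway behavior of FIG. 1 (passing WML through unchanged and attempting a translation when only HTML is available) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the gateway behavior specified by the WAP Forum: the `OriginResponse` type, the `gateway_fetch` and `translate_html_to_wml` functions, and the use of a plain callable as a stand-in for origin server 140 are all hypothetical, and the translation is a deliberately naive tag stripper. Only the WML media type `text/vnd.wap.wml` is a real registered value.

```python
import re
from dataclasses import dataclass
from typing import Callable

WML_TYPE = "text/vnd.wap.wml"  # registered media type for WML
HTML_TYPE = "text/html"


@dataclass
class OriginResponse:
    """Hypothetical reply from an origin server: media type plus body text."""
    content_type: str
    body: str


def translate_html_to_wml(html: str) -> str:
    """Toy HTML-to-WML translation: keep only the text, wrapped in one card.

    A real translator would map links, forms and layout; this one does not.
    """
    text = re.sub(r"<[^>]+>", " ", html)  # replace every tag with a space
    text = " ".join(text.split())         # collapse runs of whitespace
    return f'<wml><card id="main"><p>{text}</p></card></wml>'


def gateway_fetch(url: str, origin: Callable[[str], OriginResponse]) -> str:
    """Fetch `url` from the origin server on behalf of a WAP client."""
    response = origin(url)  # stands in for the request to origin server 140
    if response.content_type == WML_TYPE:
        return response.body  # already suited to WAP clients: pass through
    if response.content_type == HTML_TYPE:
        return translate_html_to_wml(response.body)  # best-effort translation
    raise ValueError(f"unsupported content type: {response.content_type}")
```

A production gateway would typically also encode the resulting WML into its compact binary form before sending it over wireless transmission 160; that step is omitted here.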
FIG. 2 shows the fundamental protocol stack used in the WAP architecture. The protocol stack consists of hierarchical protocol layers, each comprising rules that govern traffic and behavior in data transmission. The uppermost layer, WAE 200 (Wireless Application Environment), represents a broad application environment depicting the functional operation of services and applications operating at the application level, as shown by reference numeral 205. Below the WAE layer 200 in the hierarchy is the WSP layer 210 (Wireless Session Protocol), which comprises session-related services, e.g., those associated with making browser application requests. The WTP layer 215 (Wireless Transaction Protocol) handles operations requiring reliable data transmission, such as interactive browsing. The WTLS layer 220 (Wireless Transport Layer Security) provides services associated with the security of data transmissions, which various applications may use optionally.
The lowermost protocol layer in the WAP protocol stack is the WDP layer 225 (Wireless Datagram Protocol), which operates above the bearers intended for information transmission in a particular network. WDP provides a common interface to the upper protocol layers so that they are able to operate independently of the underlying network. Such networks, depicted by reference numeral 230, may include those operating in accordance with the Global System for Mobile Communications (GSM), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), and Wideband Code Division Multiple Access (WCDMA). Bearers of this kind may include short messages (SMS, Short Message Service), data calls (CSD, Circuit Switched Data), and packet radio services such as GPRS (General Packet Radio Service).
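The layering of FIG. 2 amounts to encapsulation: data handed down from the application is wrapped by each layer in turn, and the outermost WDP wrapping is what lets the same upper layers ride on any bearer. The sketch below illustrates only that idea. The bracketed text headers are a hypothetical stand-in for the real binary header formats, the `encapsulate` and `decapsulate` functions are illustrative names, and the WTLS step merely tags the data instead of encrypting it.

```python
from typing import List

# Layers below WAE in FIG. 2, listed top-down. The bracketed text headers are
# a hypothetical stand-in for each protocol's real binary header format.
STACK: List[str] = ["WSP", "WTP", "WTLS", "WDP"]
OPTIONAL_LAYERS = {"WTLS"}  # WTLS is optional; the other layers always apply


def header(layer: str) -> bytes:
    return f"[{layer}]".encode("ascii")


def encapsulate(payload: bytes, use_wtls: bool = True) -> bytes:
    """Pass application data down the stack; each layer wraps the one above."""
    frame = payload
    for layer in STACK:
        if layer in OPTIONAL_LAYERS and not use_wtls:
            continue
        # A real WTLS layer would encrypt the frame here; this sketch only tags it.
        frame = header(layer) + frame
    return frame  # WDP header is outermost, ready for any bearer (SMS, CSD, GPRS)


def decapsulate(frame: bytes) -> bytes:
    """Pass received data up the stack, removing the outermost header first."""
    for layer in reversed(STACK):
        tag = header(layer)
        if frame.startswith(tag):
            frame = frame[len(tag):]
        elif layer not in OPTIONAL_LAYERS:
            raise ValueError(f"missing {layer} header")
    return frame
```

For example, `encapsulate(b"GET /")` yields `b"[WDP][WTLS][WTP][WSP]GET /"`, and `decapsulate` recovers `b"GET /"` whether or not the optional WTLS layer was used.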
The WAP system as discussed thus far only describes the retrieval of text-based WML content. The next generation of advanced WAP systems will be capable of retrieving multi-modal content that includes streamed sound and video in addition to text and images, in order to provide sophisticated voice-based and multimedia services. In addition, navigation through content will likely be performed by non-physical interactive techniques such as voice browsing, in lieu of the cumbersome method of pressing keypad buttons. Voice browsing requires the ability to automatically recognize speech uttered by the user. Automatic speech recognition operating within the system identifies the speech and interprets an associated command, e.g., to navigate between pages or select links while browsing a Web page. As known to those skilled in the art, mobile phones have commonly employed a form of speech recognition for voice dialing, whereby users can, for example, say the name of the person they want to call, and the phone recognizes the name and automatically dials the correct number.
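Interpreting a recognized utterance as a browsing command, as described above, can be illustrated with a small dispatcher. The sketch assumes that the speech has already been converted to text by a recognizer; the spoken vocabulary (`back`, `reload`, `select <link text>`, `open link 2`) and the `interpret` function are hypothetical examples, not part of any WAP specification.

```python
import re
from typing import List, Optional, Tuple

Command = Tuple[str, Optional[str]]  # (action, link target or None)

# Hypothetical spoken vocabulary; a deployed system would define its own grammar.
SIMPLE_COMMANDS = {
    "back": "back",
    "go back": "back",
    "reload": "reload",
    "refresh": "reload",
    "home": "home",
}


def interpret(utterance: str, links: List[str]) -> Optional[Command]:
    """Map recognized text to a browser action, or None if not understood."""
    text = " ".join(utterance.lower().split())  # normalize case and spacing
    if text in SIMPLE_COMMANDS:
        return (SIMPLE_COMMANDS[text], None)
    match = re.fullmatch(r"(?:select|open|go to) (.+)", text)
    if not match:
        return None
    target = match.group(1)
    # "link 2" selects the second link on the page (1-based, as a user counts).
    numbered = re.fullmatch(r"link (\d+)", target)
    if numbered:
        index = int(numbered.group(1)) - 1
        return ("follow", links[index]) if 0 <= index < len(links) else None
    # Otherwise match the spoken words against the visible link texts.
    for link in links:
        if link.lower() == target:
            return ("follow", link)
    return None
```

Returning `None` for unrecognized input lets the browser prompt the user to repeat the command instead of guessing.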
As bit rates increase in advanced-generation wireless systems, such as high bit-rate third generation (3G) systems like the Universal Mobile Telecommunications System (UMTS), or even lower bit-rate systems such as High Speed Circuit Switched Data (HSCSD) and GPRS, it will become feasible for mobile users to browse the Internet in a way that approaches traditional wireline browsing. This, together with improvements in the WAP architecture, will make multi-modal content accessible for retrieval and playback on mobile clients. This is not possible with current WAP systems since, as mentioned, they are text-based and lack multi-modal capabilities. A further obstacle is that WAP currently has no established standard for authoring multi-modal content.
On the Internet, streaming media is typically transferred, or streamed, to the receiving computer using a communications protocol known as UDP (User Datagram Protocol). Since IP (Internet Protocol) is packet-based, the packets are transferred in units known as datagrams. As known by those skilled in the art, UDP is a ‘connectionless’ protocol that uses IP to transmit datagrams without ensuring that all of the packets reach their destination. This makes UDP well suited to applications in which it is not essential for every packet to arrive, such as streaming sound files, where an occasional lost packet makes no noticeable difference to the listener.
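The loss-tolerant behavior that makes UDP attractive for streamed sound can be shown with Python's standard `socket` module. In this sketch the 160-byte chunk size and the 4-byte sequence-number header are assumptions chosen for illustration, and the `send_audio` and `receive_audio` helpers are hypothetical. The receiver substitutes silence for any chunk that has not arrived when its timeout expires, instead of requesting a retransmission.

```python
import socket
import struct
from typing import Dict, Tuple

CHUNK = 160     # bytes of audio per datagram (assumed framing)
HEADER = "!I"   # 4-byte big-endian sequence number (assumed framing)
HEADER_LEN = struct.calcsize(HEADER)


def send_audio(sock: socket.socket, addr: Tuple[str, int], audio: bytes) -> int:
    """Send audio as sequence-numbered UDP datagrams; returns the number sent."""
    count = 0
    for offset in range(0, len(audio), CHUNK):
        packet = struct.pack(HEADER, count) + audio[offset:offset + CHUNK]
        sock.sendto(packet, addr)  # fire and forget: UDP does not confirm delivery
        count += 1
    return count


def receive_audio(sock: socket.socket, expected: int, timeout: float = 0.5) -> bytes:
    """Collect up to `expected` chunks, filling any that never arrive with silence."""
    sock.settimeout(timeout)
    chunks: Dict[int, bytes] = {}
    try:
        while len(chunks) < expected:
            packet, _ = sock.recvfrom(HEADER_LEN + CHUNK)
            (seq,) = struct.unpack(HEADER, packet[:HEADER_LEN])
            chunks[seq] = packet[HEADER_LEN:]
    except socket.timeout:
        pass  # stop waiting; lost or late datagrams are simply skipped
    silence = b"\x00" * CHUNK
    return b"".join(chunks.get(i, silence) for i in range(expected))
```

On the loopback interface the datagrams normally all arrive; over a lossy wireless link, the silence-filling path is what keeps playback going without stalling.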
In the current version of WAP, the WDP layer 225 in the protocol stack can be used as a transport mechanism for sound data, but this approach has some disadvantages, especially when used with automatic speech recognition. A major disadvantage is that it is difficult to ensure absolute security when routing sound data through the WAP gateway. This difficulty arises because, in wireless networks, the primary processing for speech recognition can be performed by a separate speech recognition server (SRS) functioning together with the network, so the sound data must pass through the gateway to reach it. Such an arrangement is referred to as a distributed speech recognition (DSR) system. DSR is used because speech recognition is often too heavy a task to be performed entirely within many mobile clients: speech processing requires a relatively high level of processing power and is memory intensive, especially when implementing the multi-language support typically found on many phones today.
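The division of labor in DSR can be illustrated by its client-side half: the mobile client performs only a light feature-extraction step and sends a compact feature stream, leaving the heavy recognition work to the SRS. The sketch below is a toy stand-in for a real DSR front end, which would compute a richer feature vector per frame; the 8 kHz sample rate, the 25 ms frames with a 10 ms shift, and the `client_front_end` function are assumptions for illustration.

```python
import math
from typing import List

SAMPLE_RATE_HZ = 8000                      # assumed narrowband speech
FRAME_LEN = SAMPLE_RATE_HZ * 25 // 1000    # assumed 25 ms window -> 200 samples
FRAME_SHIFT = SAMPLE_RATE_HZ * 10 // 1000  # assumed 10 ms hop -> 80 samples


def client_front_end(samples: List[int]) -> List[float]:
    """Lightweight client-side step: one log-energy value per frame.

    Toy stand-in for a real DSR front end. The resulting list is what the
    client would send to the SRS instead of the raw samples.
    """
    features = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_SHIFT):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(s * s for s in frame)
        features.append(math.log(energy + 1e-10))  # offset avoids log(0) on silence
    return features
```

Under these assumptions, one second of audio (8000 samples) becomes 98 log-energy values, which suggests why moving recognition to the server eases the client's processing and memory burden.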
Using speech recognition in the WAP environment in this distributed manner presents risks that are inherent in the routing mechanism. Security concerns arise when the speech is routed to the SRS for processing via the WAP gateway 120. As the protocol demands, the client encrypts the speech using the WTLS layer 220 (Wireless Transport Layer Security) of the protocol stack and sends it over a wireless channel to the WAP gateway. At the gateway, the speech will likely need to be decrypted so that it can be sent to and processed by the SRS; it is then re-encrypted in the gateway and sent on its way. The decryption performed in the gateway leaves the data exposed to a third party (e.g., the network operator), which may make users uncomfortable, particularly when performing sensitive activities such as banking.
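The exposure described above can be made concrete with a sketch of hop-by-hop encryption through the gateway. The `toy_cipher` function is a deliberately insecure XOR keystream built from SHA-256 and merely stands in for WTLS record protection; the `Gateway` class, its keys, and the nonce handling are hypothetical. The point is structural: because the gateway must decrypt before re-encrypting for the SRS, it necessarily holds the speech in the clear.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream; the same call decrypts.

    Illustration only: this is NOT WTLS and is not a secure cipher.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))


@dataclass
class Gateway:
    """Hypothetical gateway that terminates one secure hop and starts another."""
    client_key: bytes  # protects the wireless hop from the client (assumed setup)
    srs_key: bytes     # protects the onward hop to the speech recognition server
    seen_in_clear: List[bytes] = field(default_factory=list)

    def relay(self, nonce: bytes, ciphertext: bytes) -> bytes:
        plaintext = toy_cipher(self.client_key, nonce, ciphertext)
        self.seen_in_clear.append(plaintext)  # speech is exposed inside the gateway
        return toy_cipher(self.srs_key, nonce, plaintext)
```

The SRS can recover the speech with `toy_cipher(srs_key, nonce, forwarded)`, but by then the gateway's `seen_in_clear` list already holds the same plaintext.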
In view of the foregoing, an improved architecture is needed that enables mobile clients to successfully use automatic speech recognition in voice-based interactive applications in a secure manner that requires relatively little modification to existing infrastructures.