1. Field of Invention
This invention generally relates to data communications, and in particular to a two-way wireless communication device that utilizes network based speech recognition resources to augment the local user interface.
2. Discussion of Related Art
The use of hypertext based technologies has spread to the domain of wireless communication systems. Two-way wireless communication devices, also described as mobile devices herein, and wireless network protocols have been designed to permit interactive access to remote information services (e.g. commercial databases, email, on-line shopping), through a variety of wireless and wire-line networks, most notably the Internet and private networks.
Many mobile devices (e.g. cellular telephones) are mass-market, consumer-oriented devices. Their user interface should thus be simple and easy to use without limiting the functionality of the device. Currently, the primary method of data entry for most mobile devices is a keypad that is relatively inefficient when used to input lengthy alphanumeric character strings. Due to size constraints and cost considerations, the keypads of these mobile devices are not a particularly user-friendly interface for drafting messages requiring substantial user input (e.g. email messages). Keypads of this type usually have between 12 and 24 keys, a sufficient number for numeric inputs but very inefficient when dealing with the alphanumeric data entries required for network capable devices.
A user requesting information from the Internet generally navigates the World Wide Web using a browser. For example, a user requesting information on Stanford University using a search engine would have to input a search string which includes a Uniform Resource Locator (URL) of the search engine followed by "Stanford University".
The search string may include quite a few characters, in some cases over 40 characters. A user would have no problem inputting a string of this type using a standard desktop computer keyboard and browser (e.g. NETSCAPE or EXPLORER). However, the same user operating the keypad of a mobile device to input the same string would be severely hampered by the compact keypad and the close spacing between the keys.
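The keypad penalty can be illustrated with a rough count. The sketch below is a minimal illustration and not part of the described system: it assumes multi-tap text entry on the standard 12-key letter layout, assumes a space costs one press of the "0" key, and ignores the pauses needed between consecutive letters on the same key. The name multitap_presses is hypothetical.

```python
# Letters assigned to each key on a standard 12-key telephone keypad.
# Assumption: a space is entered with a single press of the "0" key.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    "0": " ",
}

# Under multi-tap entry, the n-th letter on a key costs n presses.
PRESSES = {
    ch: position + 1
    for letters in KEYPAD.values()
    for position, ch in enumerate(letters)
}


def multitap_presses(text: str) -> int:
    """Count key presses needed to enter `text` with multi-tap entry.

    Raises KeyError for characters outside the letters-and-space layout.
    """
    return sum(PRESSES[ch] for ch in text.lower())


query = "Stanford University"
print(len(query))               # 19 keystrokes on a full keyboard
print(multitap_presses(query))  # 45 presses under the assumptions above
```

Under these assumptions the 19-character query alone costs 45 key presses, before any URL characters, digits or punctuation are entered.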
One of the common uses of the Internet is email. A user who desires to send an email message having the size of the paragraph above would have to input over 400 characters. Using the standard keyboard of a desktop computer, a user may be able to input that number of characters in less than two minutes (assuming the user could type with an average degree of skill). Inputting the same number of keystrokes on the keypad of a mobile device could take considerably longer and become very tedious and prone to error.
Recent advances in speech recognition (SR) technology and increases in hardware capabilities are making the development of speech recognition based user interfaces for desktop systems commercially viable. SR technology takes spoken words and translates them into a format that can easily be manipulated and displayed by digital systems. There have been efforts to equip compact mobile devices with SR technology; however, these efforts have generally required costly device modifications such as extra components (e.g. a DSP chip) or increased processing and storage capability. A typical cellular phone has computational resources equivalent to less than one percent of what is provided in a typical desktop or portable computer. Without modification of its components, a phone of this type running a scaled-down SR application would only be able to recognize a small, predefined group of spoken words.
Speech recognition software applications currently available for desktop and laptop computers (e.g. NATURALLY SPEAKING from Dragon Systems, Inc., PLAINTALK from Apple Computer, VIA VOICE from IBM and FREESPEECH from Philips Talk) are expensive and would represent a significant portion of the cost of a mobile device equipped with a comparable software application.
Placing a speech recognition software application in each mobile device and modifying its hardware components to run that application creates a financial disincentive for handset manufacturers to incorporate SR features in their devices. These modifications would add considerable cost to the final price of the mobile device, possibly pricing it out of the target price range usually occupied by mass-market mobile devices (e.g. cellular telephones).
In terms of hardware resources, commercially available speech recognition applications can require up to 60 Mbytes of memory for each language supported. Additionally, most of these applications are designed to function on systems having relatively fast processors.
There is thus a great need for apparatuses and methods that enable mobile devices to interact in a more efficient manner with digital computer networks. The ability to utilize speech recognition services in conjunction with the standard mobile device user interface (e.g. a phone keypad), without significantly modifying hardware resources or increasing costs, would dramatically improve the usability and commercial viability of network capable mobile devices having limited resources.
The present invention relates to a wireless communication system that utilizes a remote speech recognition server system to translate voice input received from mobile devices into a symbolic data file (e.g. alpha-numeric or control characters) that can be processed by the mobile devices. The translation process begins by establishing a voice communication channel between a mobile device and the speech recognition server. A user of the mobile device then begins speaking in a fashion that may be detected by the speech recognition server system. Upon detecting the user's speech, the speech recognition server system translates the speech into a symbolic data file, which is then sent to the user through a separate data communication channel. The user, upon receiving the symbolic data file at the mobile device, reviews and edits the content of the symbolic data file and further utilizes the file as desired. For example, a user could use the symbolic data file to fill in fields in an email or a browser request field.
The invention can be implemented in numerous ways, including as a method, an apparatus or device, a user interface, a computer readable memory and a system. Several embodiments of the invention are discussed below.
According to one embodiment, the present invention is a method for obtaining speech recognition services for a mobile device not having the resources and/or software for performing speech recognition processing locally. The method comprises using local applications resident within the mobile device to establish and coordinate a voice channel between the subject mobile device and a remote server system running a speech recognition application (referred to herein as a speech recognition server system).
Upon establishment of the voice channel, the user of the subject mobile device is cued to begin speaking into the microphone of the mobile device (e.g. a cellular phone). Voiced input received at the speech recognition server system, as a result of this interaction, is converted into a symbolic data file. This process may be assisted by previously stored user specific data files. The symbolic data file is then sent back to the originating mobile device or a designated third party device through a separately established and coordinated data communication channel. The symbolic data file may be used to interact with local applications on the mobile device or to interact with network resources (e.g. servers on the Internet or a private network).
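The sequence described above, in which voice input travels over one channel and the symbolic data file returns over a separate data channel, can be sketched as follows. This is only an illustrative model under stated assumptions and not the claimed implementation: both channels are modeled as in-process queues, recognition is faked by treating each utterance as an already-transcribed word, and the user specific data file is reduced to a simple word-substitution table. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class SymbolicDataFile:
    """Text result returned to the device (hypothetical structure)."""
    session_id: str
    text: str


class SpeechRecognitionServer:
    """Stand-in for the remote speech recognition server system."""

    def __init__(self, user_vocabulary=None):
        # Assumption: the user specific data is a word-substitution table.
        self.user_vocabulary = user_vocabulary or {}

    def recognize(self, utterances):
        # Placeholder for real recognition: each "utterance" is already
        # a word string, corrected with the user specific table.
        return " ".join(self.user_vocabulary.get(w, w) for w in utterances)

    def serve(self, session_id, voice_channel, data_channel):
        # Read from the voice channel until the end-of-speech marker (None).
        utterances = []
        while (item := voice_channel.get()) is not None:
            utterances.append(item)
        # Return the result over the separate data channel.
        data_channel.put(SymbolicDataFile(session_id, self.recognize(utterances)))


class MobileDevice:
    """Stand-in for the mobile device with two independent channels."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.voice_channel = Queue()  # device -> server
        self.data_channel = Queue()   # server -> device

    def speak(self, words):
        for word in words:
            self.voice_channel.put(word)
        self.voice_channel.put(None)  # end-of-speech marker

    def receive(self):
        return self.data_channel.get_nowait()


def run_session(spoken_words, user_vocabulary=None):
    """Run one request: speak, recognize remotely, receive the file."""
    device = MobileDevice("session-1")
    server = SpeechRecognitionServer(user_vocabulary)
    device.speak(spoken_words)
    server.serve(device.session_id, device.voice_channel, device.data_channel)
    return device.receive()


result = run_session(["stanferd", "University"], {"stanferd": "Stanford"})
print(result.text)
```

In this sketch the device holds no recognition logic at all; it only writes to the voice channel and reads the returned file, which it could then pass to a local application such as an email or browser field.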
Other objects and advantages, together with the foregoing, are attained in the exercise of the invention set forth in the following description and illustrated in the accompanying drawings.