As described in the above-referenced U.S. patent application Ser. No. 11/697,074, speech recognition is being used in a wide variety of applications, particularly including mobile device applications. Speech recognition applications that have emerged over the last few years include voice dialing (e.g., “Call home”), call routing (e.g., “I would like to make a collect call”), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), content-based spoken audio search (e.g., finding a podcast where particular words were spoken), and most recently, generation of text messages. A method disclosed in U.S. patent application Ser. No. 11/697,074 includes the steps of initializing a client device so that the client device is capable of communicating with a backend server; recording an audio message in the client device; transmitting the recorded audio message from the client device to the backend server through a client-server communication protocol; converting the transmitted audio message into a text message in the backend server; and sending the converted text message back to the client device. The text message comprises a Short Message Service (“SMS”) text message.
One common problem encountered in mobile device applications using speech recognition is the frequent presence of background noise, such as wind and other environmental noise, nearby speech and sounds, vehicular noise, and the like. For purposes of handling such background noise, the noise may be divided into at least two categories: simultaneous background noise (background noise that occurs while the user of the mobile device is speaking) and non-simultaneous background noise (background noise that occurs before or after the user of the mobile device is speaking).
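The two categories above can be illustrated with a minimal sketch that labels a noise event by whether it overlaps the user's speech in time. The function name `classify_noise`, the interval representation, and the label strings are assumptions made for illustration only; they do not appear in any of the referenced applications.

```python
from typing import Tuple

# Hypothetical representation: (start_seconds, end_seconds)
Interval = Tuple[float, float]


def classify_noise(noise: Interval, speech: Interval) -> str:
    """Label a noise event relative to the interval in which the user speaks."""
    noise_start, noise_end = noise
    speech_start, speech_end = speech
    # Two intervals overlap when each one starts before the other ends.
    overlaps = noise_start < speech_end and noise_end > speech_start
    return "simultaneous" if overlaps else "non-simultaneous"
```

Under these assumptions, a gust of wind that ends before the user starts speaking would be labeled non-simultaneous, while a passing vehicle heard partway through the utterance would be labeled simultaneous.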
Simultaneous background noise often requires the use of sophisticated filtering systems and the like, but non-simultaneous background noise is often excluded from treatment by a speech recognition system through the use of a “push to talk” button. The user presses and releases the button to indicate the beginning and end of speech they wish to have recognized by the speech recognition engine. Using this technique improves accuracy because the engine does not have to attempt to detect the start and end of speech.
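The push to talk technique can be sketched as a simple gate on incoming audio: frames that arrive outside the user-marked interval are never passed to the recognizer. The class `PushToTalkRecorder` and its method names are hypothetical, and the sketch assumes a press marks the beginning of speech and a release marks the end; it is not drawn from any particular device or speech recognition engine.

```python
from typing import List


class PushToTalkRecorder:
    """Hypothetical gate that keeps only audio captured between press and release."""

    def __init__(self) -> None:
        self._recording = False
        self._frames: List[bytes] = []

    def press(self) -> None:
        # User marks the beginning of speech; start with an empty buffer.
        self._frames = []
        self._recording = True

    def on_audio_frame(self, frame: bytes) -> None:
        # Frames arriving while the button is not held are discarded, so
        # non-simultaneous background noise never reaches the engine.
        if self._recording:
            self._frames.append(frame)

    def release(self) -> bytes:
        # User marks the end of speech; return the gated audio.
        self._recording = False
        return b"".join(self._frames)
```

In this sketch the engine receives only the bytes returned by `release()`, so it has no need to detect where speech starts or ends.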
On most cell phones and other mobile devices one or more physical buttons can be assigned to be used as the push to talk button. However, some mobile devices use touch screens as their only means of input and do not have any physical buttons which can be used for push to talk. For these mobile devices, an alternative to a physical button would be to use a visual on-screen button that the user can press. This is not the best solution though, as it requires the user to physically look at the screen in order to see where the push to talk button is located. One of the benefits of using speech recognition on a mobile device is that the user does not have to be physically looking at the device in order to enter text. Unfortunately, using an on-screen button negates that benefit and can be unsafe for the user if, for example, they want to use speech input while driving.
A growing number of mobile devices are provided or will be provided with means for detecting information about a physical phenomenon, such as proximity, ambient light, vibration, movement, inclination, or the like, that is encountered by the mobile device, and are utilizing this information to enter information into the device. In one example, some mobile devices may be provided with one or more proximity sensors to detect the proximity of a structural portion of the device (such as a slide or flip cover of a cell phone), the proximity of a user's face, ear or hand, or the like. Such proximity sensors may be infrared (IR), inductive, capacitive, magnetic, photoelectric, or the like. In another example, some mobile devices may be provided with one or more ambient light sensors to detect ambient light in a user's environment, to detect sudden light blockages caused by placement of the device against a user's face or hand, or the like. In another example, some mobile devices may be provided with a tilt sensor to detect inclination of the mobile device with respect to at least one axis. Such a tilt sensor may incorporate any of a variety of devices, including but not limited to a gyroscope, accelerometer, multi-axis gyroscope, multi-axis accelerometer, or the like.
U.S. patent application Ser. No. 10/899,037, published as Pub. No. US2005/0197145 on Sep. 8, 2005 (the entirety of which is incorporated herein by reference), discloses a mobile phone having vibration/inclination detection means that allow a user to input a phone number without any manipulation of the keypad. The user enters the phone number one digit at a time by physically moving the phone in a predetermined pattern that corresponds to the desired digit. The vibration/inclination detection means are able to recognize numbers from 0-9, and the desired phone number may thus be derived from the detected movements and inclinations according to the predetermined rules.
It has also been proposed that a 3D accelerometer may be incorporated into a mobile phone, wherein the accelerometer recognizes movement of the mobile phone in three dimensional space, and the mobile phone carries out commands according to those calculations. In this arrangement, a user could physically move the mobile phone in a manner that draws a desired character in space, and the motion is monitored by the accelerometer, analyzed to determine the drawn character, and if recognized, the character may be displayed on the screen of the mobile phone.
U.S. patent application Ser. No. 11/649,885, published as Pub. No. US2007/0180718 on Aug. 9, 2007 (the entirety of which is incorporated herein by reference), discloses a method for pre-calibrating a mobile device such that subsequent movement, detected by a tilt sensor, can be compared to a calibrated state to determine the relationship of the movement to a neutral position.
U.S. patent application Ser. No. 10/162,487, published as Pub. No. US2002/0167488 on Nov. 14, 2002 (the entirety of which is incorporated herein by reference), discloses the use, in a mobile device, of at least one sensor that provides contextual information to the device. The sensor may include a tilt sensor, a proximity sensor, or a gravity switch. When the mobile device receives an incoming message, or notification, the device responds thereto based at least in part upon the contextual information.
Unfortunately, none of these mobile devices uses a physical phenomenon sensor to trigger the operation of a speech recognition system. Thus, a need exists for a mobile device that utilizes a physical phenomenon sensor, such as a tilt sensor, proximity sensor, or the like, to indicate when a user wants to begin and end recording audio for the speech recognition engine.