Voice commands are used in a variety of contexts to control electronic devices. Often, devices are capable of receiving spoken commands, performing speech processing to determine what the command is, and then responding to the command by performing the operation indicated by the command. In certain cases, it may be desirable to control a device with voice commands in this manner but nevertheless undesirable to include the hardware or software necessary for interpreting commands within the device itself. For example, speech processing software (i.e., software that processes audio signals to identify spoken words within the audio signal) can require significant memory capacity and can be processing-intensive. These demands may undesirably increase the cost and size of an electronic device, which must then provide greater memory and processing capability. Further, in a network of such devices, updating the command processing software would require downloading new software to each device, which may be cumbersome and time-consuming.
Another option would be to control a local device by processing command signals remotely. Remote processing of commands is known in a number of contexts. For example, many wireless telephones include a hands-free operation feature that allows the user to call a particular telephone number via a spoken command, e.g., “call home.” In this case, a local device (the wireless telephone) receives a voice command that is used by the network to establish a connection between the local device and a specified destination. Such commands do not control the state of the local device; rather, they control the state of the network.
Speech processing is also used with other mobile devices. For instance, a two-way wireless communication device can use network-based speech recognition resources to augment the local user interface. In particular, the wireless communication system uses a remote voice recognition server system to translate voice input to symbolic data that can be processed by mobile devices.
Further, it is well known to use spoken commands to interact via telephone with a menu-driven system or the like. For example, many automated phone answering systems present a caller with a list of options that may be selected by pressing a keypad button or by audibly stating a word that corresponds to the command. Likewise, current telephone directory assistance services generally ask the caller to say the state and city for which information is desired, and the response is processed by a speech recognition system prior to proceeding to the next question or prior to being connected to a live operator. In these examples, speech commands may be interpreted remotely; however, the commands are not used to control the local device, i.e., the telephone from which the call is being made.
It is also known to control devices, such as cable-supplied televisions, by calling a certain telephone number and entering commands via the telephone keypad to cause a remote processor to send control commands to the cable box controlling the television. Here, a local command is remotely processed to control a local device. However, since a telephone keypad is used, this scheme involves neither voice commands nor speech recognition processing.
Moreover, speech processing has been used with computer internet telephony. For instance, when connections are made between network telephones on a computer network, the IP telephones are connected using a speech recognition engine and an IP address database on an internet server.
Another known use involves remote processing of locally captured speech to control a local visual display. In particular, human speech is used to control a visual display of a device such that the audio input is transmitted to a remote processor for speech recognition, visual update instructions are generated and sent to the local device, and the visual display of the local device is updated.
All of the foregoing examples lack the capability to provide security via the command system. Thus, for example, if an unauthorized person gains access to the portion of the system that receives voice commands, that person can instruct the system to perform unauthorized and potentially harmful commands. There are numerous circumstances in the government/military, commercial, and private sectors where it would be advantageous to control a local device with a locally generated voice command without placing the demands of speech processing on the local device, and with the capability to provide a significant degree of security in the process of interpreting and carrying out voice commands.