Providing control information to unmanned vehicles (UVs) or robots through conventional human-machine interfaces (for example, joystick, keyboard, switches, etc.) is often unnatural and cumbersome, depending on the situation, and typically requires specific operator training. In recent years, voice-based human-machine interface technologies for controlling UVs and robots have made significant progress, although some solutions remain naive.
In a traditional voice-controlled UV or robot, the system uses automatic speech recognition (ASR) to recognize voice commands uttered by the human operator and to generate the corresponding machine command. The machine command is generated by receiving the recognized voice command, interpreting it, and generating a control signal or machine command suitable for transmission to the remote UV or robot. In most voice-controlled UVs or robots, ASR is employed as a “black box” that receives a speech signal and delivers a recognized text string, with little regard for the potential for error or the limitations of the ASR technology. Some innovations include the use of special microphones to reduce the effect of noise, noise-cancellation methods, and natural language post-processing. However, the integration of ASR technology with UV or robot technology and operation has been minimally explored and implemented.
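The “black box” pipeline described above can be sketched as follows. This is a minimal illustrative example, not an implementation from any cited patent; the function names, the command table, and the stand-in recognizer are all assumptions made for clarity.

```python
from typing import Optional

# Hypothetical table mapping recognized text strings to machine commands.
COMMAND_TABLE = {
    "take off": "CMD_TAKEOFF",
    "land": "CMD_LAND",
    "turn left": "CMD_YAW_LEFT",
}

def recognize(speech_signal: bytes) -> str:
    """Stand-in for a black-box ASR engine: receives a speech signal and
    returns a recognized text string. A real system would call an ASR
    library here; this placeholder always returns one fixed string."""
    return "take off"

def interpret(text: str) -> Optional[str]:
    """Interpret the recognized text and map it to a machine command.
    Returns None when the text matches no known command."""
    return COMMAND_TABLE.get(text.strip().lower())

def control(speech_signal: bytes) -> Optional[str]:
    """Full pipeline: recognize, interpret, and produce the machine
    command suitable for transmission to the remote UV or robot."""
    return interpret(recognize(speech_signal))
```

Note that in this arrangement the controller has no visibility into recognition errors: whatever string the ASR returns is looked up verbatim, which is exactly the limitation the passage above points out.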
In JP Publication 2013-128287, entitled WIRELESSLY CONTROLLING UNMANNED AIRCRAFT AND ACCESSING ASSOCIATED SURVEILLANCE DATA, the control of “an unmanned aerial vehicle (UAV) may be accomplished by using a wireless device (e.g., cell phone) to send a control message to a receiver at the UAV via a wireless telecommunication network (e.g., an existing cellular network configured solely for mobile telephone communication). In addition, the wireless device may be used to receive communications from a transmitter at the UAV, where the wireless device receives the communications from the transmitter via the wireless network. Examples of such communications include surveillance information and UAV monitoring information.”
In Korean patent publication KR101330991, entitled VOICE RELAY SYSTEM FOR UNMANNED AERIAL VEHICLE USING PTT SIGNAL, a voice relay system for an unmanned aircraft using push-to-talk (PTT) signals is proposed. The invention relays voice communication between a wireless controller and an air control center through an unmanned aircraft, receiving the voice of the pilot.
In Korean patent publication KR20110135070, entitled VOICE CONTROL COMMUNICATION SYSTEM AND METHOD OF UNMANNED AERIAL VEHICLE FOR ANTICOLLISION AND DISTINGUISH POSITION, “a voice ground control system and method thereof for UAV anti-collision and positioning are provided to easily receive an aviation instruction to control of a controller by performing communication between an ACC and a UAV”. “A GC (Ground Control) voice communication system includes a GC voice input device for converting the voice of the UAV (Unmanned Aerial Vehicle) pilot into an analog voice signal, a GC electric wave generating/recovery device, and a GC transceiver for transceiving with the UAV. A UAV voice communication system includes a UAV electric wave generating/restoring apparatus, a frequency control device, and a UAV transceiver. An ACC (Air Control Center) voice communication system includes an ACC voice input/output device, an ACC wave generating/restoring device, and an ACC device”.
In U.S. Pat. No. 8,311,827, entitled VEHICLE CONTROL, a speech recognition interface and method for controlling a UAV are described. The system and method include: receiving one or more instructions issued as speech; analyzing the speech using speech recognition software to provide a sequence of words and a word confidence measure for each word so recognized; analyzing the sequence of words to identify a semantic concept corresponding to an instruction based on the analysis, and a semantic confidence level for the identified semantic concept derived at least in part with reference to the word confidence measures of the words associated with the semantic concept; providing a spoken confirmation of the semantic concept so identified based on the semantic confidence level and an indicated verbosity level, the spoken confirmation being provided with a speaking rate or a pitch that is increased as the indicated verbosity level decreases; and using the semantic concept so identified to provide a control input for the vehicle. The step of providing the spoken confirmation of the semantic concept comprises indicating that the instruction was not understood when the semantic confidence level is below a threshold, while the step of using the semantic concept comprises providing the control input for the vehicle when the semantic confidence level exceeds the threshold.
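The threshold behavior described in that patent can be illustrated with a short sketch. This is not the patented implementation; the threshold value, the use of a simple mean to derive semantic confidence from word confidences, and all names are assumptions for illustration only.

```python
# Assumed threshold below which an instruction is treated as not understood.
THRESHOLD = 0.7

def semantic_confidence(word_confidences):
    """Derive a semantic confidence level from per-word confidence
    measures. The patent only says it is derived 'at least in part' from
    the word confidences; a plain mean is assumed here."""
    return sum(word_confidences) / len(word_confidences)

def handle_instruction(concept, word_confidences):
    """Return either a spoken 'not understood' response (low confidence)
    or a control action carrying the semantic concept (high confidence)."""
    conf = semantic_confidence(word_confidences)
    if conf < THRESHOLD:
        return ("speak", "Instruction not understood")
    return ("control", concept)
```

For example, `handle_instruction("CLIMB_TO_ALTITUDE", [0.9, 0.8, 0.95])` yields a control action, while the same concept with word confidences of `[0.2, 0.3, 0.4]` yields the spoken rejection instead of a vehicle command.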
In US Patent Publication 2008-0065275, entitled METHOD AND SYSTEM FOR CONTROLLING MANNED AND UNMANNED AIRCRAFT USING SPEECH RECOGNITION TOOLS, a system and method are provided for controlling an aircraft with voice instructions from an air traffic controller, and for transmitting a voice response to the air traffic controller. At least one response logic unit is also provided to interpret the received voice instruction from the air traffic controller, determine a response to the interpreted voice instruction, and translate the interpreted voice instruction into a command suitable for input to at least one autopilot unit. The at least one autopilot unit receives the command from the response logic unit, wherein the command is configured to guide the flight of the unmanned aircraft.
In U.S. Pat. No. 7,174,300, a dialog processing method and apparatus for uninhabited air vehicles are described. The apparatus contains a recognition unit for recognizing incoming data, an interpretation unit for interpreting the data according to a grammar, and a response unit for generating an appropriate response to the incoming data. The method may use natural language processes and may be reduced to a finite state machine. The incoming data are combined with uninhabited air vehicle state information to increase the accuracy of the interpretation. Additionally, the dialog states may be limited to customary air traffic control dialogs.
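The idea of reducing a limited air-traffic-control dialog to a finite state machine can be sketched as follows. The states, events, and transition table below are illustrative assumptions, not the dialog model defined in the patent.

```python
# Hypothetical transition table for a small ATC-style dialog FSM:
# (current_state, event) -> next_state. Restricting the table to
# customary ATC exchanges is what keeps the dialog tractable.
TRANSITIONS = {
    ("IDLE", "request_clearance"): "AWAITING_CLEARANCE",
    ("AWAITING_CLEARANCE", "clearance_granted"): "CLEARED",
    ("AWAITING_CLEARANCE", "clearance_denied"): "IDLE",
    ("CLEARED", "handoff"): "IDLE",
}

def step(state, event):
    """Advance the dialog FSM by one recognized event. Events with no
    defined transition leave the dialog state unchanged, which is one
    simple way to reject utterances outside the expected dialog."""
    return TRANSITIONS.get((state, event), state)
```

A short exchange then reduces to folding events through `step`: starting from `"IDLE"`, the events `"request_clearance"` followed by `"clearance_granted"` drive the machine to the `"CLEARED"` state, while an out-of-dialog event such as `"handoff"` in the `"IDLE"` state is simply ignored.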
In US Patent Publication 2008/0201148, a system is described for dynamically generating a contextual database that is accessed by a speech recognition system interfacing with a subassembly of a vehicle. The system comprises: a situation sensor that generates one or more signals indicative of the situation of the vehicle, the one or more signals including contextual data indicative of the position and speed of the vehicle; a spoken name generator that receives the one or more signals from the situation sensor; an electronic flight bag having a first data array, the spoken name generator dynamically accessing, interpreting, analyzing and sorting through the first data array in the electronic flight bag and selecting only the data that are relevant to a pilot with respect to the present position, movement and flight plan of the aircraft; a contextual dynamic grammars database that includes a second data array smaller than the first data array; and a speech recognition system that interfaces with the contextual dynamic grammars database and awaits one or more commands from a pilot or other operator of the vehicle before generating and sending one or more activation signals to the subassembly. Upon receiving the one or more commands, the speech recognition system compares the vocabulary used in the commands with data elements stored in the second data array in the contextual dynamic grammars database; when it reliably recognizes the commands and matches them with data elements contained in that database, it processes the command by communicating the one or more activation signals to the subassembly.
As mentioned above, in other patent publications related to UV voice control, ASR accuracy is improved by using noise-cancellation techniques or by incorporating spontaneous speech into a language model that is not dynamically adapted. Examples are described in EP publication 2040250 B1 and U.S. Pat. No. 7,774,202 B2. In these examples, ASR appears to be employed on a “black box” basis.