The decade of the 1990s has been marked by a technological revolution driven by the convergence of the data processing industry with the consumer electronics industry. This advance has been further accelerated by the extensive consumer and business involvement in the Internet over the past few years. As a result of these changes, it seems as if virtually all aspects of human endeavor in the industrialized world require human/computer interfaces. There is a need to make computer-directed activities accessible to people who, up to a few years ago, were computer illiterate or, at best, computer indifferent.
Thus, there is continuing demand for interfaces to computers and networks which make it easier for the interactive user to access functions and data from the computer. With desktop-like interfaces including windows and icons, as well as three-dimensional interfaces simulating virtual reality, the computer industry has been working hard to meet this demand by making human/computer interfaces more user friendly, bringing them closer and closer to real-world interfaces, e.g. human/human interfaces. In such an environment, it would be expected that speaking to the computer in natural language would be a very natural way of interfacing with the computer, even for novice users. Despite these potential advantages of speech recognition computer interfaces, this technology has been relatively slow in gaining extensive user acceptance.
Speech recognition technology has been available for over twenty years, but it has only recently begun to find commercial acceptance, particularly with speech dictation or “speech to text” systems, such as those marketed by International Business Machines Corporation (IBM) and Dragon Systems. That aspect of the technology is now expected to develop at an accelerated pace until it holds a substantial niche in the word processing market. On the other hand, a more universal application of speech recognition input to computers, which is still behind expectations in user acceptance, is in command and control technology, wherein, for example, a user may navigate through a computer system's graphical user interface (GUI) by speaking the commands which are customarily found in the system's menu text, icons, labels, buttons, etc.
Many of the deficiencies in speech recognition, both in word processing and in command technologies, stem from inherent voice recognition errors. These errors are attributable in part to the status of the technology and in part to the variability of user speech patterns and the user's ability to remember the specific commands necessary to initiate actions. As a result, most current voice recognition systems provide some form of visual feedback which permits the user to confirm that the computer understands the user's spoken utterances. In word processing, such visual feedback is inherent in the process, since the purpose of the process is to translate from the spoken to the visual. That may be one of the reasons that the word processing applications of speech recognition have progressed at a faster pace.
However, in speech-recognition-driven command and control systems, the constant need to switch back and forth between a natural speech input mode of operation, when the user is requesting help or making other queries, and the command mode of operation, when the user is issuing actual commands, tends to be very tiresome and impacts user productivity, particularly when there is intermediate display feedback.