The 1990s have been marked by a technological revolution driven by the convergence of the data processing industry with the consumer electronics industry. This advance has been further accelerated by extensive consumer and business involvement in the Internet over the past few years. As a result of these changes, virtually all aspects of human endeavor in the industrialized world seem to require human/computer interfaces. There is a need to make computer-directed activities accessible to people who, until a few years ago, were computer illiterate or, at best, computer indifferent.
Thus, there is a continuing demand for interfaces to computers and networks that make it easier for the interactive user to access functions and data from the computer. Through desktop-like interfaces, including windows and icons, as well as three-dimensional interfaces that simulate virtual reality, the computer industry has been working hard to meet such interface needs by making human/computer interfaces more user friendly, i.e. closer and closer to real-world interfaces such as human/human interfaces. In such an environment, speaking to the computer in natural language would be expected to be a very natural way of interfacing with the computer, even for novice users. Despite the potential advantages of speech recognition computer interfaces, however, this technology has been relatively slow to gain extensive user acceptance.
Speech recognition technology has been available for over twenty years, but only recently has it begun to find commercial acceptance, particularly in speech dictation or “speech to text” systems, such as those marketed by International Business Machines Corporation (IBM) and Kurzweil Corporation. Development of that aspect of the technology is now expected to accelerate until it occupies a substantial niche in the word processing market. On the other hand, a more universal application of speech recognition input to computers, which still lags behind expectations in user acceptance, is command and control technology, wherein, for example, a user may navigate through a computer system's graphical user interface (GUI) by speaking the commands customarily found in the system's menu text, icons, labels, buttons, etc.
Many of the deficiencies in speech recognition, in both word processing and command technologies, stem from inherent voice recognition errors, due in part to the state of the technology and in part to the variability of user speech patterns and to the user's ability to remember the specific commands needed to initiate actions. As a result, most current voice recognition systems provide some form of visual feedback that permits the user to confirm that the computer has understood his speech utterances. In word processing, such visual feedback is inherent, since the purpose of the process is to translate the spoken into the visual. That may be one of the reasons that word processing applications of speech recognition have progressed at a faster pace. In any event, in all voice recognition systems with visual feedback, the interactive user is at some stage required to make some manual input, e.g. through a mouse or a keyboard. The need for such manual operations still gets in the way of interactive users who, because of a lack of computer skills or for other reasons, wish to relate to the computer system in a fully voice-activated or conversational manner.