Due to the difficulties inherent in the analysis of complex processes such as speech and handwriting, the machine recognition of these two methods of communication has thus far met with limited success.
The separate development of speech and handwriting recognition methodology has resulted in two alternative means to improve the effectiveness of a man-machine interface, particularly in the context of workstations and personal computers. These separately developed approaches both attempt to utilize natural ways of communicating, specifically voice and handwriting, in order to establish an unconstrained dialogue with a computer.
One rationale for developing Automatic Speech Recognition (ASR) is to provide a dictation machine capable of reducing or eliminating the need for a keyboard. More generally, ASR can be defined as any system that uses as input a set of patterns produced by a speaker. ASR may also be extended to include systems that recognize speech based on the measurement of acoustic vibrations in the throat and chest, as well as systems which recognize the movements of lips, jaws, etc. In this regard reference is made to an article entitled "Automatic Lipreading to Enhance Speech Recognition", CH2145-1, pp. 40-47, IEEE (1985), by E. D. Petajan.
Similarly, one rationale for developing Automatic Handwriting Recognition (AHR) has been to reduce or eliminate the requirement for the conventional computer keyboard. Traditionally, AHR recognizes handwritten characters captured in real time on an electronic tablet.
By way of example, human factors studies often involve large numbers of psychological experiments conducted to measure a user's reaction to some parameters of interest. The majority of such experiments involve many participants, each of whom must register their responses for later analysis. This analysis is usually accomplished automatically through special-purpose software. It is of paramount importance that the subject's reaction to the measured parameters be isolated as completely as possible from their reaction to the logistics of the experiments. In particular, experiments involving moderately unnatural interfaces, such as a keyboard, may conceivably distract the attention of the subject away from the essence of the experiments. Also, the modus operandi for some experiments may not leave the subject sufficient time to type a response.
In this situation, the only reasonable alternatives are to dictate the responses into a tape recorder or to write them by hand on an answer sheet. Unfortunately, the use of such natural interfaces tends to complicate the interpretation of the experiments. Converting the subject's responses into a form suitable for automatic analysis requires subsequent effort by clerical personnel or by the participants themselves. All dictated or handwritten material must be carefully typed and then thoroughly reviewed for typing errors, lest the results be biased in some unpredictable way. This strenuous proofreading takes a substantial amount of time and causes large delays in interpreting experiments.
One method to avoid these delays without sacrificing the naturalness of the interface is to have the participants directly produce soft copies of their responses, either by dictating into an ASR or by providing handwriting to an AHR. Both of these approaches, however, suffer from the deficiencies enumerated below.
One disadvantage inherent in conventional ASR and AHR techniques is that an acceptable recognition rate is often achieved only in a user-dependent mode with isolated (word or character) inputs. Operating in this way results in several undesirable constraints, including: (a) a user must train the system before being able to use it; and (b) a user must interact with the system in a slightly unnatural way, either by speaking with pauses between words or by writing with spaces between letters.
Another disadvantage inherent in conventional ASR and AHR techniques is that 100% recognition accuracy is extremely difficult to achieve, due to an inability of a human to be perfectly consistent over time in the manner in which he or she speaks or writes.
This disadvantage also imposes several constraints including: (a) a user must identify any errors that may be introduced during decoding, which tends to distract the user's attention away from the subject of the message; and (b) a user must correct wrongly decoded words or characters which, as a matter of practicality, requires the use of an alternative technology, such as a keyboard.
The following U.S. Patents are cited as being of interest to the teaching of the invention.
Commonly assigned U.S. Pat. No. 3,969,700, issued Jul. 13, 1976, entitled "Regional Context Maximum Likelihood Error Correction for OCR, Keyboard, and the Like", to Bollinger et al., describes a general error correction scheme, applicable to pattern recognition problems, that uses segmentation prior to recognition, including optical character recognition (OCR) and ASR. No specific mention is made of on-line AHR and there is no disclosure of employing both ASR and AHR in a message recognition system.
U.S. Pat. No. 4,651,289, issued Mar. 17, 1987, entitled "Pattern Recognition Apparatus and Method for Making Same", to Maeda et al., describes an algorithm whose primary goal is to introduce robustness in a template matching procedure, such as is used in ASR and AHR. To this end there are employed multiple templates per pattern to be recognized, rather than a single template. Scores from multiple matches are appropriately combined to arrive at a final decision. There is no disclosure of applying this method simultaneously to speech and handwriting.
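The general idea of matching against multiple templates per pattern and combining the resulting scores can be illustrated with a minimal sketch. This is not the specific algorithm of Maeda et al.; the two-dimensional feature vectors, the Euclidean distance, the averaging rule, and the names `TEMPLATES`, `class_score`, and `classify` are all assumptions introduced solely for illustration.

```python
import math

# Hypothetical feature templates: several per pattern class (assumed data).
TEMPLATES = {
    "a": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    "b": [(1.0, 1.0), (0.9, 1.2), (1.1, 0.8)],
}


def distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def class_score(features, templates):
    """Combine the matches against every template of one class.

    The combination rule here is the mean distance (lower is better);
    this choice is an assumption, not taken from the cited patent.
    """
    return sum(distance(features, t) for t in templates) / len(templates)


def classify(features, template_set=TEMPLATES):
    """Return the best-matching class label and all per-class scores."""
    scores = {label: class_score(features, ts)
              for label, ts in template_set.items()}
    return min(scores, key=scores.get), scores


label, scores = classify((0.1, 0.1))
```

Using several templates per class makes the decision less sensitive to any single atypical reference sample than a single-template match would be.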
U.S. Pat. No. 4,736,447, issued Apr. 5, 1988, entitled "Video Computer", to Korsinsky, describes a multi-facet interface which is said to acquire a variety of data, including speech and handwriting. The basic recognition algorithm utilized is template matching. No attempt is made to integrate the diverse sources of information, either for recognition purposes or otherwise. Also, there is no mention of error correction.
U.S. Pat. No. 4,774,677, issued Sep. 27, 1988, entitled "Self-Organizing Circuits", to Buckley, describes a class of algorithms and architectures pertinent to the general field of pattern recognition. However, as in the previous references, the issue of the integration of ASR and AHR is not addressed.
U.S. Pat. No. 4,907,274, issued Mar. 6, 1990, entitled "Intelligent Work Station", to Nomura et al., describes a telephone interface incorporating speech recognition and synthesis. The application involves image compression as well, but there is no discussion of AHR or OCR.
Other art of interest includes the following.
European Patent Application 0 355 748 describes a fast classification algorithm which is intended to be used in conjunction with a hierarchical approach to pattern recognition. The domain of applications appears to be OCR. There is no explicit mention of speech or of an integration of ASR and AHR.
IBM Technical Disclosure Bulletin Vol. 22, No. 5, October 1979, entitled "Handwriting/Speech Signal Analyzer" by J. Gould et al. describes an analyzer to discriminate between pause times and production times in on-line AHR or ASR.
An article entitled "Experiments on Neural Net Recognition of Spoken and Written Text", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 36, No. 7, July 1988, by D. Burr, compares two different classes of algorithms for use in pattern recognition problems such as AHR and ASR. The combined use of speech and handwriting is not considered, and error correction is not addressed.
It is thus one object of the invention to provide a message recognition system that employs a functional complementarity of speech and handwriting inputs.
It is another object of the invention to provide method and apparatus for sequentially using a handwriting or a speech input in an error correcting technique for the other.
It is another object of the invention to provide method and apparatus that integrates AHR and ASR in a message recognition system to exploit the complementarity of speech and handwriting inputs as simultaneous sources of information.
It is another object of the invention to provide method and apparatus that integrates AHR and ASR in a message recognition system to achieve a combined use of speech and handwriting recognition that significantly improves the overall accuracy of an automatic message recognizer.
It is one further object of the invention to provide method and apparatus that integrates AHR and ASR in a message recognition system in (a) a sequential manner to achieve error correction, (b) a simultaneous manner using separate ASR and AHR components, and (c) a simultaneous manner using a single ASR/AHR component.
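The sequential and simultaneous modes of integration named in these objects can be pictured with a minimal sketch. Nothing here is drawn from the disclosure itself: the per-candidate log-scores, the weighted-sum fusion rule, the confidence threshold, and the names `fuse_scores` and `correct_sequentially` are hypothetical choices made only for illustration.

```python
def fuse_scores(asr_scores, ahr_scores, speech_weight=0.5):
    """Simultaneous use with separate recognizers (assumed late fusion).

    Each argument maps a candidate word to a log-score (higher is better).
    The fused score is a weighted sum over candidates proposed by both.
    """
    candidates = set(asr_scores) & set(ahr_scores)
    if not candidates:
        raise ValueError("recognizers share no candidate words")
    fused = {
        word: speech_weight * asr_scores[word]
        + (1.0 - speech_weight) * ahr_scores[word]
        for word in candidates
    }
    return max(fused, key=fused.get), fused


def correct_sequentially(primary_word, primary_confidence,
                         secondary_word, threshold=0.8):
    """Sequential use: keep the primary result unless its confidence is
    below the (assumed) threshold, then fall back to the other modality."""
    if primary_confidence >= threshold:
        return primary_word
    return secondary_word


# Hypothetical scores: speech confuses homophones, handwriting does not.
asr = {"two": -1.0, "too": -1.1, "to": -1.2}
ahr = {"two": -3.0, "too": -0.5, "to": -2.5}
best, fused = fuse_scores(asr, ahr)
```

In this made-up example the speech scores alone slightly favor "two", while the handwriting scores clearly favor "too"; the fused scores select "too", showing how one modality can resolve an ambiguity in the other.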