1. Field of the Invention
This invention relates generally to electronic speech recognition systems and relates more particularly to a method for implementing a speech recognition system for use during conditions with background noise.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Human speech recognition is one promising technique that allows a system user to effectively communicate with selected electronic devices, such as digital computer systems. Speech typically consists of one or more spoken utterances which each may include a single word or a series of closely-spaced words forming a phrase or a sentence. In practice, speech recognition systems typically determine the endpoints (the beginning and ending points) of a spoken utterance to accurately identify the specific sound data intended for analysis. Conditions with significant ambient background-noise levels present additional difficulties when implementing a speech recognition system. Examples of such conditions may include speech recognition in automobiles or in certain manufacturing facilities. In such user applications, in order to accurately analyze a particular utterance, a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
Referring now to FIG. 1, a diagram of speech energy 110 from an exemplary spoken utterance is shown. In FIG. 1, speech energy 110 is shown with time values displayed on the horizontal axis and with speech energy values displayed on the vertical axis. Speech energy 110 is shown as a data sample which begins at time 116 and which ends at time 118. Furthermore, the particular spoken utterance represented in FIG. 1 includes a beginning point t_s which is shown at time 112 and also includes an ending point t_e which is shown at time 114.
In many speech detection systems, the system user must identify a spoken utterance by manually indicating the beginning and ending points with a user input device, such as a push button or a momentary switch. This "push-to-talk" system presents serious disadvantages in applications where the system user is otherwise occupied, such as while operating an automobile in congested traffic conditions. A system that automatically identifies the beginning and ending points of a spoken utterance thus provides a more effective and efficient method of implementing speech recognition in many user applications.
Some speech recognition systems determine the beginning and ending points of a spoken utterance by using non-real-time analysis techniques. For example, a speech recognition system may first capture all the speech energy 110 corresponding to a particular utterance, starting at time 116 and ending at time 118. The non-real-time system may then process the captured speech energy 110 to determine beginning point t_s at time 112 and ending point t_e at time 114. The non-real-time system thus delays the calculation of the beginning and ending points until the entire utterance is captured and processed. In contrast, a system which continually recalculates and updates the beginning and ending points in real time, as speech energy 110 is being acquired, may provide a more responsive and flexible method for implementing a speech recognition system.
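The contrast between the two approaches can be illustrated with a minimal sketch. It assumes the speech energy has already been reduced to one value per frame and that a single fixed threshold separates speech from background; both are simplifying assumptions made only for illustration, and the function names are hypothetical. A fixed threshold of this kind is not expected to be robust under significant background noise.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# (beginning frame index, ending frame index); None means "not yet found".
Endpoints = Tuple[Optional[int], Optional[int]]


def endpoints_batch(energies: List[float], threshold: float) -> Endpoints:
    """Non-real-time: requires the complete captured energy sequence."""
    above = [i for i, e in enumerate(energies) if e > threshold]
    if not above:
        return (None, None)
    # First and last frames whose energy exceeds the (assumed) threshold.
    return (above[0], above[-1])


def endpoints_streaming(energies: Iterable[float],
                        threshold: float) -> Iterator[Endpoints]:
    """Real-time: yields an updated estimate after every incoming frame."""
    start: Optional[int] = None
    end: Optional[int] = None
    for i, e in enumerate(energies):
        if e > threshold:
            if start is None:
                start = i   # beginning point is fixed once first detected
            end = i         # ending point keeps moving while speech continues
        yield (start, end)
```

In this sketch the batch function returns nothing until the whole utterance is available, whereas the streaming function exposes a provisional beginning point as soon as the first frame crosses the threshold; once all frames have been consumed, the last streaming estimate matches the batch result.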
Speech recognition systems use many different speech parameters, including amplitude, short-term auto-correlation coefficients, zero-crossing rates, linear prediction error and harmonic analysis. In spite of attempts to select speech parameters that effectively and accurately allow the detection of human speech, robust speech detection under conditions of significant background noise remains a challenging problem. A system that selects and utilizes effective speech parameters to perform robust speech detection in conditions with background noise may thus provide a more useful and powerful method of speech recognition. Therefore, for all the foregoing reasons, an improved method is needed for implementing a speech recognition system for use during conditions with background noise.
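For illustration, three of the speech parameters named above (amplitude, expressed here as short-term energy; zero-crossing rate; and a short-term auto-correlation coefficient) might be computed for a single frame of samples as in the following sketch. The function names, the normalizations, and the treatment of a zero sample as non-negative are assumptions chosen for this example and are not taken from any particular system.

```python
from typing import Sequence


def short_term_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def autocorrelation_coefficient(frame: Sequence[float], lag: int) -> float:
    """Short-term auto-correlation at `lag`, normalized by frame energy."""
    total_energy = sum(x * x for x in frame)
    if total_energy == 0 or lag < 0 or lag >= len(frame):
        return 0.0
    products = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    return products / total_energy
```

Each of these measures can also respond to background noise rather than to speech, which is consistent with the difficulty described above: selecting a parameter is not by itself sufficient for robust detection.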