There is a present demand for automatic speaker identification and automatic word recognition systems. The market for speaker identification systems includes security systems, credit sales operations, computer data access, banking activities, and law enforcement. Recent studies have confirmed that voiceprint identification is sufficiently reliable to be used as legal evidence. The advantage of voiceprints over other techniques, such as fingerprint identification, is that existing telephone lines can carry the information from an inexpensive microphone, with no need for expensive conversion equipment at the terminal location.
The market for word recognition systems includes material handling operations, mail sorting, manufacturing control, automatic supermarket checkout, and voice-actuated switches. In material handling, mail sorting, manufacturing control, and automatic supermarket checkout, spoken data entry frees the operator's hands for other tasks. The potential use of speech recognition for computer data input is especially promising because it would permit natural language to be used for programming the computer, eliminating the need to develop special-purpose computer languages. Computer input data could then be provided by individuals with no knowledge of how the computer operates.
A speech recognition system must perform three basic functions:
1. Extract characteristic features of the speech wave, reducing its very large information content to the basic information sufficient to identify the speaker and/or recognize the linguistic content.
2. Perform some type of time axis normalization, i.e., contract or expand each basic linguistic element, known as a phoneme, to a standardized duration, so that the word can be matched against stored information and recognized independently of how fast it is spoken or whether parts of it are stressed.
3. Compare the normalized words with a set of stored words, and indicate the best match.
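Functions 2 and 3 can be illustrated with a minimal sketch, assuming each word has already been reduced to a sequence of per-frame feature vectors. This is a simplification, not the method described here: instead of normalizing phoneme by phoneme, the sketch uniformly stretches or compresses the whole word to a fixed number of frames by linear interpolation, then picks the stored word with the smallest summed Euclidean frame distance. The names `normalize_time`, `distance`, `best_match`, the 16-frame target length, and the toy templates are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = List[float]  # one feature vector per analysis frame (hypothetical representation)


def normalize_time(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Linearly stretch or compress a feature sequence to exactly target_len frames."""
    if not frames or target_len < 2:
        raise ValueError("need at least one frame and target_len >= 2")
    n = len(frames)
    out: List[Frame] = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into the input sequence
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo  # interpolation weight between neighbouring frames
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Sum of Euclidean distances between corresponding frames of equal-length sequences."""
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b))


def best_match(word: Sequence[Frame], templates: Dict[str, Sequence[Frame]], target_len: int = 16) -> str:
    """Normalize the spoken word and every stored word, then return the closest stored word."""
    norm = normalize_time(word, target_len)
    return min(templates, key=lambda name: distance(norm, normalize_time(templates[name], target_len)))


if __name__ == "__main__":
    templates = {
        "yes": [[1.0, 0.2], [3.0, 0.5], [1.0, 0.8]],
        "no": [[0.5, 0.9], [0.5, 0.1]],
    }
    # The same contour as "yes", but spoken slowly (five frames instead of three).
    spoken = [[1.0, 0.2], [2.0, 0.35], [3.0, 0.5], [2.0, 0.65], [1.0, 0.8]]
    print(best_match(spoken, templates))  # yes
```

Because both the spoken word and each stored word are brought to the same length before comparison, the slowly spoken "yes" still matches its three-frame template. A uniform stretch cannot, however, compensate for one syllable being lengthened more than another, which is why the text calls for normalization at the level of the phoneme.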
In present speech recognition systems, characteristic features are extracted by a Fourier analysis or a time series analysis of the speech wave. A subsequent algorithm usually performs phoneme segmentation and time axis normalization. For real-time operation, such systems require extensive computing power, generally provided by a full-size computer, in addition to preprocessing equipment such as filter banks or sampling and timing devices. For most potential applications, the cost of such systems is beyond an economically acceptable level.
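To show why Fourier-based feature extraction is computationally heavy, here is a minimal pure-Python sketch that computes a naive discrete Fourier transform of each frame and sums the spectral energy into equal-width bands, a crude software stand-in for a filter bank. It covers only the Fourier route, not time series analysis. The function names, the non-overlapping framing, and the equal-width band grouping are assumptions of this sketch, not details taken from the text.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2; costs about (N/2 + 1) * N complex multiply-adds."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def band_energies(frame: Sequence[float], n_bands: int) -> List[float]:
    """Group the magnitude spectrum into n_bands equal-width bands (a crude software filter bank)."""
    mags = dft_magnitudes(frame)
    width = len(mags) / n_bands
    return [
        sum(m * m for m in mags[int(b * width):int((b + 1) * width)])
        for b in range(n_bands)
    ]


def extract_features(samples: Sequence[float], frame_len: int, n_bands: int) -> List[List[float]]:
    """Split the signal into non-overlapping frames and reduce each to n_bands energies."""
    return [
        band_energies(samples[s:s + frame_len], n_bands)
        for s in range(0, len(samples) - frame_len + 1, frame_len)
    ]


if __name__ == "__main__":
    # A tone exactly at DFT bin 4 of a 32-sample frame should land in band 1 of 4.
    tone = [math.cos(2 * math.pi * 4 * t / 32) for t in range(32)]
    bands = band_energies(tone, 4)
    print(bands.index(max(bands)))  # 1
```

As a rough illustration of the cost, a 256-sample frame needs 129 × 256 = 33,024 complex multiply-adds with this naive transform. At an assumed rate of 100 frames per second, that comes to about 3.3 million per second, before segmentation, normalization, or matching even begin. A fast Fourier transform reduces the per-frame cost from O(N²) to O(N log N), but the workload remains substantial for real-time operation.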