This invention relates to a method and apparatus for automatic continuous-speech recognition, and more particularly, to a speaker-dependent continuous-speech recognition (CSR) system in which the recognition function is performed on a Personal Computer Memory Card International Association (PCMCIA) card. This invention is particularly well-suited for use in military or other such operations which require light-weight portable computers.
People have been interested in analyzing and simulating the human voice since ancient times. Recently, interest in speech processing has grown with modern communication systems. Improved methods of speech processing can provide us with more sophisticated means for recording, securing, transporting, and imparting information. To illustrate, modern communications systems could include talking computers, information systems secured with a voiceprint password, and word processors or other typing machines that receive data through voice input (as opposed to manual or keyed input). Operators of telephones, computers, word processors, etc., could call others, enter data, request information, or control systems when their hands and eyes are busy or unavailable, and such systems could be used by sight-impaired persons. Voiceprints can be used to control access to protected services, such as phone cards, voice mail, customer accounts, and cellular service. See, for example, U.S. Pat. No. 5,414,755, issued to Lawrence G. Bahler, et al., on May 9, 1995, entitled SYSTEM AND METHOD FOR PASSIVE VOICE VERIFICATION IN A TELEPHONE NETWORK, and assigned to ITT Corporation, the assignee herein, which discloses the use of speaker verification in connection with long-distance telephone services.
Automated speech processing may take many forms. For example, speech can be input into a device to (1) confirm or verify the identity of the speaker (known as speaker verification); (2) identify an unknown talker (speaker recognition); (3) convert what the speaker says into a graphical representation (speech recognition); (4) translate the substance of what the speaker has said, or have the device respond to the speaker's utterances (speech understanding); or (5) determine whether specific subjects were discussed or words uttered (word spotting). The method of this invention is particularly useful with regard to the speech recognition methods; the system of this invention enables an individual to train his or her voice on a specified continuous-speech recognition (CSR) task and, once trained, perform automatic speech recognition with that individual's voice. However, it is contemplated that this invention could be applied to other methods of speech processing as well.
Voice recognition systems and methods have been developed, described in the prior art, and are well known. To illustrate, voice recognition systems have been developed by ITT Corporation, the assignee herein, and are disclosed in the following patents, each of which is hereby incorporated by reference: U.S. Pat. No. 4,811,399, issued to Landell, et al., on Mar. 7, 1989, entitled APPARATUS AND METHOD FOR AUTOMATIC SPEECH RECOGNITION; U.S. Pat. No. 4,933,973, issued to Porter on Jun. 12, 1990, entitled APPARATUS AND METHODS FOR THE SELECTIVE ADDITION OF NOISE TO TEMPLATES EMPLOYED IN AUTOMATIC SPEECH RECOGNITION SYSTEMS; U.S. Pat. No. 4,994,983, issued to Landell, et al., on Feb. 19, 1991, entitled AUTOMATIC SPEECH RECOGNITION SYSTEM USING SEED TEMPLATES; and U.S. Pat. No. 5,073,939, issued to Vensko, et al., on Dec. 17, 1991, entitled DYNAMIC TIME WARPING APPARATUS FOR USE IN SPEECH RECOGNITION SYSTEMS. Additionally, for further background regarding speech verification systems, see, for example, U.S. Pat. No. 5,339,385, issued to Higgins on Aug. 16, 1994, entitled SPEAKER VERIFIER USING NEAREST-NEIGHBOR DISTANCE MEASURE; U.S. Pat. No. 4,720,863, issued to Li on Jan. 19, 1988, entitled METHOD AND APPARATUS FOR TEXT-INDEPENDENT SPEAKER RECOGNITION; and U.S. Pat. No. 4,837,830, issued to Wrench, Jr. on Jun. 6, 1989, entitled MULTIPLE PARAMETER SPEAKER RECOGNITION SYSTEM AND METHODS. Each of these patents also has been assigned to ITT Corporation, the assignee herein, and is hereby incorporated by reference.
Generally, in speaker verification systems, “reference patterns” or Models of the speech phenomenon to be recognized are developed. The Models are then compared to subsequently obtained speech samples (i.e., unknown speech). The Models used for comparison are typically developed by measuring the waveforms of the spoken text and converting the analog waveform of the speaker's voice to digital form. In comparing the Models developed from the enrollment speech data (obtained from the known speaker), with Models developed from the test data (obtained from an unknown speaker), computer processing is applied to measure the Euclidean distance between the reference Models and the speech samples, which may be determined by applying a linear discriminant function (LDF), and/or by applying speech-recognition algorithms, which are well known and described in the literature and prior art patents cited herein. If the distance measured is less than a predetermined value, or in other words produces a certain “score,” a decision is made to accept or reject the test data as having been uttered by the same person providing the enrollment data.
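The distance-and-threshold decision described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the method of the cited patents: the function names `euclidean_distance` and `verify_speaker`, the representation of a Model as a plain list of numbers, and the threshold values are all hypothetical.

```python
import math


def euclidean_distance(reference, sample):
    """Euclidean distance between two equal-length feature vectors."""
    if len(reference) != len(sample):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(reference, sample)))


def verify_speaker(reference, sample, threshold):
    """Accept (True) when the measured distance is below the threshold.

    Assumption: a single fixed threshold stands in for the
    "predetermined value" mentioned in the text.
    """
    return euclidean_distance(reference, sample) < threshold


# Hypothetical enrollment Model and test sample (illustrative values only).
enrollment_model = [0.0, 0.0]
test_sample = [3.0, 4.0]
accepted = verify_speaker(enrollment_model, test_sample, threshold=6.0)
```

In this sketch a smaller distance means a closer match, so the comparison accepts the test data only when the distance falls below the chosen value.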
The present invention relates to systems used for computer processing of the speech data. For example, one such voice recognition system produced by ITT Corporation is the ITT VRS 1290 product. The processing elements for the VRS 1290 product are shown in FIG. 1. This system consists essentially of two components: a special-purpose board (in VME, PC, or multi-bus configurations), and software for training and recognition. As shown schematically in FIG. 1, the processing components of the VRS board further consist of a special purpose Dynamic Time Warping (DTW) chip, digital signal processor (DSP), central processing unit (CPU), and Codec. Incoming speech is digitized by the Codec (A), and then converted to filter-bank parameters in the dedicated DSP (B). The CPU (C) controls the overall operation of the system. Time-aligned match scoring between incoming speech and stored word templates is performed in the DTW (D). On-board memory is provided for storage of word templates, syntax structures, and required scratch buffers. The VRS 1290 software includes the dedicated firmware on the VRS board, and it also includes the host programs for generating syntax files, controlling the training procedures, and interpreting recognizer results for tasks to be controlled by speech recognition.
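To make the time-aligned match scoring step concrete, the following Python sketch shows a textbook dynamic time warping computation between a sequence of incoming frames and stored word templates. It is an assumption-laden illustration rather than the algorithm of the DTW chip or of the cited patents: the names `frame_distance`, `dtw_score`, and `best_template` are hypothetical, each frame is modeled as a short list of numbers standing in for filter-bank parameters, and no syntax constraints or path-slope limits are applied.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames of filter-bank parameters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_score(speech, template):
    """Accumulated distance along the best time alignment (lower is better)."""
    n, m = len(speech), len(template)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # cost[i][j] holds the best accumulated distance aligning the first
    # i speech frames with the first j template frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(speech[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # speech frame repeats
                                 cost[i][j - 1],      # template frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def best_template(speech, templates):
    """Return the word whose stored template gives the lowest DTW score."""
    return min(templates, key=lambda word: dtw_score(speech, templates[word]))


# Hypothetical one-parameter frames; real frames would hold many parameters.
incoming = [[0.0], [1.0], [1.0], [2.0]]
stored = {"alpha": [[0.0], [1.0], [2.0]], "bravo": [[5.0], [5.0], [5.0]]}
recognized = best_template(incoming, stored)
```

The warping lets the incoming utterance be spoken faster or slower than the stored template; here the repeated middle frame of `incoming` aligns with a single template frame at no extra cost.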
All aspects of this VRS system, including the DTW apparatus, algorithms for use in training and recognition, and general background regarding the methods involved in voice recognition, are disclosed and described in the ITT patents which were cited above and incorporated herein by reference. This invention is functionally equivalent to CSR systems which use the ITT VRS 1290 product, described above.
One objective of this invention is to implement the functional capabilities of voice recognition systems, such as the special purpose VRS 1290 hardware, with a relatively inexpensive and commercially available device, resulting in a cost advantage. Another objective of this invention is to provide a more practical voice recognition system that may be implemented with small, lightweight computers, such as, for example, the portable computers that are likely to be fielded in military operations.
The invention comprises a speaker-dependent CSR system in which the recognition function is performed on a PCMCIA card. Components of the PCMCIA card include a DSP, Codec, and memory. All of the processing performed by the CPU, DSP, and special DTW chip in the prior art system is performed in the single DSP on the PCMCIA card. Recognition results computed on the card are uploaded to the host computer application (for command interpretation and processing) via the PCMCIA interface. The invention may be operated with a system hardware design and a set of computer programs for both the PCMCIA card and the host computer, further described below.