This invention relates to a speaker recognizing or discriminating system, which may be any one of a speaker verifying, a speaker identifying, and a speaker classifying system. More particularly, this invention relates to a speaker recognizing system in which pattern matching is carried out by resorting to a dynamic programming algorithm.
An article by Aaron E. Rosenberg, entitled "Automatic Speaker Verification: A Review," was published in Proceedings of the IEEE, Vol. 64, pages 475-487 (April 1976). The article reviews various speaker verification systems. An electronic digital computer is used in a Texas Instruments entry control system. In a Bell Labs automatic speaker verification system, a dynamic programming technique is resorted to in establishing a warping function for use in carrying out time registration between an input speech pattern and a reference speech pattern. The time registration, also called time normalization or alignment in the art, is carried out by using speech or phonetic events, such as an intensity contour, in each of the input and the reference speech patterns. Besides notes on speaker identification, various fields of application are described, for example, banking and credit authorizations, entry controls, and transactions from remote locations. The article furthermore cites a number of references.
On the other hand, specific speech recognition systems for automatically recognizing continuously spoken words are revealed in U.S. Pat. No. 3,816,722 issued to the present applicant et al. and assigned to the present assignee, and Nos. 4,059,725 and 4,049,913, both issued to the present applicant and assigned also to the instant assignee. Papers were contributed by the applicants, either jointly or singly, to IEEE Transactions and others as regards such speech recognition systems. The fact that none of these papers is listed in the bibliography of the Rosenberg article suggests that applying the speech recognition systems to speaker recognition has been regarded as insurmountably difficult.
In each of the speech recognition systems disclosed in the patents, an input speech sound or pattern is converted to a time sequence of feature vectors representative of the input speech sound. A plurality of feature vector sequences are preliminarily stored in the system to represent reference speech sounds. Each feature vector sequence corresponds to a set of filter bank output samples described in the Rosenberg article. According to the patents, the feature vector sequence is dealt with as it stands, rather than after being subjected to segmentation, as Rosenberg calls it, prior to analysis. More specifically, similarity measures are calculated between the input feature vector sequence and the reference feature vector sequences according to the dynamic programming technique. In other words, pattern matching is carried out between an input speech pattern and reference speech patterns by resorting to a dynamic programming algorithm. The time normalization is simultaneously carried out without utilizing the speech events of the type described in the Rosenberg article. Inasmuch as the speech recognition systems are already in practical use and have proven to be excellently operable, it is desirable to develop a speaker recognizing system without substantially modifying the speech recognition system.
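By way of illustration only, pattern matching between feature vector sequences by a dynamic programming algorithm may be sketched as follows. The sketch is not the method of the cited patents: it assumes a Euclidean distance between feature vectors as a dissimilarity measure (a smaller value meaning a closer match), a simple three-way recurrence without slope constraints or an adjustment window, and the hypothetical names frame_distance, dtw_distance, and best_reference. It shows only how time normalization is obtained as a by-product of the matching, without reference to speech events.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(input_seq, reference_seq):
    """Accumulated distance along the best time-warped alignment.

    Each argument is a time sequence of feature vectors (lists of floats).
    The sequences may differ in length; the recurrence stretches or
    compresses the time axes so that the summed frame distance is minimal.
    """
    m, n = len(input_seq), len(reference_seq)
    if m == 0 or n == 0:
        raise ValueError("both sequences must contain at least one vector")

    # g[i][j] holds the minimum accumulated distance for aligning the first
    # i input vectors with the first j reference vectors.
    g = [[math.inf] * (n + 1) for _ in range(m + 1)]
    g[0][0] = 0.0

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = frame_distance(input_seq[i - 1], reference_seq[j - 1])
            # Predecessors: advance input only, reference only, or both.
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])

    return g[m][n]


def best_reference(input_seq, references):
    """Return the key of the reference sequence closest to the input.

    `references` maps a label to a stored feature vector sequence.
    """
    return min(references, key=lambda label: dtw_distance(input_seq, references[label]))
```

In this sketch, an input sequence that is merely a time-stretched version of a reference sequence yields an accumulated distance of zero, because the recurrence absorbs the difference in timing. A practical system would ordinarily restrict the permissible warping, which the sketch omits for brevity.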