1. Field of the Invention
The present invention relates to a method and system of identifying a voice pattern, and, in particular, to a technique for identifying a voice pattern of a time and frequency representation with compression information.
2. Description of the Prior Art
For the recognition by a device of voice such as spoken words and messages, it is well known to convert the voice into a pattern of time and frequency representation. Such a voice pattern is typically represented in a coordinate system having the abscissa taken for time and the ordinate taken for frequency. Voice patterns for a predetermined number of words or messages are stored in advance in a library, and when a word is spoken, it is processed into a voice pattern of time and frequency representation and compared with the stored voice patterns one after another, thereby identifying the spoken word.
In such a case, even if the same word is spoken by the same speaker, its time span varies nonlinearly from one utterance to the next. To identify a voice pattern while absorbing such temporal fluctuations, one proposed method utilizes dynamic programming, and another utilizes the properties of membership functions in fuzzy logic, wherein margins are provided in the pattern. In the latter method, appropriate means are used to make the two patterns to be compared equal in time length, and the two patterns are then superimposed one on top of the other, whereby the degree of similarity is determined depending on how well the two patterns match.
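By way of illustration only, the dynamic programming approach mentioned above may be sketched as follows. The function name `dtw_distance`, the representation of a frame as a tuple of binary values, and the use of a Hamming distance between frames are assumptions made for this sketch and are not taken from any particular prior art system.

```python
def dtw_distance(pattern_a, pattern_b):
    """Dynamic-programming alignment cost between two frame sequences.

    Each pattern is a list of frames; each frame is a sequence of binary
    values, one per frequency band (an assumed representation).
    """
    inf = float("inf")
    n, m = len(pattern_a), len(pattern_b)
    # cost[i][j] = best cost of aligning the first i frames of A
    # with the first j frames of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Hamming distance between the two frames being aligned.
            d = sum(x != y for x, y in zip(pattern_a[i - 1], pattern_b[j - 1]))
            # A frame may be matched, or either pattern may be stretched.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same word spoken at two different speeds: one frame is held longer.
fast = [(1, 0), (1, 1), (0, 1)]
slow = [(1, 0), (1, 1), (1, 1), (0, 1)]
```

In this sketch, `dtw_distance(fast, slow)` evaluates to zero because the only difference between the two patterns is a nonlinear stretch in time, which the alignment absorbs.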
It is generally desired to store as large a number of voice patterns as possible in the library, and to attain this objective, the voice pattern is often stored in compressed form. Such a voice pattern, which is compressed to remove redundancy, is typically shown in FIG. 1, in which the horizontal axis is taken for frequency and the vertical axis is taken for time. As shown in FIG. 1, the frequency is divided into a plurality of bands F.sub.1, F.sub.2, . . . , F.sub.n and the binary distribution of the voice pattern during a selected time interval may be determined by comparing each frequency component with a predetermined threshold. Such a set of binary data for a selected time interval is arranged horizontally and is called a frame. In FIG. 1, only the first two frames of a voice pattern are shown. Furthermore, in the example illustrated in FIG. 1, the voice pattern is determined by sampling the same word spoken by the same speaker three times and adding these three sets of sampled data together. Thus, the digit "3" in the pattern indicates the highest frequency of occurrence and "0" indicates the lowest frequency of occurrence.
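The formation of such a pattern may be sketched, purely as an example, as follows. The names `binarize_frame` and `accumulate`, the strict "greater than" comparison with the threshold, and the assumption that the three utterances have already been brought to the same number of frames are all hypothetical choices made for this sketch.

```python
def binarize_frame(band_levels, threshold):
    """Compare each frequency band level with a threshold to obtain a binary frame.

    A strict "greater than" comparison is assumed here.
    """
    return [1 if level > threshold else 0 for level in band_levels]


def accumulate(utterances):
    """Add several binary patterns of identical shape element by element.

    Each utterance is a list of frames, and each frame is a list of 0/1 values.
    With three utterances, every element of the result lies between 0 and 3.
    """
    n_frames = len(utterances[0])
    n_bands = len(utterances[0][0])
    return [
        [sum(u[t][f] for u in utterances) for f in range(n_bands)]
        for t in range(n_frames)
    ]


# Three utterances of the same word, each with a single frame of four bands.
utterances = [
    [binarize_frame([0.9, 0.7, 0.2, 0.1], 0.5)],
    [binarize_frame([0.8, 0.4, 0.6, 0.1], 0.5)],
    [binarize_frame([0.9, 0.6, 0.3, 0.2], 0.5)],
]
pattern = accumulate(utterances)
```

Here the first band exceeds the threshold in all three utterances and therefore accumulates to "3", whereas the last band never exceeds it and remains "0".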
The voice pattern shown in FIG. 1 also includes compression data for each frame, indicated by A. The compression data indicates how many times the corresponding frame should be repeated as time goes on. That is, in the illustrated example, the compression data of the first frame is "1", so that this frame should be used only once, whereas the compression data of the second frame is "3", so that this frame must be used three times in succession, i.e., repeated twice. Such a compression scheme is quite advantageous because the next two frames, i.e., the third and fourth frames, are not actually stored but are effectively represented by the compression data for the second frame, thereby reducing the data storage area required.
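The compression scheme described above amounts to a run-length encoding of identical consecutive frames, and may be sketched as follows. The function names `compress` and `expand`, the pairing of a repeat count with each stored frame, and the frame values used in the example are illustrative assumptions and are not the actual contents of FIG. 1.

```python
def compress(frames):
    """Merge runs of identical consecutive frames into (count, frame) pairs.

    The count plays the role of the compression data A for each stored frame.
    """
    stored = []
    for frame in frames:
        if stored and stored[-1][1] == frame:
            count, same = stored[-1]
            stored[-1] = (count + 1, same)
        else:
            stored.append((1, frame))
    return stored


def expand(stored):
    """Restore the uncompressed pattern by repeating each frame count times."""
    return [list(frame) for count, frame in stored for _ in range(count)]


# Four frames in time, of which the second, third and fourth are identical.
frames = [[3, 0, 1], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
stored = compress(frames)
```

In this example only two frames are stored, with compression data "1" and "3" respectively, and `expand(stored)` reproduces the original four frames.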
Since the voice patterns are thus normally compressed when stored, when such stored voice patterns are to be used for comparison with a sampled voice pattern for mechanical recognition of a spoken word, the compressed data must be expanded to its original uncompressed state for comparison with the uncompressed sample data. This is highly disadvantageous, because the stored patterns must be compared with the input data one after another, and each stored pattern must be subjected to expansion processing each time it is compared with the input data.