In determining whether or not an individual is a particular pre-identified individual, i.e. a "client", a comparison may be made between pre-determined parameters relating to that pre-identified individual and those measured when any individual presents for verification. Particular parameters which may be used include parameters relating to speech, although parameters relating to other characteristics may be used. Among those other characteristics are how the presenting individual writes, uses a computer mouse, or uses a computer or other keyboard.
One method of identification, or verification, of whether or not an individual presenting for verification is a pre-determined individual makes use of client models representing each of a population of individuals. Characteristics relating to a person presenting for verification are measured and compared with the characteristics for one or more of the total population. If the characteristics for the person presenting for verification match those for a particular one of the population, then the verification system makes a determination that the presenting person is the particular individual for which the characteristics match. A difficulty with systems of this kind is that the values of characteristics for any person presenting may differ from the reference values for that person which are used by the system. For example, the values used by the system would normally comprise stored values measured in a previous test on the individual, the stored values then being compared with those measured when the person presents for verification. However, naturally occurring variations may exist as between the values stored and those which arise when a verification procedure is carried out. In the case of verification on the basis of characteristics relating to utterances of a person, those variations may, for example, comprise phonetic variations, variations due to environmental conditions and intra-speaker variations. Thus, a person may utter a vowel in one fashion when the vowel appears in one word, and in a different fashion when it appears in another word. Again, the test conditions under which the original characteristic values were determined may be noise free, but there may be noise present in the environment when the individual presents for verification. Generally, then, it is not possible to effect identification with certainty simply on the basis of direct equality of the measured characteristics with those stored for the individual in question.
Normally, comparison is effected as between characteristic values for more than one of the population, the determination of identity being made on the basis of the "distance" between the characteristics as stored for more than one of the population and those measured at verification. The characteristics which are measured in the verification process may be multi-dimensional. For example, it has been found convenient to use cepstral analysis techniques to analyse the speech of the population and of the person presenting for verification. Overlapping samples of, say, 30 milliseconds may be taken of the amplitude-time waveform recorded during speech. In this case, it is convenient to generate 15 cepstral coefficients and to generate a model representing each member of the population and the person presenting for verification, the models being 15-dimensional and having, for example, 128 points. The set of such points is commonly referred to as a code book for the person in question.
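The framing and code book steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the 10 millisecond hop between frames, the use of k-means clustering to derive the code book points, and the function names frame_signal and build_code_book are all hypothetical and are not taken from the method described. The extraction of the 15 cepstral coefficients from each frame is not shown; the sketch assumes one feature vector per frame is already available.

```python
import random


def frame_signal(samples, sample_rate, frame_ms=30.0, hop_ms=10.0):
    """Split a waveform into overlapping frames.

    frame_ms follows the 30 ms figure in the text; hop_ms is an assumed value.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop_len)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_code_book(vectors, size, iterations=10, seed=0):
    """Reduce many feature vectors to `size` representative points.

    k-means is an assumption here; the text does not say how the points
    of a code book are obtained.
    """
    rng = random.Random(seed)
    code_book = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size),
                          key=lambda i: squared_distance(v, code_book[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old point if no vector was assigned to it
                code_book[i] = [sum(col) / len(members) for col in zip(*members)]
    return code_book
```

With 15-dimensional feature vectors and size=128, build_code_book would return a list of 128 points of 15 values each, matching the dimensions given in the example above.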
In the comparison of the code book of the person presenting for verification with those for the reference population employed by the verification technique, one may choose from the code books for the population those of a "cohort", being a limited number of the population, and then compare the code book of the presenting person with the code books for that cohort. The cohort is selected from the total population on the basis that there is some similarity between the code book for the "client" in the population (i.e. the person whom the person presenting for verification purports to be) and those of the relevant cohort members. Selection of the cohort members can be made on the basis of the proximity of the centroids of their code book distributions to the centroid of the client's code book distribution. It is important that the multi-dimensional (Euclidean) distance between the centroid for the client and those for the various cohort members be significant, but not too great.
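A minimal sketch of centroid-based cohort selection of this kind is given below. The names select_cohort, min_dist and max_dist are hypothetical, and the two thresholds are merely one assumed way of expressing "significant, but not too great"; the text gives no numerical values for them.

```python
import math


def centroid(code_book):
    """Mean point of a code book (a list of equal-length vectors)."""
    n = len(code_book)
    return [sum(col) / n for col in zip(*code_book)]


def euclidean(a, b):
    """Multi-dimensional Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_cohort(client_id, code_books, min_dist, max_dist, size):
    """Pick up to `size` members whose centroids lie within an assumed
    distance band [min_dist, max_dist] of the client's centroid,
    nearest first. `code_books` maps a person's id to their code book.
    """
    client_centroid = centroid(code_books[client_id])
    candidates = []
    for person_id, book in code_books.items():
        if person_id == client_id:
            continue
        d = euclidean(client_centroid, centroid(book))
        if min_dist <= d <= max_dist:
            candidates.append((d, person_id))
    candidates.sort()
    return [person_id for _, person_id in candidates[:size]]
```

Note that a selection made on distance alone, as here, takes no account of the direction in which each cohort member lies relative to the client.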
While methods based on the above have been found to be workable, hitherto inexplicable errors sometimes arise. For example, an error as basic as failure to discriminate between a male and a female voice may occur. It has now been determined that a likely cause of this difficulty is that the cohort members which are selected for the particular client do not have code book distributions which "surround" the code book distribution for the client in a satisfactory fashion. In particular, if the distance from the centroid of the code book distribution for the person presenting for verification to the client code book distribution centroid is great, then the difference between that distance and the distances to the code book distribution centroids of the other cohort members will be relatively small. It may easily arise in this case that, because of the distribution of the cohort members with respect to the client, the distance between the code book distribution centroids of the client and of the person presenting for verification is less than the distance from the code book distribution centroid for the person presenting for verification to that of any of the other cohort members, at least as applies to some particular direction as between the code book distribution centroids for the person presenting for verification and for the client and cohort members. Thus, the verification scheme may incorrectly assume that the person presenting for verification is the client in this instance. Merely increasing the number of cohort members will not necessarily rectify this problem.
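The failure described above can be illustrated with a small two-dimensional example. The decision rule used here, namely accepting the claim when the client's centroid is nearer to the presenting person's centroid than every cohort centroid is, is a simplifying assumption made for illustration, and the coordinates are invented purely to show the geometry.

```python
import math


def accepts(claimant, client, cohort):
    """Assumed rule: accept if the client's centroid is strictly nearer
    to the claimant's centroid than every cohort member's centroid."""
    d_client = math.dist(claimant, client)
    return all(d_client < math.dist(claimant, member) for member in cohort)


client = (0.0, 0.0)
impostor = (-10.0, 0.0)  # far from the client, on the side with no cohort members

# All cohort members lie to one side of the client.
one_sided = [(3.0, 0.0), (3.0, 1.0), (3.0, -1.0)]
# More members, but still on the same side.
larger_one_sided = one_sided + [(4.0, 2.0), (4.0, -2.0)]
# One member added on the opposite side, so the cohort surrounds the client.
surrounding = one_sided + [(-3.0, 0.0)]

print(accepts(impostor, client, one_sided))         # True: falsely accepted
print(accepts(impostor, client, larger_one_sided))  # True: more members do not help
print(accepts(impostor, client, surrounding))       # False: rejected
```

In this sketch the impostor is 10 units from the client but at least 13 units from every one-sided cohort member, so the client appears to be the best match even though the impostor is far from it. Only a cohort member lying in the impostor's direction changes the outcome.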