This disclosure relates to a speaker recognition system. One aim of a speaker recognition system may be to verify that a speaker is who they claim to be, that is, to perform speaker verification. Such a system requires a user to enrol into the system by providing a speech sample, which is used to create a model of the user's speech. In each subsequent attempt by a speaker to access the system, the speech received from that speaker is compared with the model created during enrolment.
In a speaker recognition system, the process of calibration aims to ensure that the system operates at a desired operating point across all environmental conditions (for example, across all noise measures and across all channels). The desired operating point can be defined as the False Acceptance Rate (FAR) of the system. The FAR can then be adjusted depending on the security level required by the speaker recognition system at the time.
In situations where the environmental conditions are not ideal, it is desirable that this is only reflected in an increase of the False Rejection Rate (FRR) of the system, without a change in the FAR (which may seriously impact the security of the speaker recognition system).
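The two error rates referred to above can be illustrated with a short sketch. It assumes that the system outputs a numeric score for each access attempt and accepts the attempt when the score reaches a threshold; the function names, the threshold and the score values are hypothetical and are not taken from this disclosure.

```python
def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor attempts whose score reaches the threshold."""
    if not impostor_scores:
        raise ValueError("at least one impostor score is required")
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)


def false_rejection_rate(genuine_scores, threshold):
    """Fraction of genuine attempts whose score falls below the threshold."""
    if not genuine_scores:
        raise ValueError("at least one genuine score is required")
    rejected = sum(1 for score in genuine_scores if score < threshold)
    return rejected / len(genuine_scores)


# Hypothetical scores, for illustration only.
impostor_scores = [0.1, 0.2, 0.6, 0.9]
genuine_scores = [0.4, 0.7, 0.8, 0.9]
threshold = 0.5

far = false_acceptance_rate(impostor_scores, threshold)
frr = false_rejection_rate(genuine_scores, threshold)
```

In this picture, a well-calibrated system is one in which a change in environmental conditions leaves the value returned by `false_acceptance_rate` unchanged for a given threshold, with any degradation appearing only in the value returned by `false_rejection_rate`.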
The presence of background noise can result in calibration problems in a speaker recognition system, such that the speaker recognition system does not obtain the desired FAR.
One attempt to solve this problem is the use of test normalization, or T-Normalization. The speech of a speaker attempting to access the system is compared with the model obtained during enrolment, and is also compared with the models of a cohort of other speakers. The statistics of the scores obtained from the comparisons with the cohort are used to modify the score obtained from the comparison with the model obtained during enrolment, and the modified score is used as the basis for the decision as to whether the speaker is the enrolled user. Theoretically, T-Normalization guarantees that the FAR will remain stable, regardless of the environmental conditions when a speaker accesses the speaker recognition system. However, this will only occur when the environmental conditions of the cohort, and the environmental conditions of the enrolment, are identical. Thus, this stabilisation of the FAR often cannot be achieved in practice, because the environmental conditions that will be present when a speaker enrols into the speaker verification system cannot be predicted.
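The score modification described above can be sketched as follows. The sketch assumes the commonly used form of T-Normalization, in which the statistics taken from the cohort scores are their mean and standard deviation; the disclosure does not state which statistics are used, and the function name and the example values are hypothetical.

```python
import statistics


def t_normalize(enrolment_score, cohort_scores):
    """Shift and scale a score using the mean and standard deviation of cohort scores.

    enrolment_score: score of the test speech against the enrolled user's model.
    cohort_scores: scores of the same test speech against each cohort model.
    """
    if len(cohort_scores) < 2:
        raise ValueError("at least two cohort scores are required")
    cohort_mean = statistics.fmean(cohort_scores)
    # Sample standard deviation; an assumption made for this sketch.
    cohort_std = statistics.stdev(cohort_scores)
    if cohort_std == 0:
        raise ValueError("cohort scores have zero spread")
    return (enrolment_score - cohort_mean) / cohort_std


# Hypothetical values, for illustration only.
normalized = t_normalize(3.0, [0.0, 1.0, 2.0])
accepted = normalized >= 1.5  # hypothetical decision threshold
```

Because the cohort scores are computed from the same test speech, a condition that shifts or spreads every score for that speech is largely removed by the normalization, which is the intuition behind the stable FAR. Any mismatch between the conditions of the cohort models and those of the enrolment model is not removed.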