Speaker identity recognition is an important means of identity recognition. A user speaks a segment of voice, and a terminal acquires the segment of voice, performs a series of operations on the acquired voice, such as preprocessing, feature extraction, modeling, and parameter estimation, and then maps the voice to a vector that has a fixed length and is capable of expressing the voice features of the speaker. This vector is referred to as an identity vector. The identity vector can express well the identity information of the speaker in the corresponding voice. The identity vector of the speaker is compared with an identity vector of a target user, and whether the speaker is the target user may be determined according to the degree of similarity between the two identity vectors, so as to implement speaker identity verification.
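The comparison step described above can be illustrated with a minimal sketch. The text does not specify how the degree of similarity is computed, so cosine similarity is assumed here purely for illustration; the function names `cosine_similarity` and `is_target_user` and the threshold value of 0.7 are likewise hypothetical and would in practice be chosen by the implementer.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("identity vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("identity vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_target_user(
    speaker_vec: Sequence[float],
    target_vec: Sequence[float],
    threshold: float = 0.7,  # hypothetical value, not taken from the text
) -> bool:
    """Accept the speaker if the similarity reaches the assumed threshold."""
    return cosine_similarity(speaker_vec, target_vec) >= threshold
```

For example, `is_target_user([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])` accepts the speaker because the two vectors point in nearly the same direction, whereas two orthogonal vectors such as `[1.0, 0.0]` and `[0.0, 1.0]` are rejected.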
However, the identity vector is susceptible to interference from channel variability and environment variability, which affects the accuracy of speaker identity recognition. Channel variability refers to distortion of the voice caused by differences between acquisition terminals and/or differences in transmission. Differences between acquisition terminals are, for example, differences in terminal type, such as a mobile phone versus a tablet computer; differences in transmission are, for example, differences in transmission channel, such as wired versus wireless transmission. Environment variability refers to distortion of the voice caused by factors of the environment in which the speaker is located, for example, whether the speaker is indoors or outdoors, or the presence of environmental noise.