Speaker recognition techniques are useful for many applications, e.g., speaker tracking, audio indexing and audio segmentation. Recently, it has been proposed to model a speaker using several anchor (speaker) models. The speaker's voice is projected onto the anchor models to form a vector representing the acoustic characteristics of the speaker.
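The projection onto anchor models can be illustrated with the minimal Python sketch below. It is not the method of any particular system. It assumes, for illustration only, that each anchor speaker is modeled by a single diagonal-covariance Gaussian over feature frames (practical systems usually use GMMs), and that the coordinate for each anchor is the average per-frame log-likelihood of the speaker's frames under that anchor. The names `DiagGaussianAnchor` and `project_on_anchors` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class DiagGaussianAnchor:
    """Hypothetical anchor model: one diagonal-covariance Gaussian per anchor speaker."""

    def __init__(self, mean: Vector, var: Vector):
        self.mean = list(mean)
        self.var = list(var)

    def log_likelihood(self, frame: Vector) -> float:
        # Sum of independent 1-D Gaussian log-densities (diagonal covariance).
        ll = 0.0
        for x, m, v in zip(frame, self.mean, self.var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return ll


def project_on_anchors(frames: List[Vector], anchors: List[DiagGaussianAnchor]) -> List[float]:
    """Map a speaker's feature frames to one coordinate per anchor model.

    Assumption: the coordinate is the average per-frame log-likelihood
    of the frames under that anchor.
    """
    if not frames:
        raise ValueError("need at least one feature frame")
    return [sum(a.log_likelihood(f) for f in frames) / len(frames) for a in anchors]
```

For example, frames close to the first anchor's mean yield a vector whose first coordinate is much larger than the others. The resulting vector, rather than the raw frames, then serves as the speaker's representation.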
FIG. 1 shows a block diagram of a conventional device 100 for speaker recognition. As shown in FIG. 1, an anchor space is created from the training speech of a large number of general speakers. In a reference anchor set generation unit 102, a number of virtual anchor speakers, which are the centroids of clusters in the anchor space, are selected to form a reference anchor set; alternatively, the anchor speaker nearest to the centroid of each cluster is selected to form the reference anchor set. A front end 101 receives an enrollment speech from a target speaker, converts the enrollment speech into feature parameters, and sends the feature parameters to a voice print generation unit 103. The voice print generation unit 103 generates a voice print based on the feature parameters sent from the front end 101 and the reference anchor set generated by the reference anchor set generation unit 102. The generated voice print is then stored in a voice print database 104 for later use in speaker recognition. As can be seen from FIG. 1, the reference anchor set generated by the device 100 reflects only the distribution of the anchor space itself. Accordingly, a larger number of anchors is needed to describe the target speaker well, which increases the computational load and makes the device difficult to use in embedded systems.
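The two selection rules of the reference anchor set generation unit 102 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the conventional device's actual implementation. It assumes that each anchor speaker is represented by a fixed-length vector in the anchor space, that plain k-means (Lloyd's algorithm) with Euclidean distance is used for clustering, and that centroids are initialised with the first `k` anchors so the result is deterministic. The function names `kmeans` and `reference_anchor_set` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two anchor vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means; centroids start at the first k points (assumption)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be in 1..len(points)")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each anchor joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def reference_anchor_set(anchors: List[Point], k: int, virtual: bool = True) -> List[Point]:
    """Select k reference anchors from the anchor space.

    virtual=True : return the cluster centroids (virtual anchor speakers).
    virtual=False: return the real anchor nearest to each centroid.
    """
    centroids, _ = kmeans(anchors, k)
    if virtual:
        return centroids
    return [min(anchors, key=lambda a: _sq_dist(a, c)) for c in centroids]
```

Both rules depend only on how the anchors are distributed in the anchor space; no target-speaker data enters the selection. This is the limitation noted above.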