Speaker recognition technology is useful in human-machine interaction. Many applications and products can be enabled or augmented with speaker recognition technology, such as (1) on-site access control to facilities (home appliances, cars, PC terminals, etc.), or (2) secure remote access to databases, websites, or even bank transactions over telephone, mobile, or computer connections.
An enrollment or registration process for a target speaker is necessary before speaker recognition technology can be used in a real system. In the speaker enrollment process, sample speech from the target speaker is collected and used to generate a statistical template of that specific speaker. The quality of the generated statistical template has a large influence on the performance of the speaker recognition system.
FIG. 1 shows a diagram of a conventional device for pass-phrase modeling for a speaker verification system. When a user wants to register his or her pass-phrase during the enrollment process, utterances of the pass-phrase are requested from the target user by a front end 101 of the speaker verification system. Since the user's utterances are not exactly the same each time, 3-5 repetitions of the pass-phrase are necessary in order to obtain a robust statistical template model in a modeling unit 103. The created template model is stored in a database 105 for later verification. The conventional method has two main disadvantages: (1) if little enrollment data is available or intra-speaker variation is large, the quality of the enrollment is not assured; and (2) the user experience suffers if more repetitions are needed, since users prefer a simple enrollment procedure.
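The conventional enrollment flow described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: each utterance is taken to be already reduced by the front end to a fixed-length feature vector, the "statistical template" is simplified to a per-dimension mean and variance (a practical system would more likely use a richer statistical model), and the database is a plain in-memory dictionary. The names `build_template`, `enroll`, and `MIN_REPETITIONS` are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pvariance

# Assumption: the lower bound of the "3-5 repetitions" mentioned in the text.
MIN_REPETITIONS = 3


def build_template(utterances):
    """Build a simplified statistical template from repeated utterances.

    Each utterance is assumed to be a fixed-length list of floats
    (hypothetical front-end output). The template is the per-dimension
    mean and population variance across the repetitions.
    """
    if len(utterances) < MIN_REPETITIONS:
        raise ValueError(
            f"need at least {MIN_REPETITIONS} repetitions, got {len(utterances)}"
        )
    if len({len(u) for u in utterances}) != 1:
        raise ValueError("all utterances must have the same feature dimension")

    # Transpose so that each column holds one feature dimension.
    columns = list(zip(*utterances))
    return {
        "mean": [fmean(col) for col in columns],
        "var": [pvariance(col) for col in columns],
    }


def enroll(database, speaker_id, utterances):
    """Create a template for the speaker and store it for later verification."""
    template = build_template(utterances)
    database[speaker_id] = template
    return template


if __name__ == "__main__":
    db = {}  # stands in for the template database
    repetitions = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]  # made-up feature vectors
    print(enroll(db, "speaker-1", repetitions))
```

In this sketch the first disadvantage shows up directly: with only a few repetitions the variance estimate rests on very little data, and large differences between repetitions inflate it, so the stored template becomes less reliable.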