1. Field of the Invention
The present invention relates to a speech-recognition system based on speaker-adaptation, and more particularly, to a speech-recognition system in which speaker-adaptation is gradually performed on a terminal of a user and recognition of speech spoken by a speaker is performed on a speech-recognition server using speaker-adapted information.
2. Discussion of Related Art
Conventional speech-recognition systems each consist of a speech recording program installed in the user's terminal and an online server that recognizes the user's speech using a variety of speech-recognition algorithms and memory resources. This configuration has been developed to increase speech-recognition performance by taking the characteristics of the individual user into account, so as to suit an environment in which the resources of the online server are readily available and an environment in which the terminal size is further reduced, and to this end, speaker-adaptation techniques are being used.
Speaker-adaptation techniques are techniques that reduce the difference between the speech spoken by a speaker and an acoustic model by adapting the model to the speaker's speech characteristics. A variety of methods are used, such as the maximum a posteriori (MAP) method, the maximum likelihood linear regression (MLLR) method, the maximum a posteriori linear regression (MAPLR) method, the Eigen-voice method, etc.
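As an illustration of one of these methods, the MAP approach can be sketched as follows. This is a hypothetical simplification for a single one-dimensional Gaussian mean; real systems adapt full HMM/GMM acoustic models, and the function and parameter names here are illustrative, not taken from any particular implementation:

```python
# Minimal sketch of MAP (maximum a posteriori) mean adaptation for a
# single Gaussian. Hypothetical simplification: real speaker-adaptation
# updates the means of many Gaussians in an HMM/GMM acoustic model.

def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP update of a Gaussian mean.

    prior_mean:  speaker-independent mean (acts as the prior).
    frames:      observed feature values from the adaptation data.
    occupancies: posterior probability (soft count) of each frame
                 belonging to this Gaussian.
    tau:         prior weight; larger tau trusts the prior more.
    """
    gamma = sum(occupancies)                       # total soft count
    weighted_sum = sum(g * x for g, x in zip(occupancies, frames))
    # Interpolate between the prior mean and the data-driven mean:
    return (tau * prior_mean + weighted_sum) / (tau + gamma)

# With little adaptation data the adapted mean stays near the prior;
# with more data it moves toward the speaker's own sample mean.
adapted = map_adapt_mean(prior_mean=0.0,
                         frames=[2.0, 2.0, 2.0, 2.0],
                         occupancies=[1.0, 1.0, 1.0, 1.0],
                         tau=4.0)
print(adapted)  # (4*0.0 + 8.0) / (4 + 4) = 1.0
```

The interpolation behavior is what makes MAP adaptation gradual: each new utterance from the speaker shifts the model a little further from the speaker-independent prior toward the speaker's own statistics.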
In order to use the speaker-adaptation techniques in the conventional speech-recognition system using the online server, two elements are needed. The first is providing adaptation data and correct-answer transcriptions, and the second is providing a speaker identifier (ID) to identify the speaker.
Specifically, the first element is used when a pre-learning process for speaker-adaptation is configured in the speech-recognition system. In this case, the pre-learning process refers to having the user speak words or phrases for which the correct-answer transcriptions are given in advance, and then performing speaker-adaptation with them. The second element is used when the speech-recognition device resides on the online server, unlike the case in which the speech-recognition device is configured in the terminal itself and there is no need to identify the speaker. In this case, since adaptation appropriate for the speaker may be performed only after the speaker of the terminal connected to the server is identified, the speaker ID is needed.
However, the method of performing the pre-learning process or providing the speaker ID is not only cumbersome for the user but also requires storage allocated for each speaker in the server, which can lead to data overload.