1. Field of the Invention
The present invention relates to a speaker verification apparatus and method that determine, from the speaker's voice, whether or not the speaker is an authorized user, based on feature parameters of voices registered in advance.
2. Description of the Prior Art
In recent years, advances in computer technology have led to the rapid development of communication environments. Along with this development, computer telephony integration using the telephone has become common in ordinary homes.
In the field of computer telephony integration using the telephone, a problem may arise when accessing information that should not be known to anyone other than the authorized person or a specific group of authorized people, such as private information or information subject to a confidentiality obligation. For example, when a push-button telephone is used, access authority to information can be acquired by entering a password with the telephone's buttons. However, if the password becomes known to unauthorized people, they can easily access the information even though they are not duly authorized. For this reason, there is a need to verify, using the voice, which is inherent to each individual, whether or not the person trying to access the information is the duly authorized person or a member of the specific group of authorized people. To provide such a security function, it is important that neither the registration of voices for verification nor the determination of the threshold for judging whether the input voice belongs to an authorized person places an excessive burden on the user.
Conventionally, a fixed, predetermined value has generally been used as the threshold for determining whether or not the speaker is an authorized person. More specifically, as shown in FIG. 1, a verification distance between an input voice and a previously registered voice is calculated and compared with the predetermined threshold. When the verification distance is equal to or shorter than the threshold (“−” in FIG. 1), the speaker is determined to be an authorized person. When the verification distance is longer than the threshold (“+” in FIG. 1), the speaker is determined to be an unauthorized person.
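The fixed-threshold decision of FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not specify how the verification distance is computed, so the hypothetical `verification_distance` helper simply uses the Euclidean distance between two equal-length feature vectors. The names `verification_distance` and `is_authorized` are illustrative only.

```python
import math


def verification_distance(input_features, registered_features):
    """Hypothetical distance measure (an assumption, not specified by the source):
    Euclidean distance between two equal-length feature vectors."""
    if len(input_features) != len(registered_features):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(input_features, registered_features)))


def is_authorized(distance, threshold):
    """Fixed-threshold rule: accept when distance <= threshold ("-" side in FIG. 1),
    reject when distance > threshold ("+" side)."""
    return distance <= threshold


registered = [1.0, 2.0, 3.0]  # illustrative registered feature vector
print(is_authorized(verification_distance([1.0, 2.0, 3.5], registered), 1.0))  # True (distance 0.5)
print(is_authorized(verification_distance([4.0, 6.0, 3.0], registered), 1.0))  # False (distance 5.0)
```

Note that the threshold itself is a constant here; nothing in this rule adapts it to the user or to recording conditions.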
Such a threshold should be set as described below. In FIG. 2, the false rejection error rate FR, which is the probability that an authorized person is erroneously rejected as an unauthorized person, is plotted on the vertical axis against the threshold of the verification distance on the horizontal axis. Similarly, the false acceptance error rate FA, which is the probability that an unauthorized person is erroneously accepted, is plotted on the vertical axis against the same threshold. When the threshold is small, the rate FA of erroneous acceptance of an unauthorized person is low, whereas the rate FR of erroneous rejection of an authorized person is high. Conversely, when the threshold is large, FR is low, whereas FA is high. Therefore, the threshold should be set to an appropriate value depending on the relative importance of the two error rates. In general, verification is performed using, as the threshold, the value at which the two error rates are experimentally found to be equal.
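The trade-off between FR and FA, and the choice of a threshold at which the two rates are equal, can be illustrated with a short sketch. It assumes that verification distances have already been collected in a preliminary experiment, both for genuine trials (the authorized speaker) and for impostor trials; the distance values below are made up for illustration and are not experimental data. Because FR and FA computed from a finite number of trials rarely become exactly equal, the hypothetical `equal_error_threshold` function scans the observed distances as candidate thresholds and returns the one that minimizes |FR − FA|. This is one simple approximation, not necessarily the procedure used in any particular system.

```python
def error_rates(threshold, genuine_distances, impostor_distances):
    """Return (FR, FA) for a given threshold, using the accept rule distance <= threshold."""
    # FR: fraction of authorized-speaker trials that are rejected (distance > threshold).
    fr = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    # FA: fraction of impostor trials that are accepted (distance <= threshold).
    fa = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    return fr, fa


def equal_error_threshold(genuine_distances, impostor_distances):
    """Hypothetical helper: pick the observed distance at which FR and FA are closest."""
    best = None
    for t in sorted(set(genuine_distances) | set(impostor_distances)):
        fr, fa = error_rates(t, genuine_distances, impostor_distances)
        gap = abs(fr - fa)
        # Keep the first (smallest) threshold that achieves the minimal gap.
        if best is None or gap < best[0]:
            best = (gap, t, fr, fa)
    _, threshold, fr, fa = best
    return threshold, fr, fa


genuine = [0.2, 0.4, 0.5, 0.9]   # illustrative distances for the authorized speaker
impostor = [0.6, 1.1, 1.3, 1.5]  # illustrative distances for impostors
print(error_rates(0.2, genuine, impostor))       # (0.75, 0.0): small threshold, FR high, FA low
print(error_rates(1.5, genuine, impostor))       # (0.0, 1.0): large threshold, FR low, FA high
print(equal_error_threshold(genuine, impostor))  # (0.6, 0.25, 0.25)
```

The sketch also makes the practical difficulty concrete: estimating FR at all requires a set of genuine-trial distances, that is, many utterances from the authorized user.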
However, the above-described method requires knowing the tendencies of the false rejection error rate FR and the false acceptance error rate FA beforehand in order to set the threshold, and it is difficult to know these two error rates before the system is used. Therefore, either a preliminary experiment is performed to obtain an approximate value, or the threshold is updated whenever required while the system is in use. Performing a preliminary experiment is disadvantageous for the following reasons. Because conditions differ between the preliminary experiment and actual use of the system, it is often necessary to perform a test again when the system is used. In addition, obtaining the false rejection error rate FR requires the authorized person (user) to provide his/her voice many times, which places a large burden on the user and is impractical. On the other hand, updating the threshold whenever required during use is also disadvantageous, because each update places a large burden on the user as well.
Furthermore, the voice of an authorized person can change over time, and accurate identification of the speaker is generally difficult when noise such as background sound is mixed with the voice.