1. Field of the Invention
This invention generally relates to a method, apparatus and related program product for selecting a target voice among multiple voices having different directions and more particularly relates to a technique of voice selection by using a CSP (cross-power spectrum phase) coefficient.
2. Description of Background
In recent years, fraud has become a major concern, especially in commercial environments. Substantial efforts are made to protect consumers and to maintain clients' trust by providing secure settings and ensuring ethical transactions between the client and the representatives of a business entity. Accordingly, establishing a so-called compliance structure has become an important and urgent issue. As part of an effort to improve compliance, some industries, such as financial services, record any conversation between a clerk representing the institution and a customer at the counter or in other such commercial settings. The recordings are then used to check, through a number of methods (such as indexing with automatic speech recognition), the clerk's manner of conducting business operations.
Various methods are used to collect records of conversations, such as those between a clerk and a customer at the counter or in other such business settings. One such method involves recording the conversation with a close-talking microphone disposed on or near the clerk. This method is intended to record the voice of the clerk only, but it ends up recording the voice of the customer as well, which at times makes customers reluctant to do business because their conversation is being recorded. Accordingly, many businesses are hesitant to use this technology, as they do not consider it an appropriate way to maintain customer satisfaction.
In another approach, the voice of a clerk can be collected with a uni-directional microphone disposed at a location where the customer cannot see it. Unfortunately, this method also has its flaws: a standard uni-directional microphone has low directivity and thus ends up recording the voice of the customer as well, which may raise legal and other issues of its own. Using a super-directive microphone, such as a gun microphone, is also impractical because such microphones are large and cost prohibitive.
A different but related problem affects other settings. In certain situations, a voice from an unexpected source and direction causes problems and therefore needs to be efficiently removed. Such a situation may occur, for example, when automatic speech recognition is used to operate a car navigation system in a vehicle. Here the driver is the target speaker, but the voices of one or more passengers in the car may interfere with the operation of the system and therefore have to be efficiently removed.
Prior art techniques perform gain control of the speech spectrum of the target speaker by using a CSP coefficient, which is a cross-correlation between two channel signals. Other techniques use a binary mask based on the direction of voice arrival. Gain control using a CSP coefficient takes advantage of the characteristic that the CSP coefficient in a designated direction takes a large value when the target speaker located in the designated direction is speaking, and takes a small value when the target speaker is not speaking. Unfortunately, the CSP coefficient in the designated direction sometimes takes a relatively large value, especially when a speaker different from the target speaker is located in the general designated direction from which the target speech is to originate. In such a case, the speech spectrum extracted by gain control using the CSP coefficient contains the voice of a speaker other than the intended target speaker. This degrades the reliability and accuracy of the entire system (for example, automatic speech recognition). This is a shortcoming of the weighted CSP approach, the binary mask approach, and the other approaches currently available in the prior art.
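The behavior described above can be illustrated with a minimal sketch. It assumes the commonly used definition of the CSP coefficient as the inverse Fourier transform of the phase-only (magnitude-normalized) cross-power spectrum of two channels, so that a designated direction corresponds to a designated sample lag between the channels. The function names (dft, idft_real, csp_coefficients, csp_gain, circular_delay), the use of a circular delay, and the clipping of the gain to the range 0 to 1 are assumptions made for illustration and are not taken from any particular prior art implementation.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Real part of the inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def csp_coefficients(ch1, ch2, eps=1e-12):
    """CSP coefficients per lag: inverse DFT of the phase-only cross-power spectrum.

    With this sign convention the peak index equals the number of samples
    by which ch2 lags ch1 (circularly).
    """
    s1, s2 = dft(ch1), dft(ch2)
    cross = [a.conjugate() * b for a, b in zip(s1, s2)]
    # Discard magnitude and keep only phase; eps guards against division by zero.
    whitened = [c / max(abs(c), eps) for c in cross]
    return idft_real(whitened)


def csp_gain(csp, lag):
    """Hypothetical gain rule: the CSP value at the designated lag, clipped to [0, 1]."""
    return min(max(csp[lag], 0.0), 1.0)


def circular_delay(signal, d):
    """Return the signal delayed circularly by d samples."""
    return signal[-d:] + signal[:-d]


if __name__ == "__main__":
    random.seed(0)
    n, lag = 64, 3  # lag 3 stands in for the designated direction
    target = [random.gauss(0, 1) for _ in range(n)]
    other = [random.gauss(0, 1) for _ in range(n)]  # a different speaker

    csp_target = csp_coefficients(target, circular_delay(target, lag))
    csp_other = csp_coefficients(other, circular_delay(other, lag))

    print("gain, target speaker:", round(csp_gain(csp_target, lag), 3))
    print("gain, other speaker :", round(csp_gain(csp_other, lag), 3))
```

In this sketch both gains come out close to 1, because the coefficient depends only on the inter-channel delay and not on who is speaking. A different speaker arriving from the designated direction therefore passes through the gain control just as the target speaker does, which is the shortcoming noted above.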
Consequently, it is desirable to introduce a solution that can overcome the problems not currently addressed by the prior art.