The present invention relates generally to a microphone array that tracks the direction of the voice of human speakers and, more specifically, to a hands-free mobile phone.
Mobile phones are commonly used in a car to provide the driver with a convenient means of telecommunication. The user can use the phone while in the car without stopping the car or pulling over to a parking area. However, using a mobile phone while driving raises a safety issue because the driver must constantly adjust the position of the phone with one hand. This may distract the driver from paying attention to driving.
A hands-free car phone system that uses a single microphone and a loudspeaker located at a distance from the driver can be a solution to the above-described safety problem. However, the speech quality of such a hands-free phone system is far inferior to the quality usually attainable from a phone with a handset supported by the user's hand. The major disadvantages of the above-described hands-free phone system arise from the fact that there is a considerable distance between the microphone and the user's mouth and that the noise level in a moving car is usually high. The increase in the distance between the microphone and the user's mouth drastically reduces the speech-to-ambient noise ratio. Moreover, the speech is severely reverberated and thus less natural and intelligible.
A hands-free system with several microphones, or a multi-microphone system, is able to improve the speech-to-ambient noise ratio and make the speech signal sound more natural without the need to bring the microphone closer to the user's mouth. This approach does not compromise the comfort and convenience of the user.
Speech enhancement in a multi-microphone system can be achieved by an analog or digital beamforming technique. The digital beamforming technique involves a beamformer that uses a plurality of digital filters to filter the electro-acoustic signals received from a plurality of microphones and then sums the filtered signals. The beamformer amplifies the microphone signals responsive to sound arriving from a certain direction and attenuates the signals arriving from other directions. In effect, the beamformer directs a beam of increased sensitivity towards the source in a selected direction in order to improve the signal-to-noise ratio of the microphone system. Ideally, the output signal of a multi-microphone system should sound similar to that of a microphone placed next to the user's mouth.
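The filter-and-sum principle described above can be illustrated with its simplest special case, a delay-and-sum beamformer, in which each digital filter is reduced to a pure integer-sample delay. The following is only a minimal sketch under that assumption; the function names, the two-microphone toy signal and the delay values are hypothetical and are not taken from the disclosure.

```python
import math


def delay_and_sum(channels, delays):
    """Delay each microphone channel by an integer number of samples and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   list of non-negative integer delays (in samples), one per microphone.
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += samples[t - d]
    return [v / len(channels) for v in out]


def mean_power(signal):
    """Mean-square value of a signal (an assumed power measure)."""
    return sum(v * v for v in signal) / len(signal)


# Toy scene (assumed): a sinusoid with a period of 8 samples reaches
# microphone 0 first and microphone 1 two samples later.
source = [math.sin(2 * math.pi * t / 8) for t in range(64)]
mic0 = source
mic1 = [0.0, 0.0] + source[:-2]

# Steering toward the source: delay mic0 by 2 samples so both channels align.
toward = delay_and_sum([mic0, mic1], [2, 0])
# Steering away: the opposite delays put the channels half a period apart.
away = delay_and_sum([mic0, mic1], [0, 2])

print(mean_power(toward), mean_power(away))
```

With the delays matched to the arrival direction the two channels add coherently, whereas with the mismatched delays they largely cancel, which is the amplification and attenuation behavior attributed to the beamformer above.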
Beamforming techniques are well-known. For example, the article entitled “Voice source localization for automatic camera pointing system in videoconferencing”, by H. Wang and P. Chu (Proceedings of IEEE 1997 Workshop on Applications of Signal Processing to Audio and Acoustics, 1997) discloses an algorithm for voice source localization. The major drawback of this voice source localization algorithm is that it is only applicable to a microphone system wherein the space between microphones is sufficiently large, 23 cm (9″) used in one direction and 30 cm (11.7″) used in the other direction. Moreover, the performance of the disclosed microphone system is not reliable in an environment where the ambient noise levels are high and reverberation is severe.
The article entitled “A signal subspace tracking algorithm for microphone array processing of speech”, by S. Affes and Y. Grenier (IEEE Transactions on Speech and Audio Processing, Vol. 5, No. 5, pp. 425-437, September 1997) describes a method of adaptive microphone array beamforming using matched filters with subspace tracking. The performance of the system as described by S. Affes and Y. Grenier is also not reliable when the ambient noise levels and reverberation are high. Furthermore, this system only allows the user to move slightly, within a circle of about 10 cm (about 3.9″) radius. Thus, the above-described systems cannot reliably perform in an environment of a moving car where the ambient noise levels are usually high and there can be more than one human speaker, each of whom has a reasonable space to move around.
U.S. Pat. No. 4,741,038 (Elko et al) discloses a sound location arrangement wherein a plurality of electro-acoustical transducers are used to form a plurality of receiving beams to intercept sound from one or more specified directions. In the disclosed arrangement, at least one of the beams is steerable. The steerable beam can be used to scan a plurality of predetermined locations in order to compare the sound from those locations to the sound from a currently selected location.
The article entitled “A self-steering digital microphone array”, by W. Kellermann (Proceedings of ICASSP-91, pp. 3581-3584, 1991) discloses a method of selecting the beam direction using a novel voting algorithm.
The article entitled “Autodirective Microphone System” by J. L. Flanagan et al. (Acustica, Vol. 73, pp. 58-71, 1991) discloses a two-directional beamforming system for an auditorium wherein the microphone system is dynamically steered or pointed to a desired talker location.
However, the above-described systems are either too complicated or not designed to perform in an environment such as the interior of a moving car, where the ambient noise levels are high and the human speakers in the car are allowed to move within a broader range. Furthermore, the above-described systems do not distinguish the voice of the near-end human speakers from the voice of the far-end human speakers reproduced through the loudspeaker of a hands-free phone system.
The first aspect of the present invention is to provide a system that uses a plurality of acoustic sensors for tracking at least one human speaker in order to effectively detect the voice from the human speaker, wherein the human speaker and the acoustic sensors are separated by a speaker distance along a speaker direction, and wherein the human speaker is allowed to move relative to the acoustic sensors resulting in a change in the speaker direction within an angular range, and wherein each acoustic sensor produces an electrical signal responsive to the voice of the human speaker. The system comprises: a) a beamformer operatively connected to the acoustic sensors to receive the electrical signal, wherein the beamformer is capable of forming N different beams, and each of the beams defines a favorable direction to detect the voice from the human speaker by the acoustic sensors and each different beam is directed in a substantially different direction within the angular range, and wherein the beamformer further outputs for each beam a beam power responsive to the voice detected by the acoustic sensors; and b) a comparator operatively connected to the beamformer for comparing the beam power of each beam in order to determine a most favorable direction to detect the voice of the human speaker, wherein the comparator compares the beam power of each beam periodically so as to determine the most favorable detection direction according to the change in the speaker direction.
The second aspect of the present invention is to provide a method of tracking at least one human speaker using a plurality of acoustic sensors in order to effectively detect the voice from the human speaker, wherein the human speaker and the acoustic sensors are separated by a speaker distance along a speaker direction, and wherein the human speaker is allowed to move relative to the acoustic sensors resulting in a change in the speaker direction within an angular range, and wherein each acoustic sensor produces an electrical signal responsive to the voice of the speaker. The method includes the steps of: a) forming N different beams from the electrical signal such that each beam defines a favorable direction to detect the voice of the human speaker by the acoustic sensors and each different beam is directed in a substantially different direction within the angular range, wherein each beam has a beam power responsive to the electrical signal; and b) periodically comparing the beam power of each beam in order to determine the most favorable direction to detect the voice of the human speaker according to the change of the speaker direction.
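The comparing step common to both aspects can be sketched as follows. This is an illustrative sketch only: it assumes that the beam power is the mean-square value of each beam's output over one frame and that the comparison period equals one frame, neither of which is specified above. The names `beam_power`, `select_beam`, `track` and the toy beam outputs are hypothetical.

```python
def beam_power(frame):
    """Mean-square value of one frame of a beam's output (assumed power measure)."""
    return sum(x * x for x in frame) / len(frame)


def select_beam(beam_frames):
    """Compare the power of every beam for one frame; return (best index, powers)."""
    powers = [beam_power(f) for f in beam_frames]
    best = max(range(len(powers)), key=powers.__getitem__)
    return best, powers


def track(beam_outputs, frame_len):
    """Repeat the comparison once per frame and return the chosen beam per frame."""
    n = len(beam_outputs[0])
    choices = []
    for start in range(0, n - frame_len + 1, frame_len):
        frames = [b[start:start + frame_len] for b in beam_outputs]
        choices.append(select_beam(frames)[0])
    return choices


# Toy outputs of N = 3 beams over two frames of 4 samples each (assumed data):
# the speaker is best covered by beam 0 in the first frame and by beam 2 in the second.
beam_outputs = [
    [1.0, -1.0, 1.0, -1.0, 0.1, -0.1, 0.1, -0.1],
    [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5],
    [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0],
]

print(track(beam_outputs, frame_len=4))  # prints [0, 2]
```

Repeating the comparison every frame lets the selected direction follow the speaker as the speaker direction changes within the angular range.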
The present invention will become apparent upon reading the description taken in conjunction with FIG. 1 to FIG. 4.