1. Field of the Invention
The present invention relates generally to an apparatus and method for localizing a sound source in a robot, and more particularly, to an apparatus and method for enabling a miniaturized robot to rapidly and accurately localize a sound source in three-dimensional space with minimum dead space and using a minimum number of microphones.
2. Description of the Related Art
Utility robots that act as partners to human beings and assist in daily life, including various human activities outside of the home, are currently being developed. Unlike industrial robots, utility robots are built like human beings, move like human beings in human living environments, and thus are referred to as humanoid robots (herein referred to as “robots”).
In general, a robot walks with two legs (or moves using two wheels) and has a plurality of joints and drive motors, which drive the joints, to move its hands, arms, neck, legs, etc., like human beings. For example, 41 joint drive motors are installed in Hubo, a humanoid robot developed by Korea Advanced Institute of Science and Technology (KAIST) in December 2004, and drive respective joints.
Drive motors of a robot are generally controlled separately. To control the drive motors, a plurality of motor drivers, each of which controls at least one of the drive motors, are installed in the robot and controlled by a control computer installed inside or outside of the robot.
As robots are developed to be more humanlike, technology has also been developed that enables users to communicate with the robots, for example, to issue verbal orders.
If a robot looks away from a user while the user is communicating with the robot, the user may not feel satisfied with the communication. Thus, the robot needs to localize the user, i.e., the sound source, in order to look in the direction of the user.
In general, sound source localization methods are classified into the following types:
1) methods of localizing a sound source by maximizing the steered power of a beamformer;
2) methods of localizing a sound source on the basis of high-resolution spectrum estimation; and
3) methods of localizing a sound source using differences in sound arrival times at a plurality of sensors, i.e., Time Differences Of Arrival (TDOAs) between sensors.
A representative method of localizing a sound source by maximizing steered power of a beamformer is a Steered Response Power (SRP) algorithm, which is described in detail in “A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays” written by J. Dibiase and published in 2000.
A representative method of localizing a sound source on the basis of high-resolution spectrum estimation is a Multiple Signal Classification (MUSIC) algorithm, which is described in detail in “Adaptive Eigenvalue Decomposition Algorithm for Passive Acoustic Source Localization” written by J. Benesty and published in 2000.
A representative method of localizing a sound source using TDOAs between sensors is a Generalized Cross-Correlation (GCC) algorithm, which is described in detail in “The Generalized Correlation Method for Estimation of Time Delay” written by C. H. Knapp and G. C. Carter and published in 1976.
As one of the various algorithms for localizing a sound source, a GCC-Phase Transform (PHAT) algorithm, which is a GCC algorithm employing a PHAT filter, involves a relatively small amount of computation, making it possible to localize a sound source in real time. An SRP-PHAT algorithm, which is an SRP algorithm employing a PHAT filter, is a grid search method of dividing a whole space into blocks and searching each block for a sound source. The SRP-PHAT algorithm therefore involves a large amount of computation and is difficult to use in real time, but it has better sound source localization performance than the GCC-PHAT algorithm.
The PHAT filter is described in detail in “Use of The Crosspower-Spectrum Phase in Acoustic Event Location” written by M. Omologo and P. Svaizer and published in 1997.
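For illustration only, the TDOA estimation step of a GCC-PHAT algorithm for one microphone pair can be sketched as follows. This is a minimal sketch and not the implementation of any cited reference; the function name gcc_phat, its parameters, and the small constant added to avoid division by zero are assumptions made for this example.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    All names here are hypothetical; `fs` is the sampling rate in Hz and
    `max_tau` optionally limits the search to physically possible delays.
    """
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)           # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Rearrange so that index 0 corresponds to lag -max_shift
    cc = np.roll(cc, max_shift)[: 2 * max_shift + 1]
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

A positive return value indicates that the sound reached the microphone producing sig later than the microphone producing ref. The resolution of this sketch is limited to one sample period; interpolation around the peak would be needed for finer estimates.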
FIG. 1 illustrates a microphone array for localizing a sound source in three-dimensional space using the GCC-PHAT algorithm. As illustrated in FIG. 1, to localize a sound source in a three-dimensional space using the GCC-PHAT algorithm, at least eight microphones 10 must be arranged in the form of a cube, that is, at the corners of the cube.
More specifically, to localize a sound source in a three-dimensional space using the GCC-PHAT algorithm, the position of the sound source must be searched for in all directions (up, down, forward, backward, left and right) from the robot. Thus, the sound source is localized using TDOAs between the microphones 10 diagonally disposed in each square surface of the cube.
In a method of localizing a sound source in a three-dimensional space using the SRP-PHAT algorithm, the positions of the microphones 10 are not restricted.
As mentioned above, the SRP-PHAT algorithm divides the whole space in all directions from the robot into blocks and searches each block for a sound source, and therefore involves a larger amount of computation than the GCC-PHAT algorithm. Consequently, the SRP-PHAT algorithm is difficult to use to localize a sound source in real time, although it has excellent sound source localization performance in a three-dimensional space.
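The computational cost of such a grid search can be seen in the following simplified sketch. It is an illustration under stated assumptions rather than the SRP-PHAT algorithm of the cited reference: it searches a grid of directions (a far-field simplification) instead of blocks of space, rounds each expected delay to a whole sample, and assumes a speed of sound of 343 m/s. The names phat_cc and srp_phat_direction and the grid sizes are hypothetical.

```python
import itertools
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def phat_cc(a, b, n):
    """PHAT-weighted cross-correlation; index k (mod n) is the lag of a relative to b."""
    cross = np.fft.rfft(a, n=n) * np.conj(np.fft.rfft(b, n=n))
    cross /= np.abs(cross) + 1e-12
    return np.fft.irfft(cross, n=n)

def srp_phat_direction(signals, mic_pos, fs, n_az=72, n_el=19):
    """Return the (azimuth, elevation) in radians with the highest steered power.

    signals: array of shape (num_mics, num_samples)
    mic_pos: array of shape (num_mics, 3), microphone coordinates in meters
    """
    n = 2 * signals.shape[1]
    pairs = list(itertools.combinations(range(len(mic_pos)), 2))
    ccs = {p: phat_cc(signals[p[0]], signals[p[1]], n) for p in pairs}

    best_power, best_dir = -np.inf, None
    # Every candidate direction is visited, which is what makes the search expensive
    for az in np.linspace(-np.pi, np.pi, n_az, endpoint=False):
        for el in np.linspace(-np.pi / 2, np.pi / 2, n_el):
            u = np.array([np.cos(el) * np.cos(az),
                          np.cos(el) * np.sin(az),
                          np.sin(el)])
            power = 0.0
            for i, j in pairs:
                # Far-field model: expected lag of microphone i relative to j, in samples
                lag = int(round(fs * np.dot(mic_pos[j] - mic_pos[i], u) / C))
                power += ccs[(i, j)][lag % n]
            if power > best_power:
                best_power, best_dir = power, (az, el)
    return best_dir
```

With the default grid of 72 azimuths and 19 elevations, each localization evaluates 1,368 candidate directions for every microphone pair, whereas the GCC-PHAT approach needs only one correlation peak search per pair.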
The general GCC-PHAT algorithm using the eight microphones 10 as illustrated in FIG. 1 can accurately localize a sound source in a three-dimensional space. However, since eight or more microphones are necessary, it is difficult to use the general GCC-PHAT algorithm in a miniaturized robot, such as a mini robot.
In order to apply the GCC-PHAT algorithm using the minimum number of microphones, four microphones 10 may be disposed in a plane as illustrated in FIG. 2. However, when the four microphones 10 are disposed in a rectangular form, a sound source to the front, back, left, or right can be localized, but a sound source disposed above or below cannot. For a mini robot, this drawback is not a serious problem because of its small height. But the larger the robot and the higher the position of the microphones 10, the greater the dead space in which a sound source cannot be localized.
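One way to see the limitation of a planar arrangement is that a sound source above the plane of the microphones and its mirror image below the plane produce identical TDOAs, so TDOAs alone cannot tell the two apart. The following sketch checks this numerically. The rectangle dimensions, the source position, the assumed speed of sound, and the function name tdoas are hypothetical values chosen for this example, not taken from FIG. 2.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def tdoas(source, mic_pos):
    """TDOAs (seconds) of each microphone relative to microphone 0."""
    dists = np.linalg.norm(mic_pos - source, axis=1)
    return (dists[1:] - dists[0]) / C

# Four microphones at the corners of a rectangle lying in the plane z = 0
planar = np.array([[ 0.10,  0.05, 0.0],
                   [-0.10,  0.05, 0.0],
                   [-0.10, -0.05, 0.0],
                   [ 0.10, -0.05, 0.0]])

above = np.array([1.0, 0.5, 0.8])          # source above the plane
below = above * np.array([1.0, 1.0, -1.0])  # mirror image below the plane

same = np.allclose(tdoas(above, planar), tdoas(below, planar))
```

Here same evaluates to True because the distance from each microphone to the source depends on the height of the source only through its square.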
The method of localizing a sound source using the SRP-PHAT algorithm does not limit the positions of microphones and has better performance than the method using the GCC-PHAT algorithm. But the method using the SRP-PHAT algorithm involves too much computation to process in a real-time system, and thus, it is difficult to apply the method to a miniaturized robot.
Accordingly, a sound source localization method for a miniaturized robot must minimize the number of microphones used, minimize the dead space in sound source direction estimation, and rapidly and accurately localize the sound source in three-dimensional space.