1. Field
One or more embodiments relate to a method, medium, and apparatus for extracting a target sound from mixed sound, and more particularly, to a method, medium, and apparatus for processing mixed sound that contains sounds generated by a plurality of sound sources and is input to a portable digital device capable of processing or capturing sound, such as a cellular phone, a camcorder, or a digital recorder, in order to extract from the mixed sound only the target sound desired by a user.
2. Description of the Related Art
Making or receiving phone calls, recording external sounds, and capturing moving images with portable digital devices have become part of everyday life. Various digital devices, such as consumer electronics (CE) devices and cellular phones, use a microphone to capture sound. Generally, a microphone array including a plurality of microphones is used to implement stereophonic sound, which uses two or more channels, as contrasted with monophonic sound, which uses only a single channel.
A microphone array may acquire not only a sound itself but also additional information regarding the directivity of the sound, such as the direction or position of the sound. Directivity is a feature that increases or decreases the sensitivity to a sound signal transmitted from a sound source located in a particular direction by using the difference in the arrival times of the sound signal at each microphone of the microphone array. Accordingly, when sound signals are obtained using the microphone array, a sound signal coming from a particular direction may be emphasized or suppressed.
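As an illustration only, the use of arrival-time differences to emphasize or suppress a direction can be sketched with a simple delay-and-sum model. The sketch assumes a uniform linear array, a far-field single-frequency sound, and a speed of sound of 343 m/s; the function names `arrival_delays` and `array_gain`, and all numeric values, are hypothetical and are not taken from the embodiments.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_delays(num_mics, spacing_m, angle_deg):
    """Relative arrival times (s) of a far-field plane wave at each
    microphone of a uniform linear array (assumed geometry).
    The angle is measured from the array broadside direction."""
    positions = np.arange(num_mics) * spacing_m
    return positions * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND


def array_gain(num_mics, spacing_m, freq_hz, steer_deg, source_deg):
    """Normalized magnitude response of a delay-and-sum array that is
    steered toward steer_deg, for a tone arriving from source_deg."""
    steer = arrival_delays(num_mics, spacing_m, steer_deg)
    actual = arrival_delays(num_mics, spacing_m, source_deg)
    # Each microphone receives the tone with a phase set by its actual
    # delay; the array compensates using the delay of the steered direction.
    residual_phase = 2j * np.pi * freq_hz * (steer - actual)
    # Averaging the compensated channels: phases that line up add
    # coherently, phases that do not line up partially cancel.
    return float(np.abs(np.exp(residual_phase).mean()))


if __name__ == "__main__":
    # Four microphones spaced 5 cm apart, 1 kHz tone, array steered to 30 degrees.
    on_target = array_gain(4, 0.05, 1000.0, steer_deg=30.0, source_deg=30.0)
    off_target = array_gain(4, 0.05, 1000.0, steer_deg=30.0, source_deg=-60.0)
    print(f"gain toward steered direction: {on_target:.3f}")
    print(f"gain toward another direction: {off_target:.3f}")
```

In this model, a sound arriving from the steered direction is summed in phase and keeps a gain close to 1, whereas a sound from another direction is attenuated. The amount of attenuation depends on the assumed microphone spacing, the number of microphones, and the frequency.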
Research has been conducted on methods of obtaining a mixed signal containing a target sound and interference noise by using the microphone array, performing filtering to extract a target sound signal from the mixed signal, and removing musical noise or noise caused by a rapid change in the ambient environment. The International Telecommunication Union (ITU) has used the perceptual evaluation of speech quality (PESQ) index, which indicates sound quality evaluated objectively based on a comparison of input sound and output sound.
As used herein, the term “sound source” denotes a source which radiates sound, that is, an individual speaker included in a speaker array. In addition, the term “sound field” denotes a virtual region formed by a sound which is radiated from a sound source, that is, a region which sound energy reaches. The term “sound pressure” denotes the force exerted by sound energy, expressed using the physical quantity of pressure.