1. Field
Apparatuses and methods consistent with exemplary embodiments relate to processing an audio signal, and more particularly, to encoding, decoding, searching, or editing an audio signal by using a motion of a sound source, a reverberation property, or a semantic object, information about which is included in the audio signal.
2. Description of the Related Art
Methods of compressing or encoding an audio signal may be classified into transformation-based audio signal encoding methods and parameter-based audio signal encoding methods. In the transformation-based audio signal encoding method, an audio signal is frequency-transformed, and the resulting frequency domain coefficients are encoded and compressed. In the parameter-based audio signal encoding method, all audio signals are grouped into three types of parameters, namely a tone signal, a noise signal, and a transient signal, and the three types of parameters are encoded and compressed.
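The transformation-based approach described above may be illustrated by the following minimal sketch. It is not taken from any particular codec: the choice of an orthonormal DCT-II as the frequency transform, the uniform quantizer, and the names `dct_ii`, `idct_ii`, `encode_frame`, `decode_frame`, and `step` are all assumptions made for illustration.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of one frame (assumed transform, for illustration)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    frame = []
    for i in range(n):
        total = math.sqrt(1.0 / n) * coeffs[0]
        total += sum(math.sqrt(2.0 / n) * coeffs[k]
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        frame.append(total)
    return frame


def encode_frame(frame, step):
    """Frequency-transform a frame and quantize each coefficient to an integer."""
    return [round(c / step) for c in dct_ii(frame)]


def decode_frame(indices, step):
    """Rescale the integer indices and transform back to the time domain."""
    return idct_ii([q * step for q in indices])


# Hypothetical 32-sample frame containing a single sinusoid.
frame = [math.sin(2 * math.pi * 3 * i / 32) for i in range(32)]
indices = encode_frame(frame, step=0.05)
decoded = decode_frame(indices, step=0.05)
```

In this sketch, only the integer indices would be entropy-coded and stored. A larger `step` yields fewer distinct indices at the cost of a larger reconstruction error.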
However, the transformation-based audio signal encoding method processes a large amount of information and uses separate metadata for controlling semantic media. In addition, in the parameter-based audio signal encoding method, it is difficult to link the parameters to a high-level semantic descriptor for controlling semantic media, the audio signals to be expressed as noise are of various kinds and cover wide ranges, and high-quality coding is difficult to perform.
Active research has been conducted into multichannel audio (e.g., 22.2 channels) in order to correspond to ultra definition (UD). Because home audio systems have different configurations according to their environments, there is a need to efficiently down-mix a multichannel audio signal according to the home audio system. However, when an audio signal generated by a moving sound source is down-mixed to fewer channels than the original audio signal, the sound generated by the moving sound source may not be smoothly expressed because the speakers are spaced apart from each other.
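As a simple illustration of down-mixing, the following sketch folds five channels into two. The channel layout (left, right, center, left surround, right surround), the gain of 1/sqrt(2) applied to the center and surround channels, and the name `downmix_5_to_2` are assumptions chosen for illustration; an actual system may use other layouts and coefficients.

```python
import math

# Assumed gain for the center and surround channels (about 0.707).
GAIN = 1.0 / math.sqrt(2.0)


def downmix_5_to_2(left, right, center, left_surround, right_surround):
    """Fold five equally long channels into a stereo pair, sample by sample."""
    out_left = [l + GAIN * c + GAIN * ls
                for l, c, ls in zip(left, center, left_surround)]
    out_right = [r + GAIN * c + GAIN * rs
                 for r, c, rs in zip(right, center, right_surround)]
    return out_left, out_right


# Hypothetical one-sample input in which only the center channel is active.
stereo_left, stereo_right = downmix_5_to_2([0.0], [0.0], [1.0], [0.0], [0.0])
```

Because the gains in such a fixed matrix do not depend on where a sound source is located at a given moment, a source that moves between the original channels is reproduced only through level differences between the remaining speakers.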
Research has been conducted into technologies that allow a listener to hear a stereoscopic sound by estimating position information about a sound source from an audio signal, distributing the output among a plurality of speakers according to the position information, and outputting the audio signal accordingly. In this case, since the position information is estimated on the assumption that the sound source is fixed, only limited motion of the sound source may be expressed. In addition, since the complete position information is included for each frame, the amount of data may increase.
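Distributing the output among speakers according to position information may be sketched as follows for the simplest case of two speakers. The constant-power panning law, the normalized `position` value between 0 (fully left) and 1 (fully right), and the names `pan_gains` and `render_frame` are assumptions made for illustration and are not taken from the related art.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a position in [0, 1].

    Uses a constant-power law, so left_gain**2 + right_gain**2 == 1.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def render_frame(samples, position):
    """Distribute one frame of a mono source to two speakers."""
    left_gain, right_gain = pan_gains(position)
    return ([left_gain * s for s in samples],
            [right_gain * s for s in samples])


# Hypothetical source: one position value is stored for every frame.
frames = [[0.5, -0.5], [0.25, 0.25], [1.0, 0.0]]
positions = [0.0, 0.5, 1.0]
rendered = [render_frame(f, p) for f, p in zip(frames, positions)]
```

In this sketch, one position value accompanies every frame, which reflects the per-frame position data noted above.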
In addition, there is a need for technologies that allow a listener who is not in a concert hall or a theater to experience the sense of realism of such a space by using information about its acoustic properties, i.e., the reverberation property of the space. However, when a new reverberation property is applied to an original audio signal that already has a reverberation component, another reverberation effect is added to the original audio signal, and thus the original reverberation component may be interfered with by the new reverberation component.
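The interference described above may be illustrated by modeling each reverberation property as an impulse response that is applied by convolution. This model, the sample values, and the names `convolve`, `original_room`, and `new_room` are assumptions made for illustration only.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical dry source and two hypothetical room impulse responses.
dry = [1.0, 0.5, -0.25]
original_room = [1.0, 0.0, 0.6, 0.0, 0.3]
new_room = [1.0, 0.4, 0.0, 0.2]

# The recorded signal already carries the original reverberation component.
recorded = convolve(dry, original_room)

# Applying the new reverberation property to the recorded signal...
processed = convolve(recorded, new_room)

# ...is equivalent to applying the cascade of both rooms to the dry source.
combined_room = convolve(original_room, new_room)
```

Under this model, the listener hears `combined_room` rather than `new_room` alone, and the combined response is longer than either individual response.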
To overcome this problem, research has been conducted into a method of estimating a reverberation component in an audio signal, dividing the audio signal into a component with the reverberation component and a component without the reverberation component, and encoding and transmitting the audio signal. In this case, since it is difficult to correctly estimate the reverberation component from the audio signal, it is difficult to completely extract only the sound generated by a sound source, and thus interference between the original reverberation component and the new reverberation component may not be completely removed.