The present disclosure relates to a speech receiving apparatus and a speech receiving method.
With the increasing use of the Internet, IP telephony devices based on voice over IP (VoIP) and voice over WiFi (VoWiFi) technologies have attracted considerable attention for speech communication.
In IP phone services, speech packets are typically transmitted using the real-time transport protocol over the user datagram protocol (RTP/UDP). However, RTP/UDP does not verify whether the transmitted packets are correctly received, and lost packets are not retransmitted. Owing to the nature of this type of transmission, the packet loss rate increases with increasing network congestion. In addition, depending on the network resources, the possibility of burst packet losses, in which several consecutive packets are lost, also increases. Such an increase in losses can severely degrade the quality of the reconstructed speech.
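Because RTP/UDP itself does not confirm delivery, a receiver can only infer losses from gaps in the 16-bit sequence number carried in each RTP header. The following is a minimal sketch of such a check, not part of the disclosed apparatus; the function name `count_lost_packets` is hypothetical, and the sketch assumes that packets arrive in order (no reordering), which a practical receiver would handle with a jitter buffer.

```python
from typing import Iterable, Tuple

RTP_SEQ_MODULUS = 65536  # RTP sequence numbers are 16 bits wide and wrap around


def count_lost_packets(seq_numbers: Iterable[int]) -> Tuple[int, int]:
    """Return (total_lost, longest_burst) inferred from received sequence numbers.

    Assumption: packets are delivered in order. Duplicates are ignored.
    """
    total_lost = 0
    longest_burst = 0
    prev = None
    for seq in seq_numbers:
        if prev is not None:
            # Distance to the previous packet, taking wrap-around into account.
            step = (seq - prev) % RTP_SEQ_MODULUS
            if step == 0:
                continue  # duplicate packet, nothing was lost
            gap = step - 1  # number of packets missing between prev and seq
            total_lost += gap
            longest_burst = max(longest_burst, gap)
        prev = seq
    return total_lost, longest_burst
```

For example, the received sequence 65534, 65535, 1, 2, 5 is missing packet 0 and packets 3 and 4, so the function reports three lost packets with a longest burst of two.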
Meanwhile, most speech coders in use today are based on telephone-bandwidth narrowband speech, nominally limited to about 300-3,400 Hz at a sampling rate of 8 kHz. Accordingly, the speech quality achievable with such coders is limited.
In contrast, wideband speech coders have been developed for the purpose of smoothly migrating from narrowband to wideband quality (50-7,000 Hz) at a sampling rate of 16 kHz in order to improve speech quality in voice services. For example, ITU-T Recommendation G.729.1, a scalable wideband speech coder, improves the quality of speech by encoding the frequency bands ignored by the narrowband speech coder, ITU-T G.729. To this end, encoding of wideband speech using ITU-T G.729.1 is performed via two different approaches according to the frequency band. Specifically, the low band is coded in the time domain and the high band is coded in the frequency domain. In this scheme, the high-band information is coded and transmitted in an upper layer of the transmission packet.
Meanwhile, an input frame may be erased due to a speech packet loss while speech is being decoded, and the speech packet loss may occur due to various causes such as a poor communication environment. When a frame erasure occurs, the erased frame is reconstructed using a frame erasure concealment algorithm. For example, in ITU-T G.729.1, the low-band and high-band packet loss concealment (PLC) algorithms work separately. In detail, the low-band PLC algorithm reconstructs the speech signal of the lost frame from the excitation, pitch, and linear prediction coefficients of the last good frame. On the other hand, the high-band PLC algorithm reconstructs the spectral parameters of the lost frame, typically modified discrete cosine transform (MDCT) coefficients, from those of the last good frame.
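The idea of reconstructing the high-band spectral parameters of a lost frame from the last good frame can be illustrated with a simple repetition-with-attenuation scheme. The following is only a sketch under stated assumptions, not the concealment procedure specified in ITU-T G.729.1: the function name `conceal_high_band`, the `attenuation` factor of 0.5, and the use of `None` to mark an erased frame are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def conceal_high_band(
    frames: Sequence[Optional[Sequence[float]]],
    frame_len: int,
    attenuation: float = 0.5,  # assumed per-frame fading factor, not from G.729.1
) -> List[List[float]]:
    """Fill erased frames (None) with faded copies of the last good MDCT frame."""
    output: List[List[float]] = []
    last_good: Optional[List[float]] = None
    gain = 1.0
    for coeffs in frames:
        if coeffs is not None:
            # Good frame: pass it through and reset the fading gain.
            last_good = list(coeffs)
            gain = 1.0
            output.append(list(coeffs))
        elif last_good is None:
            # Erasure before any good frame was received: output silence.
            output.append([0.0] * frame_len)
        else:
            # Erasure: repeat the last good coefficients, fading a little more
            # for each consecutive lost frame.
            gain *= attenuation
            output.append([gain * c for c in last_good])
    return output
```

With frames `[[1.0, -2.0], None, None, [4.0, 0.0]]` and `frame_len=2`, the two erased frames are filled with `[0.5, -1.0]` and `[0.25, -0.5]`. Because such a scheme merely copies and fades the previous spectrum, it cannot follow changes in the high-band signal during the erasure.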
However, when a frame erasure occurs, the signal reconstructed using the low-band PLC algorithm exhibits better quality than that reconstructed using the high-band PLC algorithm. Therefore, there is a strong need for a method of providing a high-quality wideband speech signal by improving the performance of the high-band PLC algorithm.