In voice communication over an IP (Internet Protocol) network or a radio communication network, voice data may fail to reach the receiving side, or may arrive containing errors, because of IP packet loss, radio transmission errors, and the like. Voice communication systems therefore generally perform processing to conceal erroneous or lost voice data.
On the transmitting side of a typical voice communication system—that is, in a voice data transmitting apparatus—a voice signal constituting an input original signal is coded as voice data, multiplexed (packetized), and transmitted to a destination apparatus. Normally, multiplexing is performed with one voice frame as one transmission unit. With regard to multiplexing, Non-patent Document 1, for example, stipulates an IP packet network voice data format for 3GPP (The 3rd Generation Partnership Project) standard voice codec methods AMR (Adaptive Multi-Rate) and AMR-WB (Adaptive Multi-Rate Wideband).
On the receiving side, that is, in a voice data receiving apparatus, if received voice data is lost or contains errors, the voice signal of the lost or erroneous voice frame is restored by concealment processing that uses, for example, the voice data (coded data) of a voice frame received in the past, or a voice signal decoded from that voice data. With regard to voice frame concealment processing, Non-patent Document 2, for example, discloses a frame concealment method for AMR.
Voice processing in the voice communication system described above will now be outlined with reference to FIG. 1. The sequence numbers ( . . . , n−2, n−1, n, n+1, n+2, . . . ) in FIG. 1 are frame numbers assigned to individual voice frames. The receiving side decodes the voice signal and outputs the decoded voice as a sound wave in this frame-number order. As also shown in FIG. 1, coding, multiplexing, transmission, separation, and decoding are each performed on a per-frame basis. For example, if frame n is lost, a voice frame received in the past (for example, frame n−1 or frame n−2) is referenced, and frame concealment processing is performed for frame n.
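As a rough illustration of the frame-by-frame concealment described above, the following Python sketch fills each lost frame n from frame n−1. It assumes that frames have already been decoded into lists of PCM sample values and that a lost frame is marked as `None`. The `conceal` function, the `attenuation` factor, and the zero-filled silence used before any frame has arrived are hypothetical choices for illustration only; this is a simple repeat-and-attenuate scheme, not the method specified in Non-patent Document 2.

```python
from typing import List, Optional, Sequence

# Assumption: one voice frame = a list of decoded PCM sample values.
Frame = List[float]


def conceal(frames: Sequence[Optional[Frame]], frame_len: int,
            attenuation: float = 0.5) -> List[Frame]:
    """Return one output frame per frame number, concealing lost frames.

    frames[n] is the decoded frame n, or None if frame n was lost.
    A lost frame n is replaced by frame n-1 (itself possibly concealed)
    scaled by a hypothetical attenuation factor, so a run of consecutive
    losses fades toward silence instead of repeating indefinitely.
    """
    out: List[Frame] = []
    prev: Frame = [0.0] * frame_len  # silence until a frame is received
    for frame in frames:  # frames are processed in frame-number order
        if frame is None:
            # Frame n lost: derive it from the previous output frame.
            frame = [attenuation * s for s in prev]
        out.append(frame)
        prev = frame
    return out
```

For example, with two consecutive losses, `conceal([[1.0, 2.0], [4.0, -2.0], None, None, [3.0, 3.0]], 2)` yields `[2.0, -1.0]` for the first lost frame and `[1.0, -0.5]` for the second, after which the correctly received frame is output unchanged.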
With the growing use of broadband networks and multimedia communications in recent years, voice communications have trended toward higher voice quality. As part of this trend, there is demand to code and transmit voice signals as stereo signals rather than monaural signals. In this regard, Non-patent Document 1 includes provisions for multiplexing multi-channel voice data (for example, stereo voice data). According to this document, when voice data has two channels, for example, left-channel (L-ch) voice data and right-channel (R-ch) voice data corresponding to the same time are multiplexed together.
Non-patent Document 1: “Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs”, IETF RFC 3267
Non-patent Document 2: “Mandatory Speech Codec speech processing functions; AMR Speech Codecs; Error concealment of lost frames”, 3rd Generation Partnership Project, TS 26.091