In speech data communication over an IP (Internet Protocol) network, coded data may be transmitted in different formats for active speech sections and inactive speech sections. “Active speech” means that a speech signal contains speech components at or above a predetermined level. “Inactive speech” means that a speech signal does not contain speech components at or above the predetermined level. A speech signal that contains only noise components, as distinct from speech components, is therefore recognized as inactive speech. One technology for this kind of transmission is DTX (discontinuous transmission) control (see, for example, Non-Patent Document 1 and Non-Patent Document 2).
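The level-based definition above can be made concrete with a minimal Python sketch that classifies a single frame by its mean power. The threshold `LEVEL_THRESHOLD_DB` and the function names are hypothetical. A pure level test cannot tell loud noise from speech, so it does not fully capture the rule that noise-only signals count as inactive speech. That limitation is one reason practical voice activity detectors use more elaborate decision logic.

```python
import math
from typing import Sequence

# Hypothetical stand-in for the "predetermined level" (dB relative to full scale).
LEVEL_THRESHOLD_DB = -40.0


def frame_level_db(samples: Sequence[float]) -> float:
    """Mean power of one frame in dB, for samples normalized to [-1, 1]."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")


def is_active_speech(samples: Sequence[float],
                     threshold_db: float = LEVEL_THRESHOLD_DB) -> bool:
    """Active speech if the frame level reaches the threshold, else inactive speech."""
    return frame_level_db(samples) >= threshold_db


print(is_active_speech([0.5] * 160))    # True  (about -6 dB)
print(is_active_speech([0.001] * 160))  # False (-60 dB)
```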
For example, when speech coding apparatus 10 shown in FIG. 1 performs speech coding in a mode accompanied by DTX control, the speech signal is divided into sections of a predetermined length (corresponding to the frame length), and active speech/inactive speech determination section 11 determines, for each section, whether that section is active speech or inactive speech. When active speech is determined, that is, for an active speech section, the coded data generated at speech coding section 12 is outputted from DTX control section 13 as an active speech frame. At this time, the active speech frame is outputted together with frame type information reporting that an active speech frame is being transmitted. An active speech frame has a format consisting of Nv bits of information, as shown, for example, in FIG. 2(A).
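The per-section determination can be sketched as dividing the signal into fixed-length frames and classifying each one. This is an illustrative sketch only. The frame length of 160 samples (20 ms at 8 kHz) and the mean-power threshold are assumed values, and the simple power test is a hypothetical stand-in for determination section 11.

```python
from typing import List, Sequence

FRAME_LENGTH = 160      # assumed: 20 ms at 8 kHz sampling
POWER_THRESHOLD = 1e-4  # assumed "predetermined level" as mean power


def split_into_frames(signal: Sequence[float],
                      frame_length: int = FRAME_LENGTH) -> List[List[float]]:
    """Divide the signal into consecutive fixed-length frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(signal), frame_length):
        frame = list(signal[start:start + frame_length])
        frame.extend([0.0] * (frame_length - len(frame)))
        frames.append(frame)
    return frames


def determine_sections(signal: Sequence[float],
                       frame_length: int = FRAME_LENGTH,
                       threshold: float = POWER_THRESHOLD) -> List[bool]:
    """Return True (active speech) or False (inactive speech) for each frame."""
    return [sum(x * x for x in frame) / frame_length >= threshold
            for frame in split_into_frames(signal, frame_length)]


# One loud frame followed by one silent frame.
print(determine_sections([0.1] * 160 + [0.0] * 160))  # [True, False]
```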
On the other hand, when inactive speech is determined, that is, for an inactive speech section, inactive speech frame coding is carried out at comfort noise coding section 14. Inactive speech frame coding produces the information the decoding side needs to generate a signal simulating the ambient noise during the inactive speech section, and it uses a smaller amount of information, that is, fewer bits, than coding of an active speech section. The coded data generated by inactive speech frame coding is outputted from DTX control section 13 as a so-called SID (Silence Descriptor) frame at a fixed period during consecutive inactive speech sections. At this time, the SID frame is outputted together with frame type information reporting that an SID frame is being transmitted. An SID frame has a format consisting of Nuv bits of information (Nuv < Nv), as shown, for example, in FIG. 2(B).
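The SID schedule can be sketched as a small state machine over the per-frame active/inactive decisions. The rule used here is an assumption for illustration: an SID frame is sent on the first frame of each inactive run and then every `sid_period` frames while the run continues. The text specifies only "a fixed period", and features of real codecs such as a hangover period are not modeled. Inactive frames that are not SID frames are labeled `NO_DATA`, meaning no coded data is sent for them.

```python
from typing import List, Sequence

SPEECH, SID, NO_DATA = "SPEECH", "SID", "NO_DATA"


def dtx_frame_types(active_flags: Sequence[bool], sid_period: int = 8) -> List[str]:
    """Assign a frame type to each frame under DTX control.

    Assumed rule: SID on the first frame of each inactive run, then every
    `sid_period` frames while the run continues; other inactive frames are
    NO_DATA (frame type information only, no coded data).
    """
    if sid_period < 1:
        raise ValueError("sid_period must be at least 1")
    types: List[str] = []
    inactive_run = 0  # frames elapsed in the current inactive run
    for active in active_flags:
        if active:
            inactive_run = 0
            types.append(SPEECH)
        else:
            types.append(SID if inactive_run % sid_period == 0 else NO_DATA)
            inactive_run += 1
    return types


print(dtx_frame_types([True, False, False, False, False, True, False], sid_period=3))
# ['SPEECH', 'SID', 'NO_DATA', 'NO_DATA', 'SID', 'SPEECH', 'SID']
```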
Further, apart from the frames in which SID frames are transmitted, no coded information is transmitted during an inactive speech section; in other words, transmission of inactive speech frames is omitted. DTX control section 13 nevertheless outputs the frame type information alone, reporting that the frame is an inactive speech frame. In this way, DTX control performs discontinuous transmission, reducing both the amount of information transmitted over the transmission path and the amount of information decoded on the decoding side during inactive speech sections.
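The resulting saving can be estimated by counting coded-data bits per frame type, with omitted (`NO_DATA`) frames contributing nothing. In this sketch the values Nv = 244 and Nuv = 39 are illustrative only, the frame type sequence is hypothetical, and the bits of the frame type information itself are ignored. Continuous transmission, where every frame is sent as an active speech frame, serves as the baseline.

```python
from typing import Sequence

SPEECH, SID, NO_DATA = "SPEECH", "SID", "NO_DATA"


def coded_bits(frame_types: Sequence[str], nv: int, nuv: int) -> int:
    """Total coded-data bits for a frame type sequence (frame type info excluded)."""
    bits_per_type = {SPEECH: nv, SID: nuv, NO_DATA: 0}
    return sum(bits_per_type[t] for t in frame_types)


# Illustrative sequence: 4 active frames, then 12 inactive frames with an SID every 8 frames.
frames = [SPEECH] * 4 + [SID] + [NO_DATA] * 7 + [SID] + [NO_DATA] * 3
NV, NUV = 244, 39  # illustrative bit counts only

with_dtx = coded_bits(frames, NV, NUV)
without_dtx = coded_bits([SPEECH] * len(frames), NV, NUV)
print(with_dtx, without_dtx)  # 1054 3904
```

Here 4 × 244 + 2 × 39 = 1054 bits are sent with DTX control, against 16 × 244 = 3904 bits when every frame is transmitted.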
By contrast, when speech coding is carried out in a mode without DTX control, the speech signal is always treated as active speech, and coded data is therefore always transmitted continuously. Accordingly, in a related-art speech coding apparatus having a DTX control function, the speech coding mode is set in advance either to a mode accompanied by DTX control (with DTX control) or to a mode not accompanied by DTX control (without DTX control), and speech coding is then carried out in that mode.

Non-Patent Document 1: “Mandatory speech CODEC speech processing functions; AMR speech CODEC; General description”, 3rd Generation Partnership Project, TS 26.071

Non-Patent Document 2: “Mandatory speech codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Source controlled rate operation”, 3rd Generation Partnership Project, TS 26.093