Technology for encoding a speech signal at a low bit rate is important for the effective use of radio waves and the like in mobile communications. In recent years, demands on speech quality have increased, and there is a desire to achieve a telephone service having a wide signal bandwidth and a highly realistic effect.
The G.726 and G.729 standards, established by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector), exist as speech signal encoding systems. These systems handle narrowband (300 Hz to 3.4 kHz) signals (hereinafter referred to as NB signals) and perform encoding at bit rates from 8 kbit/s to 32 kbit/s. Because the narrowband signals that are handled have a maximum frequency of 3.4 kHz, intelligibility is not a problem, but the sound is muffled and lacks a realistic effect.
ITU-T and 3GPP (3rd Generation Partnership Project) have standardized systems (for example, G.722 and AMR-WB) that encode a wideband signal (hereinafter referred to as a WB signal) having a signal bandwidth of 50 Hz to 7 kHz. These systems operate at bit rates of 6.6 kbit/s to 64 kbit/s. Although a wideband signal has better sound quality than a narrowband signal, the sound quality is still insufficient for a telephone service that demands a highly realistic effect.
Meanwhile, conventional circuit-switched systems have achieved speech communication, but they have been inefficient because each call occupies a circuit. For this reason, systems have appeared that seek to use a communication path effectively by packetizing encoded data and transmitting the data over an IP (Internet Protocol) network. In particular, systems that apply this technique to speech communication are called VoIP (Voice over IP) systems. In mobile communications, VoIP is used in, for example, the 3GPP LTE (Long-Term Evolution) communication system.
For example, in the case of applying AMR-WB to VoIP, the AMR-WB encoded data is transmitted on the IP network as an RTP (Real-time Transport Protocol) packet payload. When this is done, the size of the payload is described as bit rate information in the FT (Frame Type) field of the header that forms a part of the RTP payload. The header of the RTP payload is set forth in Non-Patent Literature 1 and Non-Patent Literature 2.
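As an illustration of how a receiver might read the FT field, the following Python sketch assumes the octet-aligned payload format of RFC 4867, in which each table-of-contents byte carries a 1-bit F flag, a 4-bit FT value, and a 1-bit Q flag followed by two padding bits. Whether that RFC corresponds to the Non-Patent Literature cited above is an assumption, and the names `AMR_WB_FRAME_BITS`, `parse_toc_entry`, and `describe_frame` are hypothetical.

```python
# Speech-frame sizes in bits for AMR-WB frame types 0 to 8 (one frame = 20 ms).
# Assumption: values follow the AMR-WB mode table referenced by RFC 4867.
AMR_WB_FRAME_BITS = {
    0: 132, 1: 177, 2: 253, 3: 285, 4: 317,
    5: 365, 6: 397, 7: 461, 8: 477,
}

FRAME_DURATION_MS = 20


def parse_toc_entry(byte):
    """Split one octet-aligned table-of-contents byte into (F, FT, Q)."""
    f = (byte >> 7) & 0x1   # 1 = another TOC entry follows
    ft = (byte >> 3) & 0xF  # frame type, i.e. the bit rate mode index
    q = (byte >> 2) & 0x1   # frame quality indicator
    return f, ft, q


def describe_frame(byte):
    """Derive the bit rate and payload size implied by the FT field."""
    f, ft, q = parse_toc_entry(byte)
    if ft not in AMR_WB_FRAME_BITS:
        raise ValueError(f"FT {ft} is not a speech mode handled by this sketch")
    bits = AMR_WB_FRAME_BITS[ft]
    return {
        "ft": ft,
        "more_frames": bool(f),
        "quality_ok": bool(q),
        "bit_rate_kbps": bits / FRAME_DURATION_MS,  # bits per ms equals kbit/s
        "payload_bytes": (bits + 7) // 8,           # rounded up to whole octets
    }
```

For instance, `describe_frame(0x14)` decodes FT = 2, which corresponds to 253 bits per frame, that is, 12.65 kbit/s carried in 32 octets. This shows how the FT value alone lets the receiver determine the size of the encoded data that follows.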
Systems have also been proposed that achieve speech communication with a highly realistic effect by encoding a superwideband (50 Hz to 14 kHz) signal (hereinafter referred to as an SWB signal). For example, the G.718 Annex B system (Non-Patent Literature 3, hereinafter referred to as G.718B), established as a standard by the ITU-T, can encode an SWB signal at a bit rate of 28 kbit/s to 48 kbit/s. G.718B has a layered structure including a plurality of layers; it can encode a low-region signal (50 Hz to 7 kHz) at either of two bit rates, 24 kbit/s or 32 kbit/s, and can encode a high-region signal (7 kHz to 14 kHz) at any of three bit rates, 4 kbit/s, 8 kbit/s, or 16 kbit/s.
FIG. 1 is a drawing that shows the correspondence between the bit rate modes that can be used in G.718B and the combinations of the low-region bit rate (hereinafter referred to as the low-region encoding rate) and the high-region bit rate (hereinafter referred to as the high-region encoding rate). As shown in FIG. 1, G.718B can encode an SWB signal in any one of five bit rate modes.
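The relationship between the low-region and high-region encoding rates and the overall bit rate can be checked with simple arithmetic. The Python sketch below only sums the rates stated in the text; it is not a reproduction of FIG. 1, and the names `LOW_RATES`, `HIGH_RATES`, and `total_rates` are hypothetical.

```python
from itertools import product

LOW_RATES = (24, 32)      # low-region encoding rates in kbit/s, as stated above
HIGH_RATES = (4, 8, 16)   # high-region encoding rates in kbit/s, as stated above


def total_rates():
    """Group every (low, high) pair by the total bit rate it produces."""
    totals = {}
    for low, high in product(LOW_RATES, HIGH_RATES):
        totals.setdefault(low + high, []).append((low, high))
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for total, pairs in total_rates().items():
        print(f"{total} kbit/s: {pairs}")
```

The six possible pairs yield five distinct totals (28, 32, 36, 40, and 48 kbit/s), which agrees with the stated range of 28 kbit/s to 48 kbit/s and with the count of five bit rate modes. Two pairs sum to 40 kbit/s; which of them a given mode actually uses is determined by FIG. 1 and is not assumed here.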