A coding scheme used for voice calls on a mobile phone or the like is the Code-Excited Linear Prediction (CELP) codec. More specifically, this scheme separates an input signal into linear prediction coefficients and an excitation signal (the signal that serves as the input to a linear prediction filter using the linear prediction coefficients), and codes each of the resulting pieces of data. Examples of such a coding scheme include the adaptive multi-rate (AMR) scheme (see Non-patent Literature 1). This scheme models the acoustic characteristics of the vocal tract using the linear prediction coefficients and models the vibration of the vocal cords using the excitation signal. For this reason, it can code speech signals efficiently, but it cannot efficiently code natural sounds (audio signals), which are non-speech signals and to which this modeling therefore does not apply.
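The separation into linear prediction coefficients and an excitation signal can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, a common way to estimate the coefficients, and it omits the quantization and codebook search that an actual CELP codec such as AMR applies to both parts. The function names, the prediction order of 2, and the test frame are illustrative assumptions only.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [
        sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Solve for a[1..order] so that x[n] is predicted by sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy at order 0
    for i in range(1, order + 1):
        if err <= 0.0:       # silent or perfectly predicted frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def analyze(x: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Analysis filter: excitation e[n] = x[n] - sum_k a[k] * x[n - k]."""
    e = []
    for n in range(len(x)):
        pred = sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        e.append(x[n] - pred)
    return e


def synthesize(e: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Linear prediction (synthesis) filter driven by the excitation signal."""
    y: List[float] = []
    for n in range(len(e)):
        pred = sum(a * y[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        y.append(e[n] + pred)
    return y


# Illustrative frame: impulse response of a two-pole resonator (assumed test data).
frame = [1.0, 1.3]
for n in range(2, 64):
    frame.append(1.3 * frame[n - 1] - 0.6 * frame[n - 2])

coeffs = levinson_durbin(autocorrelation(frame, 2), 2)  # spectral envelope part
excitation = analyze(frame, coeffs)                     # signal that drives the filter
rebuilt = synthesize(excitation, coeffs)                # filter output
```

With unquantized coefficients and excitation, the synthesis filter reproduces the frame; the compression in a real codec comes from coding the two parts compactly, which works well only when the signal fits the vocal tract model.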
On the other hand, examples of a coding scheme used for a digital television (TV), a Digital Versatile Disc (DVD) player, or a Blu-ray disc player include the Advanced Audio Coding (AAC) scheme (see Non-patent Literature 2). This scheme codes the raw frequency spectrum of an input signal. For this reason, although this scheme can reproduce a natural sound (a non-speech audio signal) with good sound quality, it cannot compress a speech signal at a compression rate as high as that obtainable with the CELP codec.
This is described qualitatively using the graph of FIG. 11.
In the graph of FIG. 11, the horizontal axis shows bit rates in coding, and the vertical axis shows sound quality. The solid curve (data 73) shows the relationship between bit rates and sound quality in an audio codec such as AAC (in the case where a scheme for audio is used). The curve drawn as an alternate long and short dash line (data 74S) shows the relationship between bit rates and sound quality in a speech codec such as AMR (in the case where a scheme for speech is used). The curve drawn as a broken line (data 74A) shows the relationship between bit rates and sound quality in the case where a non-speech signal is processed by a speech codec. Various kinds of units may be appropriate for the horizontal axis and the vertical axis in the graph of FIG. 11; in other words, the units may be regarded as arbitrary units. For example, the vertical axis may indicate values obtained by subjective human evaluation in an experiment, and the horizontal axis may indicate values in kbps (kilobits per second).
Here, a range 90 enclosed by thin vertical broken lines in the diagram shows the range of bit rates in which the appropriate coding unit differs depending on the input signal. A detailed description of bit rates is given later.
In the standardization of the Unified Speech and Audio Codec (USAC), described in detail later, attention is focused on the range 90, and the range other than the range 90 (range 91) receives little attention. Sound quality depends on the kind of input signal (signal to be coded). Within the range 90, a speech codec achieves better sound quality (compare data 74S with data 73) in the case where the input signal is a speech signal. On the other hand, within the range 90, an audio codec achieves better sound quality (compare data 73 with data 74A) in the case where the input signal is a non-speech signal.
As such, in the recent audio standardization activity by MPEG, consideration is being given to a coding standard (the Unified Speech and Audio Codec (USAC)) which enables efficient coding of both speech signals and natural sounds (non-speech audio signals).
FIG. 9 shows a schematic block diagram of coding according to USAC.
The blocks shown in the block diagram of FIG. 9 include: an input signal classifying unit 500 which, before coding, classifies each input signal (signal to be coded) as either a signal for which a speech codec is suitable or a signal for which an audio codec is suitable; a high frequency signal coding unit 501 which codes high frequency components of the input signals; an audio signal coding unit 502; a speech signal coding unit 503; and a bit stream generating unit 504.
As shown in FIG. 9, the input signal classifying unit 500 classifies each input signal as either a signal for which the speech codec is suitable or a signal for which the audio codec is suitable. After this classification, each input signal is coded by the coding unit (the audio signal coding unit 502 or the speech signal coding unit 503) that corresponds to whichever of the speech codec and the audio codec is suitable. Here, the high frequency signal coding unit 501, provided at a pre-stage, performs coding based on the Spectral Band Replication (SBR) technique (ISO/IEC 14496-3) standardized by the Moving Picture Experts Group (MPEG), and thereby contributes to replication of the reproduction band at the time of decoding.
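The routing performed on the coding side can be sketched as follows. This is only an illustration of the control flow among units 500 to 504; the classifier and the three coders are passed in as placeholder callables, and the `CodedFrame` container, the mode labels, and the order in which units 500 and 501 are invoked are assumptions that are not taken from FIG. 9.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

SPEECH, AUDIO = "speech", "audio"  # hypothetical labels for the classifier result

Samples = Sequence[float]
Coder = Callable[[Samples], bytes]


@dataclass
class CodedFrame:
    """Hypothetical stand-in for the output of bit stream generating unit 504."""
    mode: str            # which core coding unit produced core_payload
    core_payload: bytes  # output of unit 502 or unit 503
    sbr_payload: bytes   # output of unit 501


def encode_frame(
    samples: Samples,
    classify: Callable[[Samples], str],  # stands in for unit 500
    code_high_band: Coder,               # stands in for unit 501 (SBR)
    code_audio: Coder,                   # stands in for unit 502
    code_speech: Coder,                  # stands in for unit 503
) -> CodedFrame:
    # Pre-stage: code the high frequency components.
    sbr_payload = code_high_band(samples)

    # Classify, then hand the frame to exactly one core coding unit.
    mode = classify(samples)
    if mode == SPEECH:
        core_payload = code_speech(samples)
    elif mode == AUDIO:
        core_payload = code_audio(samples)
    else:
        raise ValueError(f"unknown classification result: {mode!r}")

    # Bundle both parts, as unit 504 would when generating the bit stream.
    return CodedFrame(mode=mode, core_payload=core_payload, sbr_payload=sbr_payload)
```

Passing the units in as callables keeps the sketch independent of any particular classifier or codec implementation, none of which is specified here.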
FIG. 10 shows a block diagram of decoding according to USAC.
The blocks shown in the block diagram of FIG. 10 include: a bit stream separating unit 600 which separates an input bit stream into a coded signal; an audio signal decoding unit 601; a speech signal decoding unit 602; and a band replicating unit 603 which replicates the reproduction band of a signal decoded by one of the decoding units.
As shown in FIG. 10, the input bit stream is separated into the coded signal by the bit stream separating unit 600. In the case where the coded signal is classified as a coded audio signal, it is processed by the audio signal decoding unit 601. Conversely, in the case where the coded signal is classified as a coded speech signal, it is processed by the speech signal decoding unit 602. In this way, a Pulse Code Modulation (PCM) signal is generated. In either case, the decoded signal is then subjected to a reproduction band replication process performed by the band replicating unit 603.
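The decoding flow can be sketched in the same spirit. The one-byte mode flag at the start of the bit stream is a purely hypothetical layout chosen for illustration and is not the USAC bit stream syntax; the two decoders and the band replication step are placeholder callables standing in for units 601, 602, and 603.

```python
from typing import Callable, List, Tuple

MODE_AUDIO, MODE_SPEECH = 0, 1  # hypothetical flag values, not from any standard

Decoder = Callable[[bytes], List[float]]


def separate_bit_stream(bit_stream: bytes) -> Tuple[int, bytes]:
    """Stand-in for unit 600: split a toy bit stream into (mode flag, coded signal)."""
    if not bit_stream:
        raise ValueError("empty bit stream")
    return bit_stream[0], bit_stream[1:]


def decode_frame(
    bit_stream: bytes,
    decode_audio: Decoder,                                # stands in for unit 601
    decode_speech: Decoder,                               # stands in for unit 602
    replicate_band: Callable[[List[float]], List[float]], # stands in for unit 603
) -> List[float]:
    mode, coded_signal = separate_bit_stream(bit_stream)

    # Exactly one core decoding unit produces the PCM signal.
    if mode == MODE_AUDIO:
        pcm = decode_audio(coded_signal)
    elif mode == MODE_SPEECH:
        pcm = decode_speech(coded_signal)
    else:
        raise ValueError(f"unknown mode flag: {mode}")

    # Band replication is applied regardless of which decoder ran.
    return replicate_band(pcm)
```

The point of the sketch is that the band replicating step is shared: it sits after the branch, so both decoding paths feed the same final stage.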