For a long time, speech signal coding was relatively independent of non-speech signal (for example, music) coding. That is, speech signal coding was implemented by a dedicated speech coder, and non-speech signal coding was implemented by a dedicated non-speech coder (which may also be referred to as a generic audio coder).
Generally, a speech coder is not used to code a non-speech signal, and a non-speech coder is not used to code a speech signal. This is not only because speech coding and non-speech coding are relatively independent in coding theory, but also because the two types of signals are generally relatively independent in actual applications. For example, in a voice communications network, speech was for a long time the only or the main signal source and bandwidth was strictly limited, so various low-rate speech coders are widely used in such networks. In applications such as video and entertainment, non-speech signals make up the majority of signal sources, and these applications impose a relatively high requirement on audio quality and a relatively low requirement on bit rate, so non-speech coders are widely used in these scenarios.
In recent years, an increasing number of multimedia signal sources, such as customized ring back tones, have appeared in conventional voice communications networks, which imposes a higher requirement on the coding quality of a coder. A dedicated speech coder cannot provide the relatively high coding quality that these multimedia signals require, and new coding technologies such as the mix-audio coder have emerged in response.
The mix-audio coder is an audio coder that includes both a sub-coder suitable for coding a speech signal and a sub-coder suitable for coding a non-speech signal. The mix-audio coder always attempts to dynamically select the most suitable sub-coder from all of its sub-coders to code an input audio signal. Selecting the most suitable sub-coder to code an input current audio frame is an important function and requirement of the mix-audio coder. Sub-coder selection is also referred to as mode selection, and it directly affects the coding quality of the mix-audio coder.
In the prior art, a sub-coder is generally selected in a closed-loop mode. That is, each sub-coder is used to code the input current audio frame once, and the optimal sub-coder is selected by directly comparing the quality of the coded current audio frame obtained from each sub-coder. However, a disadvantage of the closed-loop mode is that the operational complexity of coding is relatively high (because each sub-coder is used to code the input current audio frame once), and consequently the actual overhead of audio coding is relatively large.
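The closed-loop mode described above can be illustrated with a minimal sketch. Everything named here is hypothetical and for illustration only: the sub-coders are stand-in uniform quantizers rather than real speech or generic audio coders, and signal-to-noise ratio is assumed as the quality measure, since the text above does not specify how quality is compared. The sketch is meant only to show that the number of coding passes per frame equals the number of sub-coders, which is the source of the complexity noted above.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Frame = List[float]
# Hypothetical sub-coder interface: takes a frame, returns the decoded frame.
SubCoder = Callable[[Frame], Frame]


def make_uniform_quantizer(step: float) -> SubCoder:
    """Toy stand-in for a sub-coder (assumption): code and decode in one
    step by rounding each sample to the nearest multiple of `step`."""
    def code(frame: Frame) -> Frame:
        return [round(x / step) * step for x in frame]
    return code


def snr_db(original: Frame, decoded: Frame) -> float:
    """Signal-to-noise ratio in dB, the quality measure assumed here."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


def closed_loop_select(frame: Frame,
                       sub_coders: Dict[str, SubCoder]) -> Tuple[Optional[str], int]:
    """Code the frame once with every sub-coder, compare the resulting
    quality, and return the best sub-coder's name plus the number of
    coding passes that were needed."""
    best_name: Optional[str] = None
    best_quality = -math.inf
    coding_passes = 0
    for name, code in sub_coders.items():
        decoded = code(frame)      # one full coding pass per sub-coder
        coding_passes += 1
        quality = snr_db(frame, decoded)
        if best_name is None or quality > best_quality:
            best_name, best_quality = name, quality
    return best_name, coding_passes


# Usage with two hypothetical sub-coders and an arbitrary example frame.
frame = [0.12, -0.33, 0.58, 0.91]
sub_coders = {
    "speech": make_uniform_quantizer(0.5),
    "generic": make_uniform_quantizer(0.1),
}
selected, passes = closed_loop_select(frame, sub_coders)
print(selected, passes)
```

In this sketch the per-frame cost grows linearly with the number of sub-coders, because every candidate must fully code the frame before any comparison can be made.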