1. Field of the Invention
The present invention relates generally to speech and audio signal processing. More particularly, the present invention relates to complexity resource management for multiple channel speech and audio signal processing.
2. Related Art
In recent years, packet-based networks, such as the Internet, have begun to replace traditional telephone networks (i.e., switched networks) for transportation of voice and data in accordance with voice-over-packet ("VoP"). The packetizing of voice signals for transmission over a packet network has been recognized as a less expensive, yet effective, alternative to traditional telephone service. For example, with the emergence of voice over IP ("VoIP"), telephone conversations may now be captured, packetized and transported over the Internet. Other examples of emerging VoP implementations include Next Generation Networks ("NGN"), which do not necessarily use the Internet Protocol (IP) for the transmission of packet voice.
In a conventional VoIP system, telephone conversations or analog voice may be transported over the local loop or the public switched telephone network ("PSTN") to the central office ("CO"), where speech is digitized according to an existing protocol, such as G.711. From the CO, the digitized speech is transported to a gateway device at the edge of the packet-based network. The gateway device receives the digital speech and packetizes it. The gateway device can combine G.711 samples into a packet, or use any other compression scheme. Next, the packetized data is transmitted over the Internet using the Internet Protocol for reception by a remote gateway device and conversion back to analog voice in the reverse manner as described above.
For purposes of this application, the terms "speech coder" or "speech processor" will generally be used to describe the operation of a device that is capable of encoding speech for transmission over a packet-based network and/or decoding encoded speech received over the packet-based network. As noted above, the speech coder or speech processor may be implemented in a gateway device for conversion of speech samples into a packetized form that can be transmitted over a packet network and/or conversion of the packetized speech into speech samples. Ordinarily, a gateway processor handles the speech coding of multiple channels.
Efforts have been made to increase the efficiency and operation of speech processors to encode speech for transmission over packet-based networks. One area of development has been speech codecs. For example, recent speech codecs, such as the adaptive multi-rate (AMR) codec, the enhanced variable rate speech coder (EVRC), and the selectable mode vocoder (SMV), have been designed for a best tradeoff between bit-rate, complexity and quality for their designed applications. In order to provide better playback quality at a lower bit-rate, these modern codecs are generally more complex and therefore require more processing power than lower-complexity high-bit-rate speech codecs, such as G.711. As a result of the increased complexity of these codecs and the associated hardware requirements, the channel density (i.e., number of channels) that a speech processor (or gateway) can support is limited. Increasing the processing power of speech processors and gateways to handle higher-complexity codecs would involve a substantial increase in cost and investment. On the other hand, operating lower-complexity high-bit-rate codecs results in increased bit rate and reduced throughput over the communication channels. In addition, in accordance with certain communication standards, low-bit-rate complex coders are mandatory, and therefore use of lower-complexity codecs is not possible.
Speech encoding algorithms executed by speech processors (and gateways) have also been enhanced to increase the efficiency and operation of the communication channel. In particular, variable rate codecs were introduced for packet networks, where the average load on the networks is an essential factor in their operation. According to these enhanced encoding algorithms, the bit rate used to encode a speech signal may be selected according to the input speech. For example, approximately fifty percent (50%) of conversational speech involves inactive speech (silence). Typically, higher-complexity encoders are used to encode active speech segments at a somewhat higher bit rate, while lower-complexity encoders are used to process silence or background noise (inactive speech) segments at a lower bit rate. Although this solution suits the network, whose performance is related to the average bit rate, the processing of multiple channels of speech by a DSP is particularly challenging, since the throughput of a DSP is defined not by the average complexity but by the maximum complexity. On average, a DSP may be able to handle all the channels, since at a given time some channels carry active speech (which needs a higher-complexity algorithm) and others carry inactive speech (which needs a lower-complexity algorithm). There may still be instances, however, where a majority or all of the channels carry active speech and therefore all need the higher-complexity algorithm, so that together they exceed the available computation power of the DSP.
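The gap between average and maximum complexity described above can be illustrated with a short calculation. The following is only a sketch: the channel count, the per-frame costs, and the DSP capacity are hypothetical values in arbitrary units chosen for illustration, and the 50% activity figure is the approximate one quoted in the text.

```python
def average_load(channels, active_cost, inactive_cost, activity=0.5):
    """Expected per-frame load when each channel is active with probability `activity`."""
    return channels * (activity * active_cost + (1 - activity) * inactive_cost)


def peak_load(channels, active_cost):
    """Worst-case per-frame load: every channel carries active speech at once."""
    return channels * active_cost


# All numbers below are hypothetical, in arbitrary cost units per frame.
CHANNELS = 60
ACTIVE_COST = 10      # assumed cost of the higher-complexity (active speech) path
INACTIVE_COST = 2     # assumed cost of the lower-complexity (inactive speech) path
DSP_CAPACITY = 400    # assumed processing capacity per frame

avg = average_load(CHANNELS, ACTIVE_COST, INACTIVE_COST)
peak = peak_load(CHANNELS, ACTIVE_COST)

print(avg <= DSP_CAPACITY)   # the average load fits within the capacity
print(peak <= DSP_CAPACITY)  # the all-active worst case does not
```

Under these assumed numbers the DSP is adequate on average but is overloaded whenever most channels are active simultaneously, which is the situation the paragraph above describes.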
Accordingly, there is a need in the art for a speech coder apparatus and method that overcome these and other shortcomings of present implementations for encoding voice information into a packetized form that can be transmitted over a packet network.
In accordance with the purposes of the present invention as broadly described herein, there is provided a multi-channel speech processor for encoding speech for a packet network environment. In one illustrative aspect of the present invention, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of complexity of the coding, which is used by a signal-processing unit (SPU) to convert the speech signal into packet data. In some embodiments, the CRM may also be used to manage the decoding operation. In general, the CRM determines the level of complexity of the coding based on a calculated complexity budget, where the complexity budget is determined based on the time consumed to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM is able to control the overall complexity of the speech processor, and adjust the speech processor to meet the complexity budget, through its ability to signal the SPU to encode and/or decode a speech signal in a complexity-reduced coding mode based on the calculated or consumed complexity budget.
For example, the speech processor may use the SMV codec to encode speech signals for a plurality of channels 1 through m. The SMV codec may provide four coding rates, each rate having an associated level of complexity including: a full rate, a half rate, a quarter rate, and an eighth rate, for example. It is possible that the SMV full rate, the quarter rate, and the eighth rate schemes are less complex than the SMV half rate scheme due to the more intense search required to execute the half rate scheme. In this example, the CRM may choose a coding rate for a given channel "n", based on the time spent processing channels 1 through n−1 and the available processing time left to process channels n through m. Thus, the CRM may select a lower-complexity rate (e.g., full rate, quarter rate, or eighth rate) to process a given speech signal channel n (or groups of channels "n+o", where "n+o" ≤ m) where the calculated processing time left to process the remaining channels would not be sufficient to support a higher-complexity coding rate (e.g., SMV half rate). It is noted that although described in terms of ordinal numbers n for channels 1 through m, the speech processor of the present invention may actually process speech signals for channels 1 through m in any order as input signals arrive. It would also be readily apparent to one skilled in the art having the benefit of this disclosure that other speech codecs having coding rates of various complexity can also benefit from the CRM.
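One way to picture the per-channel decision described above is the following sketch. It is not the disclosed implementation: the function name, the per-rate costs, and the policy of dividing the remaining time evenly among the remaining channels are all assumptions made for illustration. The only property taken from the text is that the half rate is treated as the most complex of the four rates.

```python
# Hypothetical per-frame processing cost of each SMV rate (arbitrary time units).
# Assumption: half rate is the most expensive, as the text says is possible.
RATE_COST = {"half": 10, "full": 8, "quarter": 5, "eighth": 2}


def crm_select_rate(frame_budget, time_spent, channels_left):
    """Pick a rate for channel n, given the time spent on channels 1..n-1.

    frame_budget  -- total processing time available for all m channels
    time_spent    -- time already consumed by channels 1 through n-1
    channels_left -- number of channels still to process (n through m)
    """
    if channels_left <= 0:
        raise ValueError("channels_left must be positive")
    # Assumed policy: share the remaining time evenly among remaining channels.
    per_channel = (frame_budget - time_spent) / channels_left
    # Try rates from most to least complex; take the first one that fits.
    by_cost = sorted(RATE_COST, key=RATE_COST.get, reverse=True)
    for rate in by_cost:
        if RATE_COST[rate] <= per_channel:
            return rate
    # Nothing fits: fall back to the least complex rate.
    return by_cost[-1]
```

With a budget of 100 units, `crm_select_rate(100, 0, 10)` allows the half rate, while `crm_select_rate(100, 60, 5)` leaves 8 units per channel and so steps down to the full rate.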
In accordance with other embodiments, the CRM is configured to signal the SPU to encode a speech signal based on a complexity level, rather than a specific rate. For example, the CRM may signal the SPU to switch to a higher or lower complexity algorithm, or to use a higher or lower complexity path in a particular algorithm, based on the complexity budget.
Typically, the speech processor also executes a speech encoder algorithm for the common processing of channel speech signals, generally executed in conjunction with the CRM by the controller or implemented as a component of the CRM. As noted above, the encoder algorithm may be used to define the appropriate complexity coding rates corresponding to active speech segments and inactive speech segments, for example. When the CRM defines a lower-complexity coding rate than the encoder algorithm in accordance with the complexity budget, the coding rate selected by the CRM overrides the rate selected by the encoder algorithm and is used by the SPU. Where the CRM does not define a coding rate (e.g., where the complexity budget would allow the remaining channels to be processed at the highest complexity rate) or where the coding rate selected by the encoder algorithm is of lower complexity than that defined by the CRM, the coding rate selected by the encoder algorithm is used by the SPU to process a given speech signal.
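The override rule in the preceding paragraph amounts to using the less complex of the two selections, and deferring to the encoder algorithm when the CRM defines no rate. A minimal sketch follows; the function name and the complexity ranking are hypothetical, with the half rate again assumed to be the most complex.

```python
# Hypothetical complexity ranking of the SMV rates (higher number = more complex).
COMPLEXITY = {"half": 4, "full": 3, "quarter": 2, "eighth": 1}


def effective_rate(encoder_rate, crm_rate=None):
    """Return the rate the SPU uses for a given speech signal.

    encoder_rate -- rate chosen by the encoder algorithm (e.g., from speech activity)
    crm_rate     -- rate defined by the CRM, or None when the CRM defines no rate
    """
    if crm_rate is None:
        # The budget allows the highest complexity: the encoder's choice stands.
        return encoder_rate
    if COMPLEXITY[crm_rate] < COMPLEXITY[encoder_rate]:
        # The CRM asks for lower complexity: its selection overrides.
        return crm_rate
    # The encoder's choice is already of equal or lower complexity.
    return encoder_rate
```

For example, `effective_rate("half", "quarter")` yields the quarter rate, whereas `effective_rate("eighth", "quarter")` keeps the encoder's eighth rate because it is already less complex than what the CRM allows.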
It is noted that the calculation of the overall complexity budget may also take into account the processing power consumed by other common processes (e.g., tone detection, echo cancellation).
In certain embodiments, the CRM may calculate the complexity budget based on groups of channels processed. For example, suppose the speech processor is capable of interfacing with sixty (60) communication channels. In this 60-channel example, the CRM may evaluate the complexity budget in four (4) groups of fifteen (15) channels, six (6) groups of ten (10) channels, or various other arrangements of groups of channels. Thus, the complexity budget may be calculated after the first 15 channels have been processed to determine the complexity rate for the next 15 channels. Likewise, the complexity budget may be calculated after the first 30 channels have been processed to determine the complexity rate for the next 15 channels, and so on.
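The group-wise evaluation can be sketched as a loop that recomputes a complexity cap once per group rather than once per channel. The cost levels, the function name, and the even-share policy below are assumptions for illustration; for simplicity the sketch also computes a cap before the first group, when no time has yet been consumed.

```python
# Hypothetical per-channel cost levels, from least to most complex (arbitrary units).
LEVELS = [2, 5, 8, 10]


def run_frame(requested, frame_budget, group_size):
    """Process channels in groups, recomputing the complexity cap per group.

    requested    -- per-channel cost requested by the encoder algorithm
    frame_budget -- total processing time available for the frame
    group_size   -- number of channels per budget evaluation
    Returns (cap chosen for each group, total time spent).
    """
    spent_total = 0
    caps = []
    for start in range(0, len(requested), group_size):
        channels_left = len(requested) - start
        # Assumed policy: even share of the remaining time per remaining channel.
        per_channel = (frame_budget - spent_total) / channels_left
        affordable = [level for level in LEVELS if level <= per_channel]
        # If no level fits, fall back to the cheapest (the budget may then be exceeded).
        cap = max(affordable) if affordable else LEVELS[0]
        caps.append(cap)
        for cost in requested[start:start + group_size]:
            spent_total += min(cost, cap)  # a channel never spends more than the cap
    return caps, spent_total


# 60 channels: the first 15 carry inactive speech (cost 2), the rest active (cost 10).
requested = [2] * 15 + [10] * 45
caps, spent = run_frame(requested, frame_budget=400, group_size=15)
print(caps, spent)  # [5, 8, 8, 8] 390
```

In this hypothetical run the inexpensive first group frees up time, so the cap rises from 5 to 8 for the later groups, and the frame completes within the 400-unit budget.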
According to another aspect of the present invention, the speech processor may be used to support a variable number of channels. In this embodiment, the CRM may determine whether an additional requested channel may be supported based on the calculated complexity budget and/or in accordance with certain quality requirements. For example, where the CRM determines that the available processing time left is sufficient to process all currently accepted or active channels and the requested channel, the CRM may accept the requested channel for processing by the SPU. Otherwise, if the available processing time left is not sufficient to process all currently accepted or active channels as well as the requested channel, the CRM denies the requested channel. In other embodiments, the CRM may be configured to accept the requested channel only if the quality of output of the active channels would not be severely impacted or fall below a certain threshold.
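The time-based acceptance test described above can be sketched as a single comparison. The function name and the idea of representing each channel by an estimated per-frame cost are assumptions for illustration only; the quality-based variant mentioned at the end of the paragraph is not modeled here.

```python
def accept_channel(active_costs, requested_cost, frame_budget):
    """Decide whether a requested channel can be accepted.

    active_costs   -- estimated per-frame cost of each currently accepted channel
    requested_cost -- estimated per-frame cost of the requested channel
    frame_budget   -- processing time available per frame
    Accept only if all active channels plus the new one fit in the budget.
    """
    return sum(active_costs) + requested_cost <= frame_budget
```

For instance, with a hypothetical budget of 60 units and channels costing 10 units each, a sixth channel is accepted (`accept_channel([10] * 5, 10, 60)` is true) but a seventh is denied.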
Variable channel support may be implemented in a number of ways. In some embodiments, a pre-determined number of channels is supported. In these embodiments, the CRM will accept a channel if the pre-determined number of channels has not been reached (i.e., the CRM is currently managing fewer than the pre-determined number of channels). Otherwise, the CRM will reject the requested channel. In other embodiments, acceptance of a requested channel involves first determining whether the SPU is able to run without any complexity reduction (e.g., up to N channels). If so, the CRM does not operate, and any requested channel can be accepted until N channels have been accepted. For each requested channel above N channels, the CRM performs statistical complexity reduction analysis. For example, the CRM may determine the level of complexity reduction needed to accommodate the requested channel, and may accept or reject the requested channel based on whether a certain threshold of complexity reduction would be exceeded.
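The second approach above, free acceptance up to N channels and a threshold test beyond that, might be sketched as follows. The text does not specify the statistical analysis, so this sketch substitutes a simple worst-case estimate: the fractional reduction needed if every channel requested the full-complexity cost at once. All names and parameters are hypothetical.

```python
def accept_with_reduction(active_channels, n_free, full_cost, frame_budget, max_reduction):
    """Decide whether to accept one more channel.

    active_channels -- number of channels currently accepted
    n_free          -- N, the channel count the SPU handles with no complexity reduction
    full_cost       -- assumed per-channel cost at full complexity
    frame_budget    -- processing time available per frame
    max_reduction   -- largest tolerated fractional complexity reduction (0.0 to 1.0)
    """
    if active_channels < n_free:
        # Up to N channels the CRM does not operate: accept unconditionally.
        return True
    # Worst-case cost if the new channel and all active ones run at full complexity.
    needed = (active_channels + 1) * full_cost
    # Fraction by which that cost must shrink to fit the budget (0 if it already fits).
    reduction = max(0.0, 1 - frame_budget / needed)
    return reduction <= max_reduction
```

With a hypothetical budget of 400 units, a full cost of 10, N = 40 and a 20% tolerance, a 46th channel needs roughly a 13% reduction and is accepted, while a 60th would need about 33% and is rejected.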
According to yet another aspect of the present invention, the speech processor may support multiple codecs which are stored in a memory coupled to both the controller and the SPU. According to this embodiment, multiple speech codecs (e.g., AMR, EVRC, SMV, G.711) may be supported by the speech processor to provide wider support of speech coders. In operation, the controller loads the coder which corresponds to the input speech signal into the SPU for processing the speech signal while the CRM may define the level of complexity for the particular coder as described above.
These and other aspects of the present invention will become apparent with further reference to the drawings and specification, which follow. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.