Network communications systems are often expected to support a large number of content streams in parallel. The content streams may, and often do, correspond to different voice sessions, e.g., calls. In some embodiments each voice session is communicated over a different channel, which may be a virtual or physical channel. For purposes of encoding and decoding, data is often arranged into a set or unit corresponding to a frame time. Supporting decoding and/or encoding operations for large numbers of communications sessions can be a resource intensive process, particularly where real time communications sessions are involved, since the decoding and/or encoding should be performed at a rate which allows the frame rate of the various communications sessions to be supported.
The processing of content streams and transcoding often involves at least some processing operations that do not easily lend themselves to parallel processing. While general purpose CPUs are well suited for serial processing operations, it can be costly to provide a CPU with processing power sufficient to process and sequentially transcode a large number of content streams in a single frame time. Increasing the number of CPUs, with each CPU supporting multiple content streams, is one approach that may be taken to support the processing of large numbers of content streams. However, simply increasing the number of CPUs can be a costly proposition since CPUs capable of supporting large numbers of content streams, e.g., voice calls, which are subject to coding and/or decoding can be expensive.
Graphics processing units (GPUs) have been developed for graphics applications, e.g., video operations. Such units often include a large number of processing cores, often referred to as GPU cores. Each GPU core is capable of processing a unit of data in parallel with the other GPU cores. Thus, because they have a large number of cores, GPUs can support a large number of operations performed in parallel.
Because of the volume at which GPUs are produced, off the shelf GPU units tend to offer an excellent value in many cases for applications which lend themselves to parallel processing. Unfortunately, at least some portions of the processing of audio content streams require sequential processing. As a result, completely replacing regular CPUs with GPUs for purposes of processing and/or transcoding large numbers of audio content streams in a communications system is impractical in many cases given the time constraints in which the stream processing, e.g., audio decoding and/or encoding, needs to be performed, e.g., to support real time content streams such as those associated with voice calls that may involve transcoding.
In view of the above, it should be appreciated that there is a need for methods and/or apparatus which can be used to support processing of audio content streams, e.g., as part of communication through a communications network which supports real time audio communications.
For example, because of the high compute complexity of speech transcoding codecs, the scale achieved in a Central Processing Unit (CPU) virtualized real-time speech transcoding service is limited. At the same time, the availability of Graphics Processing Units (GPUs) as a compute offload on Commercial Off The Shelf (COTS) hardware has increased given increasing demands from a diverse range of applications including Image Processing, Big Data, i.e., data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them, and Artificial Intelligence. The ability to leverage GPUs as compute offloads for virtualized CPU based real-time speech transcoding services is an attractive use case. However, there are currently problems with using GPUs to provide speech transcoding services.
Speech transcoding operates on fixed frame sizes. For a given codec, a multi-channel speech transcoding system has to ensure that processing of all channels is completed within the codec's frame time for stability. CPU based solutions process channels sequentially, and hence their scale is limited by the number of channels that can be processed in the codec's frame time. While GPUs, which are equipped with thousands of compute cores (GPU cores), offer an attractive possibility of compute offload for scale, leveraging them for speech transcoding offload presents a number of challenges. First, GPU compute cores are less powerful than their CPU counterparts and hence are ill-suited for sequential processing. Second, speech codecs employ various types of recursive filter algorithms, which makes them difficult to parallelize for GPU processing. Third, even if parts of the speech transcode processing could be offloaded to a GPU, the increase in scale is limited by the fraction of processing that has been offloaded. For example, even if 50% of the transcode processing is offloaded from the CPU to a GPU, a speed-up of at most 2 times would be achieved, since the remaining 50% still executes on the CPU. Fourth, parts of the speech transcoding services, such as media-plane network telemetry and control plane communications, require low-latency processing. Latency introduced by the GPU processing would affect such operations.
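By way of illustration only, the frame-time limit on sequential processing and the bound on speed-up from partial offload can be expressed as a short sketch. The speed-up bound is the standard Amdahl's law relationship, which is consistent with the 50% example above. The function names, the 20 ms frame time, and the 0.25 ms per-channel processing time are hypothetical values chosen for the example and are not taken from any particular codec or embodiment.

```python
import math


def max_sequential_channels(frame_time_ms: float, per_channel_ms: float) -> int:
    """Number of channels a single sequential processor can complete in one frame time.

    Both arguments are hypothetical inputs for illustration.
    """
    if frame_time_ms <= 0 or per_channel_ms <= 0:
        raise ValueError("times must be positive")
    return math.floor(frame_time_ms / per_channel_ms)


def offload_speedup(offloaded_fraction: float, gpu_speedup: float = math.inf) -> float:
    """Overall speed-up when a fraction of the processing is offloaded (Amdahl's law).

    offloaded_fraction: share of the per-channel work moved off the CPU (0..1).
    gpu_speedup: how much faster the offloaded share runs; the default of
        infinity models the best case in which the offloaded share costs no time.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    # Time remaining, as a fraction of the original sequential time.
    remaining = (1.0 - offloaded_fraction) + offloaded_fraction / gpu_speedup
    return math.inf if remaining == 0 else 1.0 / remaining


# Assumed example: 20 ms frame time, 0.25 ms of CPU time per channel.
channels = max_sequential_channels(20.0, 0.25)

# Offloading 50% of the work bounds the speed-up at 2x, even in the best case.
best_case = offload_speedup(0.5)

# If the offloaded half runs only 4x faster, the overall gain is smaller still.
realistic = offload_speedup(0.5, gpu_speedup=4.0)
```

Under these assumed numbers, the sequential processor is limited to 80 channels per frame time, and offloading half of the work cannot raise that limit beyond roughly twice that figure regardless of how many GPU cores are available.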
Offloading of encryption and/or decryption packet processing from a CPU to a GPU is another application to which the present invention is applicable.
From the foregoing discussion, it is apparent that there is a need for new and/or improved communications methods and apparatus that are more efficient and cost effective and can provide and/or effectuate encrypting, decrypting, encoding, decoding and/or transcoding with greater efficiency when scaled. Furthermore, there is a need for new and/or improved methods and apparatus that can utilize one or more GPU devices to provide encrypting, decrypting, encoding, decoding, and/or transcoding services. There is also a need for new and/or improved communications methods and apparatus that utilize commercial off the shelf GPU devices to provide encrypting, decrypting, encoding, decoding, and/or transcoding services at a lower cost on a per session basis than the alternative of using CPU devices alone without GPU devices.