Content delivery in a multimedia communication session includes communication of audio and video content streams from a sender, such as a content delivery server, to one or more end devices. Typically, in the content delivery path between the sender and the end device, the audio and video content streams may be sampled, transcoded, transrated and/or subjected to further processing by devices at one or more points.
Transcoding refers to the functionality of changing the format of a media stream from encoding according to one codec to encoding according to another codec. Transcoding is applicable to both audio and video media types. There are scenarios where transcoding is needed for both the audio and video streams of a communication session and where the transcoding of the audio content stream occurs in a different device than the transcoding of the video content stream.
Like various other processing operations, transcoding operations introduce latency and/or delays into the content stream being transcoded. Transcoding between two codecs introduces a delay, causing a distortion in the end-to-end latency. The delay introduced into a content stream subject to transcoding may depend on a number of factors such as, for example, the type of content stream, the type of transcoding device, etc. While multimedia content may include both audio and video content, the coding/compression formats are different for the audio and video packets which carry the audio and video content respectively. During delivery of content from a source to a destination end device, the audio and video content streams may be subject to one or multiple transcoding operations. While each transcoding operation may introduce some delay into the content stream being processed, the delay introduced into an audio content stream by an audio transcoder often differs from the delay introduced into the video content stream of the same multimedia session by a video transcoder. This results in the two content streams corresponding to the same multimedia session being out of synchronization, that is, the audio content stream has a different latency introduced into it than the video content stream.
Lip synchronization is the functionality of aligning the audio and video content streams so that the end user is presented a coherent view. Generally, at a high conceptual level, there are two lip synchronization approaches: i) deducing a relationship between the timestamps and arrival times of RTP packets for each stream separately and then using this information to synchronize the streams; and ii) calculating the wall clock time for each RTP packet and synchronizing packets on both streams. This second approach requires use of the RTP Control Protocol (RTCP) and a common wall clock for the devices performing audio/video transcoding.
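The second approach above can be sketched as follows. This is a minimal illustration, not from the source: it assumes each stream's most recent RTCP sender report provides a mapping from an RTP timestamp to a wall-clock time, from which the wall-clock time of any subsequent packet can be derived. The function name and example values are hypothetical, and 32-bit RTP timestamp wraparound is ignored for brevity.

```python
def rtp_to_wallclock(rtp_ts, sr_ntp_secs, sr_rtp_ts, clock_rate):
    """Convert an RTP timestamp to wall-clock seconds using the most
    recent RTCP sender report (SR) seen for the stream.

    sr_ntp_secs / sr_rtp_ts: the NTP time and RTP timestamp pair
    carried in the SR; clock_rate: the stream's RTP clock rate in Hz.
    """
    # Media time elapsed since the sender report, in seconds.
    elapsed = (rtp_ts - sr_rtp_ts) / clock_rate
    return sr_ntp_secs + elapsed

# Hypothetical example: audio at a 48 kHz RTP clock, video at 90 kHz.
# Both SRs map their RTP timestamps to wall clock 1000.0 s.
t_audio = rtp_to_wallclock(96000 + 48000, 1000.0, 96000, 48000)  # 1.0 s later
t_video = rtp_to_wallclock(90000 + 45000, 1000.0, 90000, 90000)  # 0.5 s later
```

Packets from the two streams whose computed wall-clock times match are presented together, which is why this approach depends on RTCP and a common wall clock at the transcoding devices.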
Both methods of synchronization benefit when the end-to-end latencies of the two streams are close. For the first approach, close end-to-end latency for the two streams is essential, as arrival time is a key component of the procedure. For the second approach, differences in latency may cause some packets to be played without waiting for their counterpart(s) on the other stream, which results in an incoherent, non-synchronized presentation of content. Thus, achieving close end-to-end latency is highly desirable.
In some cases, while one of the content streams is transcoded, the other content stream may not need transcoding prior to delivery to the receiving device, again resulting in different latencies for the two content streams. Some attempts to address the problem of unmatched latencies have involved introducing an artificial delay, at the transcoder node, into the content stream not being transcoded by that node, thereby attempting to synchronize the non-transcoded stream with the stream being transcoded. Since transcoding introduces delay into the transcoded stream, introducing an artificial delay of equal amount into the non-transcoded stream would theoretically make the two streams equally delayed from the perspective of the transcoder node. FIG. 1A illustrates a drawing 100 which shows audio and video content streams communicated from a sender 102 to a receiver 120. In the illustrated drawing, the audio content stream is subject to a transcoding operation by the transcoder 110 while the video content stream is communicated without a transcoding operation. Transcoding the audio stream introduces transcoding latency into the audio stream which, in combination with the network latency, adds up to a total of 90 ms latency for the audio stream. Since the network latency, which is 50 ms in this example, is the same for both streams, the total latency for the video stream, which is not subjected to transcoding, is only 50 ms. The problem of different delays for different streams can be appreciated from drawing 100, which shows that transcoding one of the content streams but not the other causes the two streams to get out of synchronization and have different latencies.
FIG. 1B is a drawing 150 that illustrates one approach used to address the problem of mismatched latencies caused by one of the content streams being transcoded, as discussed above with regard to FIG. 1A. In the approach illustrated by drawing 150, the distortion in the end-to-end latency is addressed by introducing artificial latency at the transcoding device into the non-transcoded stream so that the total latency of the non-transcoded stream matches that of the transcoded stream. As shown in drawing 150, the audio content stream is transcoded by the transcoder 110, adding 40 ms of latency to the audio content stream, which already includes an average network latency of 50 ms, resulting in a total latency of 90 ms for the audio content stream. To address the latency mismatch and distortions, the transcoder 110 introduces artificial latency into the non-transcoded video stream, causing the total latency for the video stream to be 90 ms as well.
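The latency arithmetic of the FIG. 1B approach can be sketched as follows, using the example values from the figure (50 ms network latency, 40 ms audio transcoding latency). This is an illustrative sketch of the padding rule, not an implementation from the source.

```python
# Example values from the FIG. 1B discussion.
NETWORK_LATENCY_MS = 50
AUDIO_TRANSCODE_MS = 40

# The transcoded (audio) stream accrues network plus transcoding latency.
audio_total = NETWORK_LATENCY_MS + AUDIO_TRANSCODE_MS

# The transcoder pads the non-transcoded (video) stream by an artificial
# delay equal to its own transcoding latency, equalizing the two streams
# from the transcoder node's perspective.
artificial_video_delay = AUDIO_TRANSCODE_MS
video_total = NETWORK_LATENCY_MS + artificial_video_delay
```

With these values both streams leave the node with a total latency of 90 ms, which is exactly the equalization the approach aims for.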
While the approach discussed with regard to FIG. 1B achieves stream synchronization from the perspective of the transcoder node, it has issues if the two content streams are transcoded by different transcoding entities. Each transcoding entity may add some artificial latency to the stream that is not transcoded from its perspective, causing the total cumulative latency for each stream to increase significantly. This problem is illustrated in FIG. 2, which shows how the aggregate latency for each stream is increased when the content streams are transcoded by different transcoding entities.
FIG. 2 illustrates a system 200 that includes the same or similar system elements as shown in FIGS. 1A and 1B, plus an additional transcoding device. The system elements that are the same have been identified by the same reference numbers as used earlier. The system operates in a manner similar to that discussed with regard to FIG. 1B, except that in system 200 the two content streams are transcoded by different transcoding entities, with the audio content stream being transcoded by Transcoder-1 112 and the video content stream being transcoded by Transcoder-2 114. To eliminate the latency distortion from the perspective of an individual transcoder, each transcoder adds an artificial latency to the non-transcoded stream, i.e., the content stream that the individual transcoder does not transcode. As can be appreciated from FIG. 2, Transcoder-1 112, e.g., an audio transcoder, transcodes the audio content stream and introduces a transcoding latency of 40 ms to it while adding an equal amount of 40 ms latency to the non-transcoded video stream in order to overcome latency distortions in the two streams from its perspective.
The two content streams next pass through Transcoder-2 114, e.g., a video transcoder, which transcodes the video content stream and introduces a transcoding latency of 60 ms to it while adding an equal amount of 60 ms latency to the non-transcoded audio content stream in order to overcome latency distortions in the two streams from its perspective. As should be appreciated, this results in a total aggregate added latency of 100 ms for both streams, which is significantly greater than the individual transcoding latency introduced into each stream by its own transcoding. Thus, it should be appreciated that the above discussed approaches suffer from drawbacks such as increasing the total aggregate delay for the content streams.
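The accumulation described above can be sketched as follows, using the FIG. 2 example values (40 ms audio transcoding at Transcoder-1, 60 ms video transcoding at Transcoder-2). This is an illustrative sketch of how independent per-node padding compounds, not code from the source; network latency is omitted since it affects both streams equally.

```python
# Transcoding latencies from the FIG. 2 example.
AUDIO_TRANSCODE_MS = 40   # introduced by Transcoder-1 (audio transcoder)
VIDEO_TRANSCODE_MS = 60   # introduced by Transcoder-2 (video transcoder)

# Added latency accumulated by each stream, excluding network latency.
audio_added = 0
video_added = 0

# Transcoder-1: transcodes audio (+40 ms) and pads video by the same amount.
audio_added += AUDIO_TRANSCODE_MS
video_added += AUDIO_TRANSCODE_MS

# Transcoder-2: transcodes video (+60 ms) and pads audio by the same amount.
video_added += VIDEO_TRANSCODE_MS
audio_added += VIDEO_TRANSCODE_MS
```

Each stream ends up carrying 100 ms of added latency even though only 40 ms (audio) and 60 ms (video) were actually required for transcoding, which is the drawback the section identifies.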
From the above discussion, it should be appreciated that there is a need for methods and apparatus that eliminate or reduce the latency mismatch between content streams without significantly increasing the aggregate latencies of those streams. There is a further need for such methods and apparatus to minimize aggregate delays for the content streams.