In many cases where a video is provided to an end user, the data of the video has been encoded into a format that is different from its original format. In the broadcast TV industry, “encoding” typically means “compressing”, “video” refers only to visual data, and “audio” refers to data related to sound. Combined, visual data and audio data are referred to as audio/video (A/V) data. A/V data is compressed into a first format by an encoder for transmission to an end user. Compression/encoding is typically performed prior to data storage or transmission in order to reduce the amount of data that must be stored or transferred. At the end user, the compressed data is then decompressed to another format by a decoder.
The decoder must decompress the A/V data and “present” it to a consumer device such as a television (TV). For the A/V data to be displayed and heard properly, the decoder must recreate the original rates at which the data was encoded. To do this, the decoder relies on timing information embedded in the encoded data. For precise and glitch-free recovery by the decoder, the encoder must: (1) utilize a very stable and accurate timing reference to generate the embedded timing information; and (2) be “frequency locked” to the rate of the audio/video data being compressed. The encoder is not always able to perform these functions, as the rate of the A/V data may occasionally deviate significantly from the requirements of the encoder's timing embedder. Therefore, the A/V data must be transferred to a “clean” time domain prior to encoding. In other words, in many cases the data of the video is provided at a first rate, or with a first clock signal, whereas the encoder encodes data at a second, different rate, or with a second clock signal. To complicate matters, the video data and audio data may additionally be provided with different clock signals and may be processed by the encoder with still different clock signals.
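The decoder's rate-recovery step can be illustrated with a toy calculation. The sketch below assumes MPEG-style timing references counted by a 27 MHz clock; the function name and the sample values are illustrative assumptions, not taken from the text.

```python
# Sketch only (assumption: MPEG-style 27 MHz clock-reference samples
# embedded in the stream; names and values are illustrative).
def estimate_source_rate(timestamps, arrival_times):
    """Estimate the encoder's clock rate from embedded timing information.

    timestamps:    clock values embedded in the encoded data (ticks)
    arrival_times: local receive times for those values (seconds)
    """
    dt_ticks = timestamps[-1] - timestamps[0]
    dt_local = arrival_times[-1] - arrival_times[0]
    return dt_ticks / dt_local  # recovered ticks per second

# A stream stamped against a 27 MHz reference, received over 2 s of local time:
rate = estimate_source_rate([0, 54_000_000], [0.0, 2.0])
# rate == 27_000_000.0
```

This only recovers the average rate; a real decoder continuously filters such estimates, which is why a stable, frequency-locked timing reference at the encoder matters.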
The process of transferring audio and video data from its source to an encoder is typically a complex process that involves synchronizing the audio and video data to respective clock signals, while taking care to maintain audio/video (AV) synchronization between the frames in the video data and the audio signal. If there is no AV synchronization, then the audio data may be played at a time that is inconsistent with the video data that originally corresponded to the audio data. For example, when watching a video on television, if the sound does not synchronize with the image, the viewer may see a person's lips moving while the resulting sound (speech) does not match the lip movement. In order to maintain high performance and minimize bandwidth usage, this transfer process typically requires fairly complicated systems built from costly components.
First of all, the video clock signal accompanying video data sent to an encoder must meet certain requirements (e.g., it should be glitch-free and its frequency must be within a certain range). Therefore, a clock signal synthesizer is typically used to generate a clock signal that is locked to the source video clock signal but meets these requirements. This generated clock signal is known as the time-base corrected (TBC) video clock signal. Using a frame buffer, the video data is then transferred from the domain of the source video clock signal to the domain of the TBC video clock signal so that it can be sent to the encoder.
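The lock between the TBC clock and the source clock can be sketched as a toy first-order frequency-lock loop. Real TBC synthesizers use a phase comparator, loop filter, and NCO (as described for FIG. 1 below); the gain, step count, and frequencies here are illustrative assumptions only.

```python
# Toy first-order frequency lock (not a production PLL design):
# each control step, the loop filter nudges the synthesized NCO
# frequency toward the measured source frequency.
def lock_to_source(source_hz, nco_hz, gain=0.2, steps=100):
    for _ in range(steps):
        error = source_hz - nco_hz   # frequency detector output
        nco_hz += gain * error       # loop filter: proportional correction
    return nco_hz

# An NCO starting 75 kHz low converges onto a 74.25 MHz source clock:
tbc = lock_to_source(74_250_000.0, 74_175_000.0)
# abs(tbc - 74_250_000.0) < 1.0, i.e. locked to within 1 Hz
```

The point of the sketch is that the synthesized clock tracks the source frequency while being generated locally, so it can also be made glitch-free and range-limited, which the raw source clock may not be.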
Video data is typically accompanied by audio data, which may be embedded in the video data (known as ancillary audio) or may come from another source. Regardless of its origin, audio data is required to be in the source audio clock domain before it can be encoded; ancillary audio data must therefore first be extracted from the video data and transferred to the source audio clock domain. However, since the video data sent to the encoder is in the domain of the TBC video clock signal, the audio data must ultimately be sent using a clock signal derived from the TBC video clock signal. Thus, it is necessary to transfer the audio data to the domain of the TBC audio clock signal. In the typical conventional process of transferring video and audio data from a source to an encoder, there are therefore four clock signals: 1) the source video clock signal; 2) the source audio clock signal, an audio clock signal derived from the source video clock signal; 3) the TBC video clock signal; and 4) the TBC audio clock signal, an audio clock signal derived from the TBC video clock signal.
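The derivation relationship between these clocks can be shown in miniature: a derived audio clock is its parent video clock scaled by a fixed rational ratio, so parent and child stay frequency-locked by construction. The 27 MHz / 48 kHz pairing below is a hypothetical example, not a value from the text.

```python
from fractions import Fraction

# A derived clock is the parent clock scaled by a fixed N/M ratio,
# so the two are frequency-locked by construction.
# The specific rates here are illustrative assumptions.
def derived_audio_hz(video_hz, ratio):
    return video_hz * ratio

ratio = Fraction(48_000, 27_000_000)            # e.g. 48 kHz audio from a 27 MHz video clock
src_audio = derived_audio_hz(27_000_000, ratio)  # source audio clock, derived from source video clock
tbc_audio = derived_audio_hz(27_000_000, ratio)  # TBC audio clock, derived from TBC video clock
```

Using an exact `Fraction` rather than a float mirrors the hardware situation: the derivation is a fixed integer ratio, not an approximation, which is what keeps the derived clock in the same domain as its parent.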
Transferring audio data to the domain of the TBC audio clock signal, while maintaining A/V synchronization, can be a fairly complicated process, due to various considerations and limitations. One possible method is to simply re-sample the audio data in the domain of the TBC audio clock signal. However, this solution is not versatile, as it can only be implemented for uncompressed audio data, and not for pre-compressed audio data.
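The limitation of the re-sampling approach is easy to see in a sketch: each output sample is interpolated between input PCM samples, an operation that is meaningful only for uncompressed audio and meaningless for a pre-compressed bitstream. The linear kernel and the function name are illustrative assumptions; real re-samplers use higher-order filters.

```python
# Minimal linear re-sampler sketch for uncompressed PCM audio.
# This arithmetic only makes sense on raw samples, which is why the
# re-sampling approach cannot be applied to pre-compressed audio data.
def resample_linear(samples, in_rate, out_rate):
    n_out = int(len(samples) * out_rate / in_rate)
    out = []
    for i in range(n_out):
        pos = i * in_rate / out_rate              # position in the input stream
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + frac * (b - a))            # linear interpolation between samples
    return out

# Halving a toy 48 kHz ramp to 24 kHz keeps every other sample:
resample_linear([0.0, 1.0, 2.0, 3.0], 48_000, 24_000)
# -> [0.0, 2.0]
```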
Another possible solution is to allow the ancillary audio data to be written to the video frame buffer along with the video data, such that the audio data, as well as the video data, is transferred to the domain of the TBC video clock signal. However, this has the disadvantage of increasing memory bandwidth utilization. Furthermore, this solution is also not versatile, since it can only be used for embedded audio data, which has a known timing relation to the corresponding video data; audio data from an external source lacks such a known timing relation. This approach is therefore not feasible for audio data from an external source, and it would also complicate the frame buffer design to account for writing audio data, in addition to video data, to the frame buffer. Further, as with ancillary audio data, memory bandwidth utilization would increase.
With all these considerations, it is apparent that a versatile method for the clock signal transfer of the audio data must support both uncompressed and compressed audio data, and both embedded (ancillary) and external audio data. Furthermore, it must be able to maintain AV synchronization. The most common approach used to meet these requirements involves first transferring the audio data to a domain of the source audio clock signal, and then transferring that audio data to the encoder while providing the TBC video and audio clock signals to the encoder as a reference. The encoder must then manage the clock domain transfer of the audio data in its own buffers, while constantly exchanging frame buffer status with the frame buffer in order to maintain AV synchronization. An example of this approach will now be discussed with reference to FIG. 1.
FIG. 1 is a schematic illustrating a conventional system 100 for transferring audio data from a domain of the source clock signal to a domain of the TBC audio clock signal.
Conventional system 100 includes a source clock synthesizer 102, a field programmable gate array (FPGA) 104, a double data rate synchronous dynamic random access memory (DDR2 SDRAM) 106, a numerically controlled oscillator (NCO) 108, a video clock synthesizer 110, an audio clock synthesizer 112 and an encoder 114.
FPGA 104 includes a DDR2 controller 146, an audio de-embedder 118, a first-in-first-out (FIFO) buffer 120 and an NCO controller 148. Encoder 114 includes an audio data buffer 124. DDR2 106 and DDR2 controller 146 together may be considered a frame synchronizer and buffer 116. NCO 108 and NCO controller 148 together may be considered a TBC clock synthesizer 122.
Note that in this embodiment, frame synchronizer and buffer 116 includes a portion external to FPGA 104 (DDR2 106, the “buffer” portion) as well as a portion implemented within FPGA 104 (DDR2 controller 146, the “frame synchronizer” or “controller” portion). Similarly, TBC clock synthesizer 122 includes a portion external to FPGA 104 (NCO 108) as well as a portion implemented within FPGA 104 (NCO controller 148, which may include clock signal synthesis components such as a phase comparator and loop filter).
Source clock synthesizer 102 is arranged to receive source video clock signal 126 and to output a source audio clock signal 130. Audio de-embedder 118 is arranged to receive source video data 128 and to output audio data 134. FIFO buffer 120 is arranged to receive audio data 134, source video clock signal 126 and source audio clock signal 130 and to output audio data 136. TBC clock synthesizer 122 is arranged to provide reference clock signal 138. Video clock synthesizer 110 is arranged to receive reference clock signal 138 and to output TBC video clock signal 140. Audio clock synthesizer 112 is arranged to receive reference clock signal 138 and to output TBC audio clock signal 142. Frame synchronizer and buffer 116 is arranged to receive source video clock signal 126, source video data 128 and TBC video clock signal 140 and to output TBC video data 132 and frame sync status 144. Encoder 114 is arranged to receive TBC video data 132, frame sync status 144, audio data 136, source audio clock signal 130, TBC video clock signal 140 and TBC audio clock signal 142 and to output frame sync status 144. Audio data buffer 124 is arranged to receive audio data 136.
Source video data 128 includes portions of video data and portions of audio data. Source video data 128 is clocked by source video clock signal 126. In order for encoder 114 to be able to encode source video data 128 for transmission, source video data 128 must be provided to encoder 114 at a TBC clock rate. In many cases, source video clock signal 126 does not run at a TBC clock rate. Accordingly, frame synchronizer and buffer 116 is operable to synchronize and buffer frames of the video data of source video data 128. In other words, video data of source video data 128 is written into frame synchronizer and buffer 116 using source video clock signal 126. The video data of source video data 128 will then be read from frame synchronizer and buffer 116 as TBC video data 132 using TBC video clock signal 140.
Audio de-embedder 118 is operable to strip out the portions of audio data from source video data 128 and provide those portions to FIFO buffer 120 as audio data 134. Audio data 134 is written into FIFO buffer 120 with source video clock signal 126. Audio data 136 is read from FIFO buffer 120 with source audio clock signal 130.
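The de-embed-and-FIFO stage just described can be modeled in miniature: audio words stripped from the video stream are written into a FIFO under one clock and read out under another. The class name, depth, and word values are illustrative assumptions; real hardware FIFOs also synchronize the pointers across the two clock domains.

```python
from collections import deque

# Toy model of FIFO buffer 120's role: writes are paced by one clock
# (the source video clock), reads by another (the source audio clock).
class AudioFifo:
    def __init__(self, depth):
        self.buf = deque()
        self.depth = depth

    def write(self, word):
        # In hardware this side is clocked by the source video clock.
        if len(self.buf) >= self.depth:
            raise OverflowError("FIFO overflow")
        self.buf.append(word)

    def read(self):
        # In hardware this side is clocked by the source audio clock.
        return self.buf.popleft() if self.buf else None

fifo = AudioFifo(depth=8)
for w in range(4):       # four de-embedded audio words arrive with the video
    fifo.write(w)
first = fifo.read()      # -> 0 (oldest word out first)
```

Because the two sides run on different clocks, the fill level drifts whenever the clocks disagree, which is the failure mode the following paragraphs describe.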
As discussed above, video and audio data are written into their respective buffers with the same write clock signal, but are read from their respective buffers with different clock signals. This is a source of problems with the conventional system.
In particular, audio data 136 is read from FIFO buffer 120 using source audio clock signal 130, which is based on source video clock signal 126. If there is a problem with source video clock signal 126, then there will be a problem with source audio clock signal 130. In such a case, there will be a problem reading audio data 136 from FIFO buffer 120, but TBC video data 132 will still be read from frame synchronizer and buffer 116 with TBC video clock signal 140. In this situation, encoder 114 will recognize, by way of frame sync status 144, that audio data 136 does not synchronize with TBC video data 132 and will adjust the amount of audio data buffered in audio data buffer 124 to compensate.
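The compensation step can be sketched as a buffer-trimming rule: when the frame sync status indicates the audio has drifted relative to the video, the encoder drops or pads buffered audio words to re-center its buffer. The function, the target-level parameter, and the silence padding are hypothetical illustrations of the behavior described above, not the encoder's actual algorithm.

```python
# Hypothetical sketch of encoder 114's compensation: adjust the amount
# of audio buffered in audio data buffer 124 toward a target fill level.
def compensate(audio_buffer, target_level):
    """Trim or pad audio_buffer (a list of audio words) to target_level."""
    if len(audio_buffer) > target_level:
        # Too much audio queued relative to the video: drop the oldest words.
        del audio_buffer[: len(audio_buffer) - target_level]
    elif len(audio_buffer) < target_level:
        # Too little audio queued: pad with silence at the front.
        audio_buffer[:0] = [0] * (target_level - len(audio_buffer))
    return audio_buffer

compensate([1, 2, 3, 4, 5], 3)   # -> [3, 4, 5]
compensate([1], 3)               # -> [0, 0, 1]
```

Dropping or inserting audio is audible if done crudely, which hints at why the conventional approach demands careful, and therefore costly, design.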
In many cases, A/V data contains different amounts of video data and audio data (in most cases there is much more video data than audio data). To account for the disparity between the types of data, encoder 114 will encode the video data and the audio data at different rates, which are phase-locked. Video clock synthesizer 110 generates TBC video clock signal 140 from reference clock signal 138. Similarly, audio clock synthesizer 112 generates TBC audio clock signal 142 from reference clock signal 138. Video clock synthesizer 110 and audio clock synthesizer 112 are set such that TBC video clock signal 140 and TBC audio clock signal 142 meet the requirements of encoder 114 for encoding AV data in accordance with the predetermined coding scheme.
TBC video data 132 is written into encoder 114 by way of TBC video clock signal 140. Audio data 136 is written into audio data buffer 124 by way of source audio clock signal 130. Audio data 136 is read from audio data buffer 124 by way of TBC audio clock signal 142.
Video clock synthesizer 110 generates TBC video clock signal 140 and audio clock synthesizer 112 generates TBC audio clock signal 142 based on reference clock signal 138. Therefore TBC video clock signal 140 and TBC audio clock signal 142 are of the same domain. Encoder 114 uses TBC video clock signal 140 to write TBC video data 132 for encoding. Encoder 114 uses TBC audio clock signal 142 to write audio data from audio data buffer 124 for encoding.
An example method 200 for the operation of conventional system 100 will now be described with reference to FIG. 2.
In operation, process 200 starts (S202) and source clock synthesizer 102 receives source video clock signal 126 and produces source audio clock signal 130 (S204).
Source video data 128 for encoding is additionally supplied to the video and audio buffers (S206). Audio de-embedder 118 receives source video data 128, which includes video data portions and audio data portions, and extracts the audio data portions as audio data 134. Audio de-embedder 118 then provides audio data 134 to FIFO buffer 120. Source video data 128 is concurrently provided to frame synchronizer and buffer 116.
At this point, source video data 128 is written to the video and audio buffers (S208). Source video clock signal 126 enables source video data 128 to be written into frame synchronizer and buffer 116 and additionally enables audio data 134 to be written into FIFO buffer 120.
Audio data is then supplied to the encoder (S210). Audio data 136 is read from FIFO buffer 120 using source audio clock signal 130. Audio data 136 is then provided to audio data buffer 124 within encoder 114.
TBC video data 132 is then supplied to encoder 114 (S212). TBC video data 132 is read from frame synchronizer and buffer 116 using TBC video clock signal 140. TBC video data 132 is then provided to encoder 114.
At this point, TBC video clock signal 140 writes TBC video data 132 into encoder 114 while TBC audio clock signal 142 writes audio data 136 from audio data buffer 124 into encoder 114 (S212). Audio data buffer 124 may fill up if too much audio data is provided for a corresponding portion of video data. This may occur when data is read from frame synchronizer and buffer 116 at a rate that is slower than the required rate for the data that is read from FIFO buffer 120. In other words, if source audio clock signal 130 is not synchronized with TBC video clock signal 140, audio data 136 may be read into audio data buffer 124 at a much higher rate than TBC video data 132 is read into encoder 114. This situation may cause audio data buffer 124 to fill up. To account for this situation, during step S212, encoder 114 constantly exchanges frame buffer status (via frame sync status 144) with frame synchronizer and buffer 116, in order to maintain AV synchronization.
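The overflow risk described above follows from simple rate arithmetic: if audio words enter audio data buffer 124 faster than the encoder consumes them, the occupancy grows without bound. The buffer size and the 1000 ppm clock offset in this sketch are illustrative assumptions.

```python
# Back-of-envelope model of the overflow condition: a buffer of given
# capacity (in audio words) fills when the write rate exceeds the read rate.
def ticks_until_full(capacity, write_rate_hz, read_rate_hz):
    surplus = write_rate_hz - read_rate_hz   # net words gained per second
    if surplus <= 0:
        return None                          # reads keep up: never fills
    return capacity / surplus                # seconds until overflow

# A one-second buffer (48,000 words) fed by a source audio clock running
# 1000 ppm fast relative to the TBC audio clock:
t = ticks_until_full(48_000, 48_048.0, 48_000.0)
# t == 1000.0 seconds before the buffer overflows
```

Even a small, persistent frequency offset therefore guarantees an eventual overflow unless the encoder actively compensates via the frame sync status exchange.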
Encoder 114 then encodes TBC video data 132 and audio data 136 (S214) in accordance with a predetermined coding scheme and process 200 stops (S216).
The problem with conventional system 100 (and corresponding process 200) is that it typically requires costly components, and also involves fairly complicated design and debugging efforts. Specifically, source clock synthesizer 102, which is required to generate source audio clock signal 130 from source video clock signal 126 (S204), might not accurately lock with source video clock signal 126. This may cause large swings, or even overflow, in data storage within FIFO buffer 120. Further, reading audio data 136 into audio data buffer 124 with source audio clock signal 130 might not accurately correspond to the reading of TBC video data 132 into encoder 114 with TBC video clock signal 140. This may cause large swings, or even overflow, in audio data buffer 124. Overflow in data storage within FIFO buffer 120 or within audio data buffer 124 may disrupt AV synchronization. To avoid this issue in the conventional method, a very significant amount of design, integration and debugging resources may be required.
What is needed is a system and method that can perform the process of transferring audio data from a domain of the source clock signal to a domain of the TBC audio clock signal while preserving A/V synchronization in a simple, cost-effective manner, thereby providing significant cost and design time reduction benefits.