1. Field of the Invention
The present invention relates to video compression, and more particularly, to a transcoding system and method that can effectively synchronize segmentation metadata with AV contents when converting an MPEG stream from one bit rate to another, from one frame size to another, or from one compression format to another, storing or transmitting the converted stream, or reproducing the converted stream using a segment browser.
2. Description of the Related Art
A transcoding process converts a compressed video signal into another video signal having a different bit rate, a different video frame rate, a different video frame size, or a different compression type from that of the original compressed video signal. A transcoder is an apparatus that performs the transcoding process.
Due to the development of a variety of multimedia application programs and the improvement of communication environments, the demand for communications between different types of networks or protocols has been steadily growing. For example, in order to transmit a video stream between different types of networks, a special communication path should be set up between a video source and a user. In this case, when the source video is encoded, the bandwidth of the compressed video stream is adjusted to match the lowest transmission rate available along the network connection.
The bandwidth of a real-time video stream is adjusted by changing the coding parameters of the source encoder. In this case, however, since the encoding bit rate should be lowered to be compatible with the link under the worst condition, poor picture quality may result. In other words, even a system capable of processing higher-quality images is likely to end up with low-quality images. In current communication environments, in which a variety of user devices are used together, many problems may arise because transmission channels are likely to differ from one another in characteristics and capabilities.
Recently, an increasing number of users have had a preference for compact-sized portable devices, such as mobile phones or personal digital assistants (PDAs), for the purpose of video communications or access to the Internet. Most portable devices, however, have limited computation and display capabilities. Therefore, they are not suitable for high-resolution video decoding or displaying processes. In order to display high-resolution video streams using such portable devices, the high-resolution video streams should be converted into lower-resolution ones.
As various video compression standards, such as H.261, H.263, H.264, MPEG1, MPEG2, and MPEG4, have been developed, the demand for converting video streams of one compression type into video streams of another compression type has steadily grown. When a video source is carried in a video stream and the video stream is transmitted via client channels of different capabilities, the video stream should be converted to have a bit rate appropriate for each of the client channels. This requirement becomes particularly important in a multipoint video conference, in which a plurality of video streams should be transmitted over a limited number of channels after their respective bit rates are appropriately converted.
Video transcoding technology has been developed to solve the above problem. Such technology enables a compressed video stream to be transmitted between different types of networks or between different user devices by converting the compressed video stream from one format to another. A video transcoder, like a source encoder, can change image data by adjusting several parameters, such as picture quality, frame rate, and resolution.
After such a transcoding process, it is necessary to appropriately reset timing parameters. Hereinafter, various conventional techniques of appropriately adjusting timing parameters after a transcoding process will be described.
An MPEG decoder changes its local system time based on a time stamp indicating an encoder's local system time. Data is input from an original input source to the encoder at a predetermined bit rate. The encoder outputs data having a variable bit rate. The data output from the encoder is input to an encoder buffer. Finally, stream data having a constant bit rate is output from the encoder buffer. The stream data is input to a decoder buffer via a radio frequency (RF) transmission channel at a predetermined bit rate and then input to the MPEG decoder at a variable bit rate. Accordingly, data having a predetermined bit rate is output from the decoder buffer. A timing synchronization process is performed to process data converted from a variable bit rate to a fixed bit rate or vice versa.
A conventional transcoder operates with a predetermined delay on the assumption that there is zero delay among a decoder block, a transmission port, and an end of an encoder. A decoder operates in synchronization with a time stamp of an encoder in the following manner. The encoder includes a main oscillator, which serves as a system time clock (STC), and a counter. The STC belongs to a predetermined program and is a main clock of a program for video and audio encoders.
In some MPEG standards, a time stamp may not be used for time synchronization, in which case synchronization of different components with one another may not be guaranteed. When a video frame or audio block is input to an encoder, the encoder samples the STC for that video frame or audio block. A constant indicating a delay between the encoder and the decoder buffer is added to the sampled STC, thereby forming a presentation time stamp (PTS). The PTS is inserted in a header of the video frame or the audio block.
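The formation of a PTS described above can be sketched as follows. This is a minimal illustration rather than an encoder implementation: the function name `make_pts` and the delay value are hypothetical, and the 90 kHz tick rate and 33-bit field width are taken from MPEG-2 Systems practice rather than from the text above.

```python
PTS_CLOCK_HZ = 90_000   # assumed PTS/DTS tick rate (MPEG-2 Systems)
PTS_MODULUS = 1 << 33   # assumed 33-bit PTS/DTS field, so values wrap around


def make_pts(stc_sample: int, delay_ms: int) -> int:
    """Form a PTS from a sampled STC plus a constant delay.

    stc_sample: STC value, in 90 kHz ticks, sampled when the frame or
                audio block enters the encoder.
    delay_ms:   constant encoder-to-decoder-buffer delay (hypothetical value).
    """
    delay_ticks = delay_ms * PTS_CLOCK_HZ // 1000
    return (stc_sample + delay_ticks) % PTS_MODULUS


# A frame sampled at STC = 900000 ticks (10 s) with a 500 ms constant delay.
pts = make_pts(stc_sample=900_000, delay_ms=500)
print(pts)  # 945000
```

The modulo step only matters near the wrap-around point of the counter; for most frames the PTS is simply the sampled STC plus the constant.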
In the case of reordering video frames, decode time stamps (DTSs), which indicate when each of the video frames is to be decoded by the decoder, are respectively inserted into the video frames. DTSs, which are necessary for a frame reordering process, can have the same values as their respective PTSs, except where frames are reordered because of B pictures. Whenever DTSs are used, PTSs are also used. A DTS and a PTS are inserted into a video frame so that they are no more than 700 msec apart from each other.
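The relationship between DTS and PTS under frame reordering can be illustrated with a simplified model. The sketch below assumes a fixed frame duration of 3600 ticks (25 frames per second at 90 kHz), decodes each I or P picture before the B pictures that precede it in display order, and applies a one-frame presentation delay only when B pictures are present. The function name and these parameters are assumptions for illustration and are not taken from the text above.

```python
FRAME_TICKS = 3600  # assumed: 25 frames/s expressed in 90 kHz ticks


def assign_timestamps(display_order):
    """Return (name, dts, pts) tuples in decode order.

    display_order: list of picture types ('I', 'P' or 'B') in display order.
    """
    # Build the decode order: an anchor picture (I or P) must be decoded
    # before the B pictures that are displayed ahead of it.
    decode_order, pending_b = [], []
    for idx, ptype in enumerate(display_order):
        if ptype == "B":
            pending_b.append(idx)
        else:
            decode_order.append(idx)
            decode_order.extend(pending_b)
            pending_b = []
    decode_order.extend(pending_b)

    # Without B pictures there is no reordering, so DTS equals PTS.
    reorder_delay = 1 if "B" in display_order else 0

    stamped = []
    for decode_idx, display_idx in enumerate(decode_order):
        dts = decode_idx * FRAME_TICKS
        pts = (display_idx + reorder_delay) * FRAME_TICKS
        stamped.append((f"{display_order[display_idx]}{display_idx}", dts, pts))
    return stamped


for name, dts, pts in assign_timestamps(["I", "B", "B", "P"]):
    print(name, dts, pts)
```

In this model, the I and P pictures of a reordered sequence receive a DTS earlier than their PTS, whereas a sequence without B pictures receives identical DTS and PTS values.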
According to the Advanced Television Systems Committee (ATSC), a PTS and a DTS are inserted into a header of each picture. The encoder buffer outputs transport packets, each having a time stamp called a program clock reference (PCR), or packetized elementary streams (PES), each having a time stamp called a system clock reference (SCR). The PCR is generated at intervals of 100 msec, and the SCR is generated at intervals of up to 700 msec. The PCR or SCR is used to synchronize an STC of the decoder with an STC of the encoder.
A program stream (PS) has an SCR as its clock reference, and a transport stream (TS) has a PCR as its clock reference. Therefore, each type of video stream or audio stream has a time stamp corresponding to an STC so as to synchronize the STC of the decoder with the STC of the encoder.
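The use of a clock reference to align the decoder's STC with the encoder's STC can be sketched as follows. The sketch assumes the MPEG-2 Systems representation of a PCR as a 33-bit base counted at 90 kHz plus an extension counted from 0 to 299 at 27 MHz, which is not stated in the text above. The class `DecoderClock` is hypothetical and simply loads each received PCR; a practical decoder would typically smooth the correction with a phase-locked loop.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # assumed 27 MHz system clock (MPEG-2 Systems)


def pcr_value(base: int, extension: int) -> int:
    """Combine PCR base (90 kHz) and extension (27 MHz) into 27 MHz ticks."""
    if not 0 <= base < (1 << 33):
        raise ValueError("PCR base must fit in 33 bits")
    if not 0 <= extension < 300:
        raise ValueError("PCR extension must be in the range 0..299")
    return base * 300 + extension


class DecoderClock:
    """Hypothetical decoder STC that tracks the encoder STC via received PCRs."""

    def __init__(self) -> None:
        self.offset = 0  # encoder STC minus local free-running counter

    def on_pcr(self, pcr: int, local_ticks: int) -> None:
        # Record the offset between the received reference and the local counter.
        self.offset = pcr - local_ticks

    def stc(self, local_ticks: int) -> int:
        # Decoder STC = local counter corrected by the most recent offset.
        return local_ticks + self.offset


clock = DecoderClock()
clock.on_pcr(pcr_value(base=90_000, extension=0), local_ticks=1_000)
print(clock.stc(local_ticks=1_500))  # 27000500
```

An SCR in a program stream could be handled the same way in this sketch, since both references play the role of a sample of the encoder's STC.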
A segment browser provides a function for non-linearly reproducing a broadcasting stream using segmentation metadata, as defined by the TV-Anytime Forum. The TV-Anytime Forum is an association of private standards organizations, which seeks to enable viewers to watch various types of broadcasting programs, including conventional types of broadcasting programs and on-line interactive broadcasting services, at any time using their own storage devices. More specifically, the TV-Anytime Forum aims to develop standards regarding a service environment, in which real-time broadcasting services and Internet services are integrated.
The segmentation metadata used in the segment browser provides additional content information, such as highlights, a table of contents, or bookmarks. For example, the segmentation metadata enables only highlights of a soccer game, such as goal-scoring scenes, or only news items in a field of particular interest to a user to be broadcast to the user.
Broadcasting streams are provided by a content provider and broadcast to the user via a broadcasting system, while segmentation metadata is provided by a metadata provider and delivered to the user via the Internet. In other words, since the broadcasting streams and the segmentation metadata are provided by different providers, it is necessary to synchronize the timing of supply of the broadcasting streams with the timing of supply of the segmentation metadata.
Due to changes in user environments, the demand for a variety of platforms is on the increase. Recently, a personal digital recorder (PDR), in which a broadcasting stream receiver is integrated with a digital storage medium, has been developed. The PDR requires one type of broadcasting stream to be converted into another type of broadcasting stream and to be temporarily stored. Therefore, service providers are increasingly expected to provide broadcasting content that can easily be converted from one format to another at any time. A transcoder performs a function for converting such broadcasting content from one format to another. More specifically, the transcoder decodes a broadcasting stream compressed in an MPEG format and compresses the decoded stream into another format.
An MPEG broadcasting stream is stored in a storage medium as a TS or PS. The MPEG broadcasting stream includes time information, such as a PCR or SCR, which is used for synchronizing an encoder with a decoder, an STC, and a PTS and a DTS, which are used for synchronizing audio content with video content. The MPEG broadcasting stream is reconstructed using the decoder, and the time information disappears after being used to synchronize the decoder with the encoder and to synchronize the audio content with the video content.
Each segment of segmentation metadata includes a PTS or DTS so that a video stream can be synchronized with an audio stream. Therefore, if any desired segment of the segmentation metadata is searched for and selected at an end decoder, the corresponding metadata is output to a user, and the video or audio data corresponding to the selected segment is displayed.
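How a segment browser might map a time stamp to a segment can be sketched as follows. The `Segment` record and its fields are hypothetical and do not reproduce the TV-Anytime metadata schema; the sketch only assumes that each segment carries a start PTS and a duration in the same tick units as the stream, and that segments are sorted by start PTS.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Segment:
    title: str      # label shown in the segment browser (hypothetical field)
    start_pts: int  # PTS of the first frame of the segment
    duration: int   # segment length in the same tick units


def find_segment(segments: Sequence[Segment], pts: int) -> Optional[Segment]:
    """Return the segment containing pts, or None if no segment covers it.

    segments must be sorted by start_pts and must not overlap.
    """
    starts = [s.start_pts for s in segments]
    i = bisect_right(starts, pts) - 1
    if i >= 0 and pts < segments[i].start_pts + segments[i].duration:
        return segments[i]
    return None


highlights = [
    Segment("First goal", start_pts=900_000, duration=270_000),
    Segment("Second goal", start_pts=5_400_000, duration=180_000),
]
hit = find_segment(highlights, 1_000_000)
print(hit.title if hit else "no segment")  # First goal
```

The lookup works only while the PTS values referenced by the metadata match those carried in the stream being reproduced.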
As disclosed in, for example, Japanese Patent Publication No. 2003-230092, in the case of resetting a PTS (or DTS) of each stream through decoding and re-encoding processes or in the case of multiplexing a plurality of TSs into a single TS, the value of the PTS (or DTS) is increased by as much as a delay occurring in a transcoder, and then each stream is re-encoded, in order to synchronize video data with audio data. However, in the case of using metadata as well as A/V data, as described above, a PTS (or DTS) of the A/V data should not be arbitrarily reset; otherwise, an end user may not use the metadata properly.
Therefore, it is necessary to synchronize streams with each other without changing a PTS or DTS of each video frame or each audio block.
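The conflict between restamping and segmentation metadata can be illustrated with a small sketch. All values are hypothetical: five frames spaced 3600 ticks apart, a metadata entry that refers to the third frame by its original PTS, and a transcoder delay of 45000 ticks added to every PTS in the manner of the conventional technique described above.

```python
PTS_MODULUS = 1 << 33  # assumed 33-bit PTS field


def restamp(frame_pts, delay_ticks):
    """Conventional approach: shift every PTS by the transcoder delay."""
    return [(p + delay_ticks) % PTS_MODULUS for p in frame_pts]


def locate(frame_pts, target_pts):
    """Return the index of the frame whose PTS equals target_pts, else None."""
    try:
        return frame_pts.index(target_pts)
    except ValueError:
        return None


original = [900_000 + i * 3600 for i in range(5)]
segment_start = 907_200  # metadata refers to the third frame (index 2)

restamped = restamp(original, delay_ticks=45_000)

print(locate(original, segment_start))   # 2
print(locate(restamped, segment_start))  # None
```

After restamping, the PTS stored in the metadata no longer identifies any frame in the converted stream, which is why the time stamps of each video frame or audio block need to be preserved.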