The present invention relates to an encoding system for encoding input video data and a decoding system for decoding encoded streams.
Recently, the MPEG (Moving Picture Experts Group) technology standardized as ISO/IEC 13818 has come into common use for compressing and encoding video data at broadcasting stations that produce and broadcast television programs. MPEG is becoming the de facto standard, especially for recording video data generated by video cameras or the like on tape, disks, or other recording media that can be accessed randomly, and for transmitting video programs produced at broadcasting stations via cables or satellites.
The MPEG technology is an encoding technology that can improve compression efficiency by means of predictive coding of pictures. More particularly, the MPEG standard employs a plurality of predictive coding systems that combine intra-frame prediction and inter-frame prediction, and each picture is encoded as one of the following picture types according to the prediction system: I-picture (Intra Picture), P-picture (Predictive Picture), or B-picture (Bidirectionally Predictive Picture). The I-picture, which is not predicted from other pictures, is a picture encoded within the frame. The P-picture is a picture subjected to inter-frame forward predictive coding from a preceding (past) I-picture or P-picture. The B-picture is a picture subjected to bidirectionally predictive coding from both a preceding (past) I-picture or P-picture and a following (future) I-picture or P-picture.
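The prediction relationships among the three picture types can be illustrated with a minimal sketch. The function name `reference_pictures` and the string representation of a picture sequence in display order are assumptions made for illustration only; the sketch further assumes that every P-picture has a preceding I- or P-picture and that every B-picture lies between two such pictures.

```python
def reference_pictures(gop):
    """For a string of picture types in display order (e.g. "IBBPBBP"),
    return, per picture, a tuple of the indices it is predicted from."""
    # I- and P-pictures are the only pictures that serve as references here.
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = []
    for i, t in enumerate(gop):
        past = [a for a in anchors if a < i]
        future = [a for a in anchors if a > i]
        if t == "I":
            refs.append(())                      # coded within the frame
        elif t == "P":
            refs.append((past[-1],))             # forward prediction only
        else:
            refs.append((past[-1], future[0]))   # bidirectional prediction
    return refs


print(reference_pictures("IBBPBBP"))
```

In this example the B-pictures at positions 1 and 2 refer to the I-picture at position 0 and the P-picture at position 3.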
FIG. 1 shows an example of a video processing system used within a broadcasting station or between broadcasting stations. As described above, it has been proposed to use an MPEG encoder and an MPEG decoder as shown in FIG. 1 to transmit source video data from a first video processor 1 provided as a sending system to a second video processor 4 provided as a receiving system within or between broadcasting stations.
The first video processor 1 receives source video data such as component-type base-band video of the D1 format and performs edit processing, special-effects processing, synthesis processing or the like on the source video data. The video processor 1 also receives ancillary data such as closed captions and teletext data and adds the ancillary data to the blanking intervals of the source video data. Therefore, ancillary data has been embedded in the blanking intervals of the video data output from the video processor 1.
The MPEG encoder 2 receives the video data from the video processor 1 and encodes it to generate an encoded stream. Such an encoded stream is also known as an elementary stream. As is well known, television signals have vertical and horizontal blanking intervals on all four sides of the actual video data area, called the active video area, and the ancillary data described above has been inserted in these blanking intervals.
However, the MPEG standard specifies that only the active video area where pixels actually exist must be encoded. Thus, the encoded stream is the data obtained by encoding only the active video area of input video data, but it does not contain ancillary data superimposed on the blanking intervals. In other words, when the input video data is encoded by the MPEG encoder 2, the ancillary data superimposed on the input video data is lost.
The MPEG decoder 3 receives the encoded stream from the MPEG encoder 2 and decodes it to generate decoded video data, which is then supplied to the second video processor 4. Since the encoded stream supplied to the MPEG decoder 3 does not contain information on the ancillary data, the decoded video data does not contain information on the ancillary data either.
Thus, when MPEG encoding and MPEG decoding are performed to transmit video data from the sending system to the receiving system, the video data corresponding to the active video area can be transmitted, but the ancillary data added to the blanking intervals of the video data by the first video processor 1 is not transmitted from the sender (the first video processor 1) to the receiver (the second video processor 4).
Moreover, because only the active video data is transmitted when MPEG encoding and MPEG decoding are performed between the sending system and the receiving system, the information inherent to the source video data is not transmitted to the receiving system either. The information inherent to the source video data is the information possessed by the source video data itself, including the location of the blank areas or the location of the active video area with respect to the full pixel area. Specifically, it indicates the vertical line in the full pixel area of the source video data at which the active video lines start and the pixel position in the horizontal direction of the full pixel area at which the active video area starts.
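The role of this inherent information can be illustrated with a minimal sketch. The names `ActiveAreaInfo`, `v_start`, `h_start`, `crop_active`, and `place_active` are hypothetical and are introduced here only for illustration; pictures are modeled as plain lists of rows. The sketch shows that, once the active video area has been cut out of the full pixel area, it can be returned to its original position only if the start line and start pixel are known.

```python
from dataclasses import dataclass


@dataclass
class ActiveAreaInfo:
    v_start: int  # line of the full pixel area at which the active lines start
    h_start: int  # pixel within each line at which the active area starts
    height: int   # number of active lines
    width: int    # number of active pixels per line


def crop_active(full, info):
    """Cut the active video area out of the full pixel area."""
    rows = full[info.v_start:info.v_start + info.height]
    return [row[info.h_start:info.h_start + info.width] for row in rows]


def place_active(active, info, full_height, full_width, blank=0):
    """Put the active area back at its original position; the rest is blank."""
    full = [[blank] * full_width for _ in range(full_height)]
    for r, row in enumerate(active):
        full[info.v_start + r][info.h_start:info.h_start + info.width] = row
    return full


# A 4-line by 6-pixel raster whose active area starts at line 1, pixel 2.
full = [[10 * r + c for c in range(6)] for r in range(4)]
info = ActiveAreaInfo(v_start=1, h_start=2, height=2, width=3)
active = crop_active(full, info)
restored = place_active(active, info, full_height=4, full_width=6)
print(active)
print(restored[1])
```

Without `info`, the receiving side would hold only `active` and could not tell where in the full pixel area it belongs.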
Now the processing of video data that has undergone 3:2 pull-down will be described with reference to FIG. 2. The figure shows an example of a video processing system used within or between broadcasting stations to process both video data with a 24-Hz frame frequency and video data with a 30-Hz frame frequency.
The 3:2 pull-down circuit 5 receives video data that has a frame rate of 24 Hz (24 frames per second) and generates video data with a frame rate of 30 Hz (30 frames per second). The film material used at movie theaters and the like is recorded on optical films at a frame rate of 24 Hz (24 frames per second), which is entirely different from the 29.97-Hz frame rate of NTSC television signals. Thus, to convert film material to television signals, 30 frames are generated from 24 frames.
The 3:2 pull-down process will be described with reference to FIGS. 3A and 3B. FIG. 3A shows source video data with a frame rate of 24 Hz while FIG. 3B shows the 30-Hz video data after a 3:2 pull-down conversion. As shown in FIGS. 3A and 3B, the 3:2 pull-down process generates a repeat field t1′ by repeating the top field t1 in the first frame F1 and generates a repeat field b3′ by repeating the bottom field b3 in the third frame F3. Thus, the 3:2 pull-down process converts video data with a frame rate of 24 Hz into video data with a frame rate of 30 Hz by converting 2 fields into 3 fields in a predetermined sequence.
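The field sequence described above can be sketched as follows. The function name `pull_down_3_2` and the use of text labels for fields are assumptions made for illustration; a repeat field is marked with a trailing apostrophe. The sketch assumes that every odd-numbered frame contributes three fields and that the field order alternates after each repeat, which reproduces the repeat of t1 in frame F1 and of b3 in frame F3.

```python
def pull_down_3_2(num_frames):
    """Return the field labels produced by 3:2 pull-down for frames 1..num_frames."""
    fields = []
    top_first = True  # frame F1 starts with its top field
    for n in range(1, num_frames + 1):
        first, second = ("t", "b") if top_first else ("b", "t")
        fields.append(f"{first}{n}")
        fields.append(f"{second}{n}")
        if n % 2 == 1:
            # Odd-numbered frames repeat their first field (2 fields become 3).
            fields.append(f"{first}{n}'")
            # After a repeat, the next frame starts with the opposite field.
            top_first = not top_first
    return fields


print(pull_down_3_2(4))
print(len(pull_down_3_2(24)))
```

Four input frames yield ten fields, that is, five output frames, so 24 input frames yield 60 fields, or 30 frames.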
As described with reference to FIG. 1, the first video processor 1 receives 30-Hz source video data and performs edit, special-effects, synthesis, and/or other operations on it. The video processor 1 also receives ancillary data such as closed captions and teletext data and adds it to the blanking intervals of the source video data. This addition of ancillary data is performed with respect to video data having a frame frequency of 30 Hz and the ancillary data is added to all the fields contained in the 30-Hz video data. That is, the ancillary data is added not only to the top fields (t1, t2, . . . ) and bottom fields (b1, b2, . . . ), but also to the repeat fields t1′ and b3′.
The 2:3 pull-down circuit 6 receives the 30-Hz video data generated by the 3:2 pull-down process described above and converts it into video data with a frame rate of 24 Hz. Specifically, as shown in FIG. 3C, the 2:3 pull-down circuit 6 is used to remove the repeat fields t1′ and b3′ inserted by the 3:2 pull-down process. The 2:3 pull-down process must be performed before MPEG encoding. This is because these repeat fields are redundant fields inserted by the 3:2 pull-down process and can be deleted without any degradation in image quality.
The MPEG encoder 2, which is the same as the MPEG encoder 2 described with reference to FIG. 1, receives 24-Hz video data from the 2:3 pull-down circuit 6 and encodes it to generate an encoded stream.
As in the system of FIG. 1, however, the MPEG standard specifies that only the active video area where pixels actually exist must be encoded. Thus, the encoded stream is the data obtained by encoding only the active video area of input video data, and it does not contain the ancillary data superimposed on the blanking intervals. In other words, when the input video data is encoded by the MPEG encoder 2, the ancillary data superimposed on the input video data is lost.
The MPEG decoder 3, which is the same as the MPEG decoder 3 described with reference to FIG. 1, receives the encoded stream from the MPEG encoder 2 and decodes it to generate decoded video data. According to the MPEG standard, the Repeat_first_field and Top_field_first flags are set in encoded streams as information about the frame structure. Since the MPEG decoder 3 regenerates the repeat fields on the basis of these flags, the decoded video data has a frame rate of 30 Hz.
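How a decoder can regenerate the 30-Hz field sequence from these two flags can be sketched as follows. The function name `expand_fields` and the tuple layout are hypothetical; the lowercase parameter names mirror the Repeat_first_field and Top_field_first flags, and fields are again represented by text labels with a trailing apostrophe marking a regenerated repeat field.

```python
def expand_fields(frames):
    """frames: list of (frame_number, top_field_first, repeat_first_field).
    Return the field labels output for display."""
    out = []
    for n, top_field_first, repeat_first_field in frames:
        first, second = ("t", "b") if top_field_first else ("b", "t")
        out.append(f"{first}{n}")
        out.append(f"{second}{n}")
        if repeat_first_field:
            # The first field of the frame is output once more.
            out.append(f"{first}{n}'")
    return out


# Flag values assumed for the four frames F1 to F4 of FIG. 3.
coded = [(1, True, True), (2, False, False), (3, False, True), (4, True, False)]
print(expand_fields(coded))
```

Note that the regenerated repeat fields are copies of decoded picture data only; nothing in this step restores data that was attached to the original repeat fields.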
As can be seen from the above description, even if ancillary data is added to 30-Hz video data by the processor in the sending system, the repeat fields are removed from the 30-Hz video data by the 2:3 pull-down process necessary for MPEG encoding. That is, the ancillary data is removed together with the repeat fields to which it has been added. Therefore, if a 2:3 pull-down process is performed when video data is transmitted from the sending system to the receiving system, the information about the ancillary data added to the repeat fields is not transmitted from the sender or first video processor 1 to the receiver or second video processor 4 because the repeat fields themselves are removed by the 2:3 pull-down process.
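The loss described above can be illustrated with a minimal sketch. The function name `remove_repeat_fields` and the caption strings are hypothetical; each field is modeled as a pair of a label and the ancillary data carried in its blanking interval, and repeat fields are assumed to be already identified by a trailing apostrophe in the label.

```python
def remove_repeat_fields(fields):
    """fields: list of (label, ancillary) pairs.
    Return the kept fields and the ancillary data lost with the repeat fields."""
    kept = [(label, anc) for label, anc in fields if not label.endswith("'")]
    lost = [anc for label, anc in fields if label.endswith("'")]
    return kept, lost


# Every field of the 30-Hz video, including the repeat fields, carries its own data.
fields_30hz = [
    ("t1", "cc-0"), ("b1", "cc-1"), ("t1'", "cc-2"),
    ("b2", "cc-3"), ("t2", "cc-4"),
    ("b3", "cc-5"), ("t3", "cc-6"), ("b3'", "cc-7"),
    ("t4", "cc-8"), ("b4", "cc-9"),
]
kept, lost = remove_repeat_fields(fields_30hz)
print(len(kept), lost)
```

Here the data attached to t1′ and b3′ ("cc-2" and "cc-7") never reaches the encoder, so it cannot appear in the encoded stream.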
The present invention relates to an encoding system for encoding input video data and a decoding system for decoding the encoded streams. More particularly, it proposes a system and method for sending the ancillary data added to video data and the information inherent to the video data together with encoded streams in such a way that they will not be lost even if MPEG encoding and MPEG decoding are repeated.
The MPEG encoder extracts the ancillary data from the video data, inserts the extracted ancillary data into encoded streams as Ancillary_data, and thereby sends the ancillary data together with the encoded streams. The MPEG decoder extracts the ancillary data from the encoded streams and adds it to the base-band video data generated by MPEG decoding.
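The round trip described above can be sketched as follows. This is not an MPEG implementation: the functions `encode` and `decode`, the dictionary keys other than Ancillary_data, and the `carry_ancillary` switch are assumptions made for illustration, and the actual picture coding is replaced by a plain copy of the active video samples.

```python
def encode(frame, carry_ancillary=True):
    """Stand-in for the encoder: only the active area becomes picture data."""
    stream = {"picture_data": list(frame["active"])}
    if carry_ancillary:
        # Ancillary data taken from the blanking interval travels with the stream.
        stream["Ancillary_data"] = frame["blanking"]
    return stream


def decode(stream):
    """Stand-in for the decoder: rebuild base-band video and re-add ancillary data."""
    return {
        "active": list(stream["picture_data"]),
        "blanking": stream.get("Ancillary_data"),  # None if nothing was carried
    }


frame = {"active": [16, 128, 235], "blanking": "closed caption: HELLO"}
print(decode(encode(frame)) == frame)
print(decode(encode(frame, carry_ancillary=False))["blanking"])
```

With `carry_ancillary=False` the sketch reproduces the conventional behavior in which the decoded video has no ancillary data in its blanking interval.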
An encoding apparatus for encoding input video data extracts, from the input video data, the ancillary data that has been added to the blanking intervals of the input video data, encodes the input video data to generate encoded streams, and controls the encoding so as to insert the extracted ancillary data into the encoded streams.
A decoding apparatus for decoding the encoded streams generated by encoding input video data extracts ancillary data from the encoded streams, decodes the encoded streams to generate decoded video data, and multiplexes the ancillary data onto the blanking intervals of the decoded video data.
A decoding apparatus for decoding the encoded streams generated by encoding input video data parses the syntax of the encoded streams to obtain the ancillary data contained in the encoded streams, decodes the encoded streams to generate decoded video data, and multiplexes the ancillary data onto the decoded video data so that the input video data and decoded video data will have the same ancillary data.
A decoding apparatus for decoding the encoded streams generated by encoding input video data obtains the ancillary data contained in the picture area of the encoded streams, decodes the encoded streams to generate decoded video data, and multiplexes the decoded video data and ancillary data to generate the same data as the input video data.
A coding system comprises encoding means for encoding input video data and decoding means for receiving and decoding the encoded streams encoded by the encoding means to generate decoded video data; wherein the encoding means comprises means for encoding the above-described input video data to generate encoded streams and means for inserting the ancillary data contained in the input video data into the encoded streams, and the decoding means comprises means for decoding the encoded streams to generate decoded video data and means for multiplexing the ancillary data transmitted with the encoded streams onto the decoded video data.