This invention relates to a coded stream splicing device, a coded stream splicing method, a coded stream generating device, and a coded stream generating method which are used in a digital broadcasting system. Particularly, it relates to a coded stream splicing device, a coded stream splicing method, a coded stream generating device, and a coded stream generating method which are adapted for generating a seamless spliced stream by splicing two coded streams at a stream level.
FIG. 1 illustrates a current television broadcasting system. In the current television broadcasting system, broadcasting stations for distributing television programs to each household include a key station (or main station) SK for producing television programs on a nationwide scale, and a plurality of local stations (or branches) SA, SB and SC affiliated with the key station for producing unique local television programs. The key station SK is a broadcasting station for producing nationwide television programs and transmitting the produced television programs to the local stations. The local stations are broadcasting stations for distributing, to households within the local areas, the original television programs sent from the key station through inter-station transmission and television programs produced by editing a part of the original television programs into unique local versions. For example, as shown in FIG. 1, the local station SA is a station for producing television programs to be transmitted to households in a broadcasting area EA. The local station SB is a station for producing television programs to be transmitted to households in a broadcasting area EB. The local station SC is a station for producing television programs to be transmitted to households in a broadcasting area EC. Editing processing carried out at each local station is, for example, processing for inserting a unique local weather forecast program into a news program transmitted from the key station, or processing for inserting a local commercial into a program such as a movie or drama.
FIGS. 2A to 2C illustrate editing processing at each local station. FIG. 2A shows an original television program PGOLD produced at the key station. FIG. 2B shows a substitute television program PGNEW for local viewers produced at a local station. FIG. 2C shows a television program PGEDIT edited at a local station. The example of editing processing shown in FIGS. 2A to 2C is an example of editing processing for replacing a commercial CM1, a program 3 and a commercial CM3 of the original television programs transmitted from the key station with a commercial CM1′, a program 3′ and a commercial CM3′ produced at the local station for local viewers. As a result of this editing processing at the local station, television programs for local viewers are produced in which the television programs produced at the key station (that is, a program 1, a program 2, a commercial CM2 and a program 4) and the television programs produced at the local station (that is, the commercial CM1′, the program 3′ and the commercial CM3′) coexist.
The current television broadcasting system employs analog broadcasting for distributing analog base band television signals to each household. However, attempts have been recently made to replace the analog broadcasting system with a next-generation broadcasting system that uses a digital technique. The digital broadcasting system employs compression-coding of video data and audio data through a compression coding technique such as MPEG2 (Moving Picture Experts Group Phase 2), and transmits the coded streams to each household and other stations through land or satellite-based communication links. Particularly, from among broadcasting techniques proposed as the digital broadcasting system, the DVB (Digital Video Broadcasting) standard proposed as a next-generation broadcasting system in Europe is the most popular technique. This DVB standard is becoming the de facto standard.
With reference to FIG. 3, a typical digital transmission system for transmitting a program including video data and audio data from a transmitting side system to a receiving side system by using the MPEG standard will now be described.
In the typical digital transmission system, a transmission side system 10 has an MPEG video encoder 11, an MPEG audio encoder 12, and a multiplexer 13. A receiving side system 20 has a demultiplexer 21, an MPEG video decoder 22, and an MPEG audio decoder 23.
The MPEG video encoder 11 encodes base band source video data V on the basis of the MPEG standard, and outputs the coded stream as a video elementary stream ES. The MPEG audio encoder 12 encodes base band source audio data A on the basis of the MPEG standard, and outputs the coded stream as an audio elementary stream ES. The multiplexer 13 receives the video elementary stream from the MPEG video encoder 11 and the audio elementary stream from the MPEG audio encoder 12. The multiplexer 13 then converts the streams into the form of transport stream packets, thus generating transport stream packets corresponding to the video elementary stream and transport stream packets corresponding to the audio elementary stream. The multiplexer 13 multiplexes the transport stream packets so that the transport stream packet including the video elementary stream and the transport stream packet including the audio elementary stream coexist, thus generating a transport stream to be transmitted to the receiving system 20.
The demultiplexer 21 receives the transport stream transmitted through a transmission line, and demultiplexes the transport stream into the transport stream packets corresponding to the video elementary stream and the transport stream packets corresponding to the audio elementary stream. The demultiplexer 21 then generates the video elementary stream from the transport stream packets corresponding to the video elementary stream, and generates the audio elementary stream from the transport stream packets corresponding to the audio elementary stream. The MPEG video decoder 22 receives the video elementary stream from the demultiplexer 21, and decodes this video elementary stream on the basis of the MPEG standard, thus generating the base band video data V. The MPEG audio decoder 23 receives the audio elementary stream from the demultiplexer 21, and decodes this audio elementary stream on the basis of the MPEG standard, thus generating the base band audio data A.
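The packet-level behavior of the multiplexer 13 and the demultiplexer 21 can be illustrated with a minimal Python sketch. It is an illustration only and not an MPEG-compliant implementation: the packet identifiers VIDEO_PID and AUDIO_PID, the function names, and the simple round-robin interleaving are assumptions, and packet headers, the PES layer and rate-based scheduling are omitted. Only the 184-byte payload size reflects the 188-byte transport stream packet with its 4-byte header.

```python
from collections import defaultdict

VIDEO_PID = 0x100   # hypothetical packet identifiers, chosen for illustration
AUDIO_PID = 0x101
PAYLOAD_SIZE = 184  # payload bytes of a 188-byte packet with a 4-byte header


def packetize(pid, elementary_stream):
    """Split an elementary stream into (pid, payload) pairs."""
    return [
        (pid, elementary_stream[i:i + PAYLOAD_SIZE])
        for i in range(0, len(elementary_stream), PAYLOAD_SIZE)
    ]


def multiplex(video_es, audio_es):
    """Interleave video and audio packets so that both coexist in one stream."""
    video = packetize(VIDEO_PID, video_es)
    audio = packetize(AUDIO_PID, audio_es)
    transport = []
    for i in range(max(len(video), len(audio))):
        if i < len(video):
            transport.append(video[i])
        if i < len(audio):
            transport.append(audio[i])
    return transport


def demultiplex(transport):
    """Regroup packets by PID and rebuild each elementary stream."""
    streams = defaultdict(bytes)
    for pid, payload in transport:
        streams[pid] += payload
    return streams[VIDEO_PID], streams[AUDIO_PID]
```

Under these assumptions, demultiplex(multiplex(video_es, audio_es)) returns the two elementary streams unchanged, which mirrors the round trip between the transmission side system 10 and the receiving side system 20.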
In the case where the conventional analog broadcasting system is replaced with a digital broadcasting system using such a digital transmission technique, video data of television programs are transmitted from the key station to the local station in the form of a coded stream which is compression-coded on the basis of the MPEG2 standard. Therefore, to carry out editing processing at the local station for replacing a part of the original coded stream transmitted from the key station with a coded stream produced at the local station, the coded stream must first be decoded to restore the base band video data before the editing processing is carried out. That is, since the direction of prediction of each picture included in a coded stream in conformity to the MPEG standard is correlated with the direction of prediction of the preceding and subsequent pictures, an unrelated coded stream cannot be inserted at an arbitrary position in the stream. If such insertion is attempted, data at the seam of the coded streams become discontinuous and cannot be decoded accurately.
Therefore, to realize editing processing as described with reference to FIGS. 2A to 2C, it is necessary to carry out decoding processing for once decoding both an original coded stream supplied from the key station and a coded stream produced for local viewers so as to restore base band video signals, editing processing for editing the two base band video data to generate video data edited for broadcasting, and coding processing for again coding the edited video data to generate an edited coded video stream. However, since coding/decoding processing based on the MPEG standard is not 100% reversible, there is a problem that the picture quality is deteriorated as decoding processing and coding processing are repeated.
Thus, it is desirable to have a technique which enables editing in the state of coded streams without decoding supplied coded streams. The technique of connecting two different coded bit streams at the level of coded bit streams so as to generate connected bit streams is referred to as “splicing”. In short, splicing means editing and connection of plural streams in the state of coded streams.
However, realization of this splicing processing has the following two problems.
The first problem will now be described.
In accordance with the MPEG standard used for the above-described MPEG video encoder 11 and MPEG video decoder 22, a bidirectionally predictive coding system is employed as the coding system. In this bidirectionally predictive coding system, three types of coding, that is, intra-frame coding, inter-frame forward predictive coding, and bidirectionally predictive coding, are carried out. Pictures obtained by the respective types of coding are referred to as I-picture (intra coded picture), P-picture (predictive coded picture), and B-picture (bidirectionally predictive coded picture). I-, P- and B-pictures are appropriately combined to form a GOP (Group of Pictures) as a unit for random access. In general, an I-picture has the largest quantity of generated bits, a P-picture has the second largest quantity of generated bits, and a B-picture has the smallest quantity of generated bits.
In the coding method in which the quantity of generated bits varies for each picture as in the MPEG standard, in order to accurately decode obtained coded bit streams (hereinafter referred to simply as streams) by the video decoder so as to obtain a picture, the data occupancy quantity in an input buffer of the video decoder 22 must be known by the video encoder 11. Thus, in accordance with the MPEG standard, a virtual buffer referred to as “VBV (Video Buffering Verifier) buffer” is provided as a buffer corresponding to the input buffer of the video decoder 22, and it is prescribed that the video encoder 11 carries out coding processing so as not to cause breakdown of the VBV buffer, that is, underflow or overflow of the buffer. For example, the capacity of the VBV buffer is determined in accordance with the standard of signals to be transmitted. In the case of standard video signals of main profile and main level (MP@ML), the VBV buffer has a capacity of 1.75 Mbits. The video encoder 11 controls the quantity of generated bits of each picture so as not to cause overflow or underflow of the VBV buffer.
The VBV buffer will now be described with reference to FIGS. 4A to 4C.
FIG. 4A shows an original stream STOLD obtained by a video encoder by coding original television programs including a program 1 and a commercial CM1 produced at the key station, and the locus of the data occupancy quantity in the VBV buffer corresponding to the original stream STOLD. FIG. 4B shows a substitute stream STNEW obtained by a video encoder of a local station by coding a commercial CM1′ produced for local viewers for replacing the part of the commercial CM1 of the original television programs, and the locus of the data occupancy quantity in the VBV buffer corresponding to the substitute stream STNEW. In the following description, since a part of the stream obtained by coding the original television programs transmitted from the key station to the local station is replaced by a new stream, the original stream obtained by coding the original television programs is expressed as “STOLD”, which indicates an old stream, and the new stream which is substituted for a part of the original stream STOLD is expressed as “STNEW”. FIG. 4C shows a spliced stream STSPL obtained by splicing the substitute stream STNEW into the original stream STOLD at a splicing point SP, and the locus of the data occupancy quantity in the VBV buffer corresponding to the spliced stream STSPL.
In FIGS. 4A to 4C, right upward portions (sloped portions) of the locus of the data occupancy quantity in the VBV buffer express the transmission bit rate, and vertically falling portions express the quantity of bits read out from the decoder buffer by the video decoder for reproducing each picture. The timing at which the video decoder reads out bits from the decoder buffer is designated in accordance with information referred to as decoding time stamp (DTS). In FIGS. 4A to 4C, I, P and B represent I-picture, P-picture and B-picture, respectively.
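The sawtooth locus described above can be approximated by a short Python sketch. It is a simplified model under stated assumptions: a constant transmission bit rate, one picture removed per picture period, and hypothetical names (vbv_locus, bits_per_period). It is not the normative VBV model of the MPEG standard. The constant VBV_SIZE_BITS expresses the 1.75 Mbits capacity for MP@ML as 1.75 × 1024 × 1024 bits.

```python
VBV_SIZE_BITS = 1_835_008  # 1.75 Mbits (1.75 * 1024 * 1024) for MP@ML


def vbv_locus(picture_bits, initial_occupancy, bits_per_period,
              vbv_size=VBV_SIZE_BITS):
    """Return the buffer occupancy just before each following decode time.

    picture_bits      -- coded size of each picture, in decoding order
    initial_occupancy -- bits in the buffer when the first picture is read out
    bits_per_period   -- bits delivered per picture period (bit rate / frame rate)
    """
    occupancy = initial_occupancy
    locus = []
    for n, bits in enumerate(picture_bits):
        # Vertical fall: the decoder reads out one whole picture at its DTS.
        if bits > occupancy:
            raise ValueError(f"underflow at picture {n}")
        occupancy -= bits
        # Upward slope: the channel refills the buffer until the next DTS.
        occupancy += bits_per_period
        if occupancy > vbv_size:
            raise ValueError(f"overflow at picture {n}")
        locus.append(occupancy)
    return locus
```

For example, vbv_locus([400, 100, 100], 500, 200, 1000) returns [300, 400, 500]: a large picture (such as an I-picture) pulls the occupancy down, and the smaller pictures that follow let it recover.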
The original coded stream STOLD is a stream coded by the video encoder of the key station, and the substitute stream STNEW is a stream coded by the video encoder of the local station. The original coded stream STOLD and the substitute stream STNEW are individually coded by their respective video encoders. Therefore, since the video encoder of the local station carries out coding processing for uniquely generating the substitute stream STNEW without knowing the locus of the data occupancy quantity in the VBV buffer of the video encoder of the key station at all, the data occupancy quantity VBVOLD of the original stream STOLD in the VBV buffer at the splicing point and the data occupancy quantity VBVNEW of the substitute stream STNEW in the VBV buffer at the splicing point are different from each other.
In order to prevent discontinuity of the locus of the data occupancy quantity in the VBV buffer around the splicing point SP of the spliced stream STSPL, the initial level of the data occupancy quantity of the substitute stream STNEW of the spliced stream STSPL in the VBV buffer must be that of the data occupancy quantity VBVOLD in the VBV buffer. As a result, as shown in FIGS. 4A to 4C, if the value of the data occupancy quantity VBVNEW of the substitute stream STNEW in the VBV buffer is smaller than the value of the data occupancy quantity VBVOLD of the original stream STOLD in the VBV buffer, the VBV buffer will overflow at the part of the substitute stream STNEW of the spliced stream STSPL. Further, if the value of the data occupancy quantity VBVNEW of the substitute stream STNEW in the VBV buffer is greater than the value of the data occupancy quantity VBVOLD of the original stream STOLD in the VBV buffer, the VBV buffer will underflow at the part of the substitute stream STNEW of the spliced stream STSPL.
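The reasoning above amounts to shifting the locus that the local encoder assumed by the difference between VBVOLD and VBVNEW. The following Python sketch illustrates this under simplifying assumptions (constant bit rate, one picture removed per picture period); the function names and the numeric values in the usage note are hypothetical and are not taken from FIGS. 4A to 4C.

```python
def occupancy_extremes(picture_bits, initial_occupancy, bits_per_period):
    """Return (lowest, highest) occupancy reached while decoding the pictures."""
    occupancy = initial_occupancy
    lowest = highest = occupancy
    for bits in picture_bits:
        occupancy -= bits             # picture read out by the decoder
        lowest = min(lowest, occupancy)
        occupancy += bits_per_period  # buffer refilled by the channel
        highest = max(highest, occupancy)
    return lowest, highest


def splice_outcome(picture_bits, vbv_new, vbv_old, bits_per_period, vbv_size):
    """Classify what happens when STNEW starts at VBVOLD instead of VBVNEW."""
    # Locus as the local encoder planned it, starting from VBVNEW.
    lowest, highest = occupancy_extremes(picture_bits, vbv_new, bits_per_period)
    # In the spliced stream the whole locus is shifted by this amount.
    shift = vbv_old - vbv_new
    if highest + shift > vbv_size:
        return "overflow"
    if lowest + shift < 0:
        return "underflow"
    return "ok"
```

With picture sizes [100, 100, 400], 200 bits delivered per picture period, a buffer of 600 bits and VBVNEW equal to 300, the planned locus stays between 100 and 500. Starting instead from VBVOLD equal to 450 (VBVNEW smaller than VBVOLD) yields "overflow", and starting from VBVOLD equal to 150 (VBVNEW greater than VBVOLD) yields "underflow".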
The second problem associated with splicing will now be described.
In a header of a stream coded on the basis of the MPEG standard, various data elements and flags indicative of the coded information are described. The coded stream is decoded by using these data elements and flags.
The programs 1, 2, 3 and 4 constituting the main portion of the original television programs shown in FIGS. 2A to 2C are not necessarily made up of television signals of the NTSC format having a frame rate of 29.97 Hz (approximately 30 Hz) recorded by a video camera or the like, and may be made up of television signals converted from movie material having a frame rate of 24 Hz (24 frames per second). In general, processing for converting movie material of 24 Hz to television signals of 29.97 Hz is referred to as “2:3 pull-down processing” as it includes processing for converting two fields of the original material to three fields in a predetermined sequence.
FIG. 5 illustrates this 2:3 pull-down processing. In FIG. 5, T1 to T8 indicate top fields of a movie material having a frame frequency of 24 Hz, and B1 to B8 indicate bottom fields of the movie material having a frame frequency of 24 Hz. Ellipses and triangles shown in FIG. 5 indicate the structures of frames constituted by top fields and bottom fields.
In this example of 2:3 pull-down processing, four repeat fields are inserted into the movie material (eight top fields T1 to T8 and eight bottom fields B1 to B8) having a frame frequency of 24 Hz. The four repeat fields include a repeat field B2′ generated by repeating the bottom field B2, a repeat field T4′ generated by repeating the top field T4, a repeat field B6′ generated by repeating the bottom field B6, and a repeat field T8′ generated by repeating the top field T8. As a result, television signals having a frame frequency of 29.97 Hz are generated from the movie material having a frame frequency of 24 Hz.
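The field sequence produced by this processing can be sketched in Python as follows. The sketch assumes that field parity alternates strictly, that every second movie frame is extended to three fields, and that the first frame starts with its bottom field; the last point is an assumption chosen so that the repeat fields come out as B2′, T4′, B6′ and T8′, and the exact layout of FIG. 5 may differ. The function name is hypothetical, and an ASCII apostrophe stands in for the prime mark.

```python
def pull_down_2_3(num_frames, first_parity="B"):
    """Expand 24 Hz movie frames into a 2:3 pull-down field sequence."""
    fields = []
    parity = first_parity
    for frame in range(1, num_frames + 1):
        # Odd-numbered frames keep two fields, even-numbered frames get three.
        count = 3 if frame % 2 == 0 else 2
        for k in range(count):
            name = f"{parity}{frame}"
            if k == 2:
                name += "'"  # third field is a repeat of the frame's first field
            fields.append(name)
            # Parity alternates on every output field.
            parity = "T" if parity == "B" else "B"
    return fields
```

Calling pull_down_2_3(8) yields 20 fields, that is, 10 television frames from 8 movie frames, which corresponds to the ratio of approximately 30 Hz to 24 Hz. The four fields marked with an apostrophe are B2', T4', B6' and T8'.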
In the MPEG encoder, the television signals obtained by 2:3 pull-down processing are not directly coded by the video encoder, but are coded after the repeat fields are removed from the 2:3 pull-down processed television signals. In the example shown in FIG. 5, the repeat fields B2′, T4′, B6′ and T8′ are removed from the 2:3 pull-down processed television signals. The reason for removing the repeat fields before coding processing is that the repeat fields are redundant fields inserted at the time of 2:3 pull-down processing and do not cause any deterioration in picture quality even when they are deleted for the purpose of improving the compression coding efficiency.
Also, in accordance with the MPEG standard, it is prescribed that a flag “repeat_first_field”, indicating whether or not a repeat field should be generated by repeating any of two fields constituting a frame, be included in a coded stream. That is, in decoding a coded stream, if the flag “repeat_first_field” in the coded stream is “1”, the MPEG decoder generates a repeat field. If the flag “repeat_first_field” in the coded stream is “0”, the MPEG decoder does not generate a repeat field.
In the example shown in FIG. 5, “repeat_first_field” of a stream obtained by coding the frame constituted by the top field T1 and the bottom field B1 is “0”, and “repeat_first_field” of a stream obtained by coding the frame constituted by the top field T2 and the bottom field B2 is “1”. The flag “repeat_first_field” of a stream obtained by coding the frame constituted by the top field T3 and the bottom field B3 is “0”, and “repeat_first_field” of a stream obtained by coding the frame constituted by the top field T4 and the bottom field B4 is “1”. Therefore, in decoding the coded stream of the frame constituted by the top field T2 and the bottom field B2, the repeat field B2′ is generated. In decoding the coded stream of the frame constituted by the top field T4 and the bottom field B4, the repeat field T4′ is generated.
In addition, in accordance with the MPEG standard, it is prescribed that a flag “top_field_first”, indicating whether the first field of two fields constituting a frame is a top field or a bottom field, is described in a coded stream. Specifically, if “top_field_first” is “1”, it indicates a frame structure in which the top field is temporally preceding the bottom field. If “top_field_first” is “0”, it indicates a frame structure in which the top field is temporally subsequent to the bottom field.
In the example of FIG. 5, “top_field_first” of the coded stream of the frame constituted by the top field T1 and the bottom field B1 is “0”, and “top_field_first” of the coded stream of the frame constituted by the top field T2 and the bottom field B2 is “0”. The flag “top_field_first” of the coded stream of the frame constituted by the top field T3 and the bottom field B3 is “1”, and “top_field_first” of the coded stream of the frame constituted by the top field T4 and the bottom field B4 is “1”.
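How a decoder could turn the two flags back into an output field sequence can be sketched in Python as follows. This is a simplified illustration, not decoder code from the MPEG standard: the function name and the tuple layout (frame number, top_field_first, repeat_first_field) are assumptions, and an ASCII apostrophe stands in for the prime mark of a repeat field.

```python
def display_fields(frames):
    """Expand coded frames into the field sequence output by the decoder.

    frames -- iterable of (number, top_field_first, repeat_first_field)
    """
    out = []
    for number, top_field_first, repeat_first_field in frames:
        # top_field_first selects which field of the frame is output first.
        first, second = ("T", "B") if top_field_first else ("B", "T")
        out.append(f"{first}{number}")
        out.append(f"{second}{number}")
        # repeat_first_field asks the decoder to output the first field again.
        if repeat_first_field:
            out.append(f"{first}{number}'")
    return out
```

With the flag values given above for the first four frames, display_fields([(1, 0, 0), (2, 0, 1), (3, 1, 0), (4, 1, 1)]) returns B1, T1, B2, T2, B2', T3, B3, T4, B4, T4', so the repeat fields removed before coding are regenerated and top and bottom fields alternate throughout.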
With reference to FIGS. 6A to 6C, a problem generated with respect to the flags such as “top_field_first” and “repeat_first_field” defined in accordance with the MPEG standard when the coded stream is spliced will now be described.
FIG. 6A shows the frame structure of the original stream STOLD obtained by coding the original television programs produced at the key station. FIG. 6B shows the frame structure of the substitute stream STNEW obtained by coding the commercial CM1′ for local viewers produced at the local station. FIG. 6C shows the frame structure of the spliced stream STSPL obtained by splicing processing.
The program 1 and the program 2 in the original stream STOLD are coded streams obtained by 2:3 pull-down processing, and each frame of the commercial CM1 of the main portion is a coded stream having the frame structure in which “top_field_first” is “0”. The local commercial CM1′ shown in FIG. 6B is a coded stream to replace the commercial CM1 in the original television programs, and has the frame structure in which “top_field_first” is “1”. The spliced stream STSPL shown in FIG. 6C is a stream generated by splicing the substitute stream STNEW subsequent to the original stream STOLD indicated by the program 1 and then splicing the original stream STOLD indicated by the program 2 subsequent to the substitute stream STNEW. In short, the spliced stream STSPL is a stream obtained by inserting the local commercial CM1′ in place of the main commercial CM1 of the original stream STOLD.
The commercial CM1 produced at the key station shown in FIG. 6 is a coded stream with each frame having the frame structure in which “top_field_first” is “0”. The commercial CM1′ produced at the local station is a coded stream having the frame structure in which “top_field_first” is “1”.
In the case where the frame structure of the commercial CM1 and the frame structure of the substitute commercial CM1′ to replace the commercial CM1 are different from each other as shown in FIGS. 6A and 6B, if the stream of the commercial CM1′ is spliced subsequently to the stream of the program 1 at a splicing point SP1 in the original stream STOLD, a field gap is generated in the spliced stream STSPL. The field gap means dropout of the bottom field B6 at the splicing point SP1 from the spliced stream STSPL, which causes discontinuity in the top field/bottom field pattern as shown in FIG. 6C.
The coded stream having a field gap is a coded stream which does not conform to the MPEG standard and cannot be normally decoded by an ordinary MPEG decoder.
On the other hand, if the stream of the program 2 is spliced subsequently to the commercial CM1xe2x80x2 at a splicing point SP2 in the original stream STOLD, field duplication is generated in the spliced stream STSPL. This field duplication means the existence of a bottom field b12 and a bottom field B12 at the splicing point SP2 in the same display time, as shown in FIG. 6C.
The coded stream in which the field duplication exists is a coded stream which does not conform to the MPEG standard and cannot be normally decoded by an ordinary MPEG decoder.
In short, if splicing processing is carried out without regard to the field or frame pattern, the field pattern or the frame pattern becomes discontinuous and a spliced stream conforming to the MPEG standard cannot be generated.
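The discontinuity described above can be detected by checking that top and bottom fields keep alternating across every frame boundary. The following Python sketch shows one way to express that check; the function names and the (top_field_first, repeat_first_field) tuple layout are assumptions, and the sketch only reports where the alternation breaks, without distinguishing a field gap from field duplication.

```python
def first_parity(top_field_first):
    """Parity ("T" or "B") of the first field output for a frame."""
    return "T" if top_field_first else "B"


def last_parity(top_field_first, repeat_first_field):
    """Parity of the last field output for a frame."""
    first = "T" if top_field_first else "B"
    second = "B" if top_field_first else "T"
    # A three-field frame ends on the repeat of its first field.
    return first if repeat_first_field else second


def parity_breaks(frames):
    """Return the indices of frames that start on the wrong field parity.

    frames -- list of (top_field_first, repeat_first_field), in display order
    """
    breaks = []
    for i in range(1, len(frames)):
        prev_tff, prev_rff = frames[i - 1]
        tff, _ = frames[i]
        # Alternation is broken if a frame starts on the parity the previous one ended on.
        if first_parity(tff) == last_parity(prev_tff, prev_rff):
            breaks.append(i)
    return breaks
```

A 2:3 pull-down sequence such as [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0)] produces no breaks. In a hypothetical spliced sequence [(1, 1), (1, 0), (1, 0), (0, 1)], where the two middle frames stand for an inserted commercial with “top_field_first” equal to “1”, parity_breaks reports indices 1 and 3, that is, one break at each splicing point.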
It is an object of the present invention to provide a coded stream splicing device for realizing seamless splicing processing which generates a continuous locus of the data occupancy quantity of a spliced stream in the VBV buffer and prevents breakdown of the VBV buffer.
It is another object of the present invention to provide a coded stream splicing device for realizing seamless splicing processing which prevents discontinuity in the stream structure of a coded stream around a splicing point.
In order to realize the above objectives, a system for seamlessly splicing two encoded video streams is provided. In one implementation of the system, one or more coding parameters are extracted from the first and/or second encoded streams and one or more parameters of the first and/or second encoded streams are changed in accordance with the extracted parameter(s) in order to effectuate seamless splicing. In another implementation, the coding parameters applied to the first encoded stream are referenced when encoding the second stream such that the resulting second encoded stream can be seamlessly spliced with the first encoded stream.