In recent years, distribution of audio-visual data over narrow-band channels has become possible with the establishment of the international standard MPEG-4 (Moving Picture Experts Group Phase-4, ISO/IEC 14496) relating to a compressive coding method for audio-visual data. For example, a transmission path having a bandwidth of 64 kbit/sec can simultaneously transmit video data having 176 pixels in the horizontal direction and 144 pixels in the vertical direction within one frame at a frame rate of 5 to 6 frames/sec, together with audio data of telephone quality.
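The degree of compression implied by these figures can be estimated with a short calculation. The following is only a rough sketch: the sampling format (4:2:0 at 8 bits per sample, i.e., 12 bits per pixel) is an assumption not stated above, and the function names are hypothetical.

```python
def raw_bits_per_frame(width: int, height: int, bits_per_pixel: int = 12) -> int:
    """Uncompressed size of one frame in bits.

    bits_per_pixel=12 assumes 4:2:0 sampling at 8 bits per sample
    (an assumption made for illustration only).
    """
    return width * height * bits_per_pixel


def required_compression_ratio(width: int, height: int,
                               frame_rate_fps: float,
                               channel_bits_per_sec: float) -> float:
    """Ratio of the raw video bit rate to the channel bit rate."""
    raw_rate = raw_bits_per_frame(width, height) * frame_rate_fps
    return raw_rate / channel_bits_per_sec


if __name__ == "__main__":
    # 176 x 144 pixels, 5 frames/sec, 64 kbit/sec channel
    print(raw_bits_per_frame(176, 144))
    print(required_compression_ratio(176, 144, 5, 64_000))
```

Since the same channel also carries the audio data, the video data must in practice be compressed further than this ratio suggests.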
In a simple profile that is defined by the above-mentioned MPEG-4 video standard, as VOPs (Video Object Planes) corresponding to images of individual objects constituting one scene, I-VOPs and P-VOPs having different coding types are employed. To be specific, an I-VOP is a VOP whose video data is compressed or decompressed without referring to video data of other VOPs. Accordingly, coding or decoding of an I-VOP can be carried out independently of video data of other VOPs. On the other hand, a P-VOP is processed as follows. When performing compression or decompression of video data of a target P-VOP to be processed, predictive data is obtained by prediction on the basis of video data of an I-VOP or P-VOP that is positioned just before the target P-VOP, and a difference component between the predictive data and the video data of the target P-VOP is obtained, and the difference component so obtained is coded or decoded.
In digital satellite broadcasting using a broad band, the repetition cycle of I-VOP is usually about 0.5 sec, i.e., an I-VOP appears about every 0.5 sec. That is, in Japanese TV broadcasting, since the number of frames per sec is about 30, an I-VOP appears once in every 15 frames. On the other hand, in a narrow band, video quality is improved more effectively by lengthening the repetition cycle of I-VOPs, which have a relatively large amount of coded video data (coded data), so as to increase the frequency of occurrence of P-VOPs and B-VOPs (i.e., VOPs which are coded or decoded with reference to video data of other VOPs), which have a smaller amount of coded data, than by increasing the frequency of occurrence of I-VOPs. However, lengthening the repetition cycle of I-VOPs, i.e., reducing their frequency of occurrence, is not desirable from the viewpoint of error resilience, because it may cause image disorder to continue for a long time when a packet loss occurs. The above-mentioned VOPs in MPEG-4 correspond to frames in MPEG-1 and MPEG-2.
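The relation between the repetition cycle of I-VOP and the time for which image disorder may continue can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, and the narrow-band figures in the usage example (5 frames/sec, one I-VOP in every 50 frames) are assumed values, not taken from any standard.

```python
def ivop_interval_frames(frame_rate_fps: float, cycle_seconds: float) -> int:
    """Number of frames from one I-VOP to the next for a given repetition cycle."""
    return round(frame_rate_fps * cycle_seconds)


def max_wait_for_next_ivop(frame_rate_fps: float, interval_frames: int) -> float:
    """Upper bound, in seconds, on the wait for the next I-VOP after a loss.

    The bound is reached when the lost VOP is itself an I-VOP, so that a
    full repetition cycle passes before the next I-VOP arrives.
    """
    return interval_frames / frame_rate_fps


if __name__ == "__main__":
    # Broadcast case from the text: about 30 frames/sec, 0.5 sec cycle.
    n = ivop_interval_frames(30, 0.5)
    print(n, max_wait_for_next_ivop(30, n))
    # Assumed narrow-band case: 5 frames/sec, one I-VOP in every 50 frames.
    print(max_wait_for_next_ivop(5, 50))
```

In the assumed narrow-band case, the wait is twenty times as long as in the broadcast case, which illustrates why a long repetition cycle is undesirable from the viewpoint of error resilience.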
Furthermore, the international standards organization 3GPP (3rd Generation Partnership Project, http://www.3gpp.org), which defines the standard of receiving terminals in radio networks, provides that RTP/UDP/IP (Real-time Transport Protocol/User Datagram Protocol/Internet Protocol) is employed as a protocol for transmitting video data from a server to a receiving terminal, and that RTSP/TCP/IP (Real Time Streaming Protocol/Transmission Control Protocol/Internet Protocol) is employed as a protocol by which a receiving terminal requests data from a server. Furthermore, in the 3GPP standard, SMIL (Synchronized Multimedia Integration Language, http://www.w3.org) is available as a scene description language.
FIG. 18 shows a conventional data transmission system 20 for distributing video data using the Internet.
The data transmission system 20 comprises a server 20a for packetizing a video stream as the above-mentioned coded data, and transmitting packet data; a receiving terminal 20b for receiving the video stream, and reproducing video data; and a network 11, such as the Internet, for transmitting the packet data from the server 20a to the receiving terminal 20b. 
In this data transmission system 20, initially, exchange of a message Mes for requesting data from the server 20a is carried out by the RTSP/TCP/IP between the receiving terminal 20b and the server 20a, whereby a data request signal Dau is transmitted from the receiving terminal 20b to the server 20a. Then, a video stream Dstr is transmitted from the server 20a to the receiving terminal 20b by the RTP/UDP/IP as a data transmission protocol. In the receiving terminal 20b, decoding of the received video stream Dstr is carried out, whereby video data is reproduced.
FIGS. 19(a) and 19(b) are diagrams for explaining a conventional video coding apparatus 100 which performs a coding process adapted to the MPEG standard, and FIG. 19(a) is a block diagram illustrating the construction of the apparatus 100.
The video coding apparatus 100 constitutes the server 20a shown in FIG. 18. The video coding apparatus 100 includes an encoder 102 which compressively codes original video data Dv as it is when coding an I-VOP while compressively coding difference data Dvd between the original video data Dv and its predictive data Dp when coding a P-VOP, and outputs coded data De; a decoder 103 which decompresses compressed data Dc and compressed difference data Dcd which have been obtained by compressing the original video data Dv and the difference data Dvd in the encoder 102, and outputs locally-decoded data Dd corresponding to the I-VOP and locally-decoded difference data Ddd corresponding to the P-VOP; and a subtracter 101 which performs subtraction between the original video data Dv and the predictive data Dp to generate the difference data Dvd.
The video coding apparatus 100 further includes an adder 104 which adds the predictive data Dp to the locally-decoded difference data Ddd to generate locally-decoded data Ddp corresponding to the P-VOP; and a frame memory 105 in which the locally-decoded data Dd corresponding to the I-VOP and the locally-decoded data Ddp corresponding to the P-VOP are stored as reference data. The video data read from the frame memory 105 is supplied to the subtracter 101 and the adder 104 as the predictive data Dp.
Next, the operation of the conventional video coding apparatus 100 will be described.
In the video coding apparatus 100, as shown in FIG. 19(b), original video data Dv supplied from the outside is coded for every VOP.
For example, first VOP data V(1) is coded as an I-VOP, second to fifth VOP data V(2) to V(5) are coded as P-VOPs, sixth VOP data V(6) is coded as an I-VOP, and seventh to tenth VOP data V(7) to V(10) are coded as P-VOPs.
When coding is started, initially, the first VOP data V(1) is coded as an I-VOP. More specifically, the original video data Dv corresponding to an I-VOP is compressively coded by the encoder 102, and outputted as coded data De. At this time, compressed data Dc obtained by compressing the original video data Dv is outputted from the encoder 102 to the decoder 103. In the decoder 103, decompression of the compressed data Dc is carried out, whereby locally-decoded data Dd corresponding to the I-VOP is generated. The locally-decoded data Dd outputted from the decoder 103 is stored in the frame memory 105 as reference data.
Next, the second VOP data V(2) is coded as a P-VOP. More specifically, the original video data Dv corresponding to a P-VOP is inputted to the subtracter 101 which is placed before the encoder 102. In the subtracter 101, difference data Dvd between the original video data Dv corresponding to the P-VOP and video data which is read from the frame memory 105 as predictive data Dp is generated. Then, the difference data Dvd is compressively coded by the encoder 102, and outputted as coded data De.
Further, at this time, compressed difference data Dcd which is obtained by compressing the difference data Dvd is outputted from the encoder 102 to the decoder 103. In the decoder 103, decompression of the compressed difference data Dcd is carried out, whereby locally-decoded difference data Ddd is generated. In the adder 104, the locally-decoded difference data Ddd outputted from the decoder 103 is added to the predictive data Dp read from the frame memory 105, whereby locally-decoded data Ddp corresponding to the P-VOP is generated. The locally-decoded data Ddp outputted from the adder 104 is stored in the frame memory 105 as reference data.
Thereafter, the third to fifth VOP data V(3) to V(5) are coded as P-VOPs like the second VOP data V(2). Further, the sixth VOP data V(6) is coded as an I-VOP like the first VOP data V(1), and the following seventh to tenth VOP data V(7) to V(10) are coded as P-VOPs like the second VOP data V(2).
As described above, in the video coding apparatus 100, coding of the original video data Dv is carried out with the I-VOP cycle being 5 VOPs.
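The coding loop described above can be sketched in a few lines. This is a minimal illustration, not the MPEG-4 coding process itself: a simple quantizer with a hypothetical step `Q` stands in for the actual compression, each VOP is a short list of integers, and no motion compensation is performed. The variable names follow the signal names used above.

```python
from typing import List, Tuple

Frame = List[int]
Q = 4  # hypothetical quantizer step; stands in for the real compression


def compress(data: Frame) -> Frame:
    """Toy lossy compression (part of encoder 102)."""
    return [round(v / Q) for v in data]


def decompress(data: Frame) -> Frame:
    """Toy decompression (decoder 103)."""
    return [v * Q for v in data]


def encode_sequence(frames: List[Frame], i_vop_cycle: int = 5) -> List[Tuple[str, Frame]]:
    """Code every i_vop_cycle-th VOP as an I-VOP and the others as P-VOPs."""
    frame_memory: Frame = []  # frame memory 105 (reference data)
    coded: List[Tuple[str, Frame]] = []
    for n, dv in enumerate(frames):
        if n % i_vop_cycle == 0:
            dc = compress(dv)                        # I-VOP: Dv is coded as it is
            frame_memory = decompress(dc)            # Dd stored as reference data
            coded.append(("I", dc))
        else:
            dp = frame_memory                        # predictive data Dp
            dvd = [a - b for a, b in zip(dv, dp)]    # subtracter 101
            dcd = compress(dvd)                      # P-VOP: difference is coded
            ddd = decompress(dcd)                    # locally-decoded difference
            frame_memory = [a + b for a, b in zip(ddd, dp)]  # adder 104 -> Ddp
            coded.append(("P", dcd))
    return coded


if __name__ == "__main__":
    frames = [[10 * i + j for j in range(4)] for i in range(10)]
    print("".join(vop_type for vop_type, _ in encode_sequence(frames)))
```

Note that the reference data is the locally-decoded data, not the original video data Dv, so that the coding side predicts from the same data that the decoding side will hold.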
FIG. 20 is a block diagram for explaining a conventional video decoding apparatus 200.
The video decoding apparatus 200 decodes the coded data De outputted from the video coding apparatus 100 shown in FIG. 19(a), and it constitutes a decoding section of the receiving terminal 20b in the data transmission system 20.
More specifically, the video decoding apparatus 200 includes a decoder 201 which performs decompressive decoding in VOP units on the coded data De outputted from the video coding apparatus 100, and outputs decoded data Dd corresponding to the original video data Dv when decoding an I-VOP while outputting decoded difference data Ddd corresponding to the difference data Dvd between the original video data Dv and its predictive data Dp when decoding a P-VOP; an adder 202 which adds the predictive data Dp to the decoded difference data Ddd to generate decoded data Ddecp corresponding to the P-VOP; and a frame memory 203 in which the decoded data Dd corresponding to the I-VOP and the decoded data Ddecp corresponding to the P-VOP are stored as reference data. The video data which is read from the frame memory 203 as the predictive data Dp is supplied to the adder 202.
Next, the operation of the video decoding apparatus 200 will be briefly described.
When decoding is started, in the video decoding apparatus 200, the coded data De supplied from the video coding apparatus 100 is decoded for every VOP.
More specifically, when the coded data De corresponding to the I-VOP is inputted to the decoder 201, decompressive decoding of the coded data De is carried out in the decoder 201, whereby decoded data Dd corresponding to the original video data Dv is generated. The decoded data Dd is outputted from the video decoding apparatus 200 and, simultaneously, stored in the frame memory 203 as reference data.
On the other hand, when the coded data De corresponding to the P-VOP is inputted to the decoder 201, decompressive decoding of the coded data De is carried out in the decoder 201, whereby decoded difference data Ddd corresponding to the difference data Dvd between the original video data Dv and the predictive data Dp is generated. When the decoded difference data Ddd is inputted to the adder 202, the decoded difference data Ddd is added to the video data which is read from the frame memory 203 as the predictive data Dp, whereby decoded data Ddecp corresponding to the P-VOP is generated. The decoded data Ddecp is outputted from the video decoding apparatus 200 and, simultaneously, stored in the frame memory 203 as reference data.
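The decoding operation described above can be sketched in the same simplified manner. The following is an illustration under assumptions: a toy dequantizer with a hypothetical step `Q` stands in for the actual decompressive decoding, each VOP is a short list of integers, and the coded data is given as hypothetical `(type, data)` pairs.

```python
from typing import List, Tuple

Frame = List[int]
Q = 4  # hypothetical quantizer step; stands in for the real decompression


def decompress(data: Frame) -> Frame:
    """Toy decompressive decoding (decoder 201)."""
    return [v * Q for v in data]


def decode_sequence(coded: List[Tuple[str, Frame]]) -> List[Frame]:
    """Decode I-VOPs independently and P-VOPs by adding predictive data."""
    frame_memory: Frame = []  # frame memory 203 (reference data)
    output: List[Frame] = []
    for vop_type, de in coded:
        if vop_type == "I":
            decoded = decompress(de)                 # decoded data Dd
        else:
            ddd = decompress(de)                     # decoded difference data Ddd
            dp = frame_memory                        # predictive data Dp
            decoded = [a + b for a, b in zip(ddd, dp)]  # adder 202 -> Ddecp
        frame_memory = decoded                       # stored as reference data
        output.append(decoded)                       # and outputted simultaneously
    return output


if __name__ == "__main__":
    coded = [("I", [1, 2]), ("P", [1, 0]), ("P", [0, -1])]
    print(decode_sequence(coded))  # [[4, 8], [8, 8], [8, 4]]
```

Because each P-VOP is reconstructed from the contents of the frame memory, a P-VOP can be decoded correctly only when the preceding VOP has been decoded correctly.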
However, the conventional data transmission system 20 shown in FIG. 18 has the following drawbacks.
In particular, there are cases where the data outputted from the server does not reach the receiving terminal, depending on the characteristics of the protocols. One cause of such data loss is as follows. When a bit error occurs in a received packet, the received packet is discarded by the error detecting mechanism of UDP. Especially in a transmission system which includes a radio transmission line in the transmission path from a server to a receiving terminal, when the radio wave intensity at the receiving terminal is weak, the transmitted data received by the terminal cannot be demodulated normally, resulting in a bit error in the received data.
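The discarding of a packet containing a bit error can be illustrated as follows. This is a simplified sketch: it computes the 16-bit one's complement checksum over the payload only, whereas the actual UDP checksum also covers a pseudo-header and the UDP header, and the function `receive` is hypothetical.

```python
from typing import Optional


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum over data (payload only, simplified)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def receive(payload: bytes, checksum: int) -> Optional[bytes]:
    """Return the payload if the checksum matches; otherwise discard it (None)."""
    return payload if internet_checksum(payload) == checksum else None


if __name__ == "__main__":
    payload = b"video stream"
    checksum = internet_checksum(payload)
    corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]  # single bit error
    print(receive(payload, checksum) is not None)  # packet accepted
    print(receive(corrupted, checksum) is None)    # packet discarded
```

Since the packet is discarded as a whole, a single bit error results in the loss of all the video data carried in that packet.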
Further, at the receiving terminal, unless data (video stream) equivalent to one frame (VOP) is prepared (stored), decoding of the video frame cannot be carried out. Therefore, as a countermeasure against the occurrence of a transmission error, the following method is employed. When a transmission error occurs, data of a frame (VOP) which has not been normally received is discarded, and a video frame whose data has already been received normally is displayed until data of an I frame (I-VOP) is normally received after the occurrence of the transmission error. When data of an I frame has been received normally, decoding is resumed from this I frame. Although this method causes no image disordering, the motion of the display image is stopped until the reception of the I frame.
Furthermore, another method as a countermeasure against the occurrence of a transmission error is as follows. As a substitute for data of a frame (VOP) which has not been normally received, data of the immediately preceding frame which has been normally received and decoded is used, and the data of this frame is used for decoding of subsequent frames. In this method, the motion of the display image is not stopped in the frames other than the frame whose data has not been normally received, whereby smooth display is performed. However, since data of a target frame (target of decoding) is decoded with reference to a frame different from the frame that was referred to in the coding process, the display contents might be greatly disordered. Although it depends on the viewer's preference, when a transmission error occurs, the viewer may feel less incongruity in the reproduced (display) image with the former method, in which the frame just before the transmission error is displayed until data of an I frame is normally received, than with the latter method, in which the data of the discarded reference frame corresponding to the target frame is replaced with data of a frame other than the reference frame.
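The difference between the two countermeasures can be illustrated with a toy simulation. The following is a sketch under assumptions: each frame is reduced to a single integer, an I frame carries the value itself, a P frame carries the difference from the preceding frame, the first frame is assumed to be received normally, and the strategy names `"freeze"` and `"substitute"` are hypothetical labels for the two methods.

```python
from typing import List, Set, Tuple


def decode_with_concealment(coded: List[Tuple[str, int]],
                            lost: Set[int],
                            strategy: str) -> List[int]:
    """Return the displayed value for every frame position.

    strategy "freeze": hold the last good frame until the next I frame.
    strategy "substitute": reuse the previous frame in place of the lost
    one and keep decoding the following P frames against it.
    """
    displayed: List[int] = []
    reference = 0            # last decoded frame (frame 0 is assumed not lost)
    waiting_for_i = False
    for n, (frame_type, value) in enumerate(coded):
        if n in lost:
            # The lost frame itself cannot be decoded under either method.
            waiting_for_i = (strategy == "freeze")
        elif frame_type == "I":
            reference = value            # decoding resumes from the I frame
            waiting_for_i = False
        elif not waiting_for_i:
            reference += value           # P frame decoded against the reference
        displayed.append(reference)
    return displayed


if __name__ == "__main__":
    # True frame values: 10, 12, 14, 16, 18, 50, 52
    coded = [("I", 10), ("P", 2), ("P", 2), ("P", 2), ("P", 2), ("I", 50), ("P", 2)]
    print(decode_with_concealment(coded, {2}, "freeze"))      # [10, 12, 12, 12, 12, 50, 52]
    print(decode_with_concealment(coded, {2}, "substitute"))  # [10, 12, 12, 14, 16, 50, 52]
```

With the first method the display stays at 12 until the I frame arrives, whereas with the second method the display keeps moving but every frame up to the I frame differs from its true value, which corresponds to the image disorder described above.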
However, the conventional receiving terminal is preset to execute only one of the above-mentioned two methods as a countermeasure against the occurrence of a transmission error, and the viewer cannot select between them. Therefore, the viewer sometimes feels considerable incongruity in the image displayed when a transmission error occurs.
Furthermore, in order to suppress degradation in video quality due to data compression, the frequency of occurrence of I frames (I-VOPs) should be made as low as possible. However, from the viewpoint that the decoding process which has fallen into an abnormal state due to the occurrence of a transmission error should be quickly restored to the normal decoding state, the frequency of occurrence of I frames (I-VOPs) cannot be significantly reduced.