1. Field of the Invention
The present invention relates to streaming methods and, more specifically, to a streaming method wherein a server transmits multimedia data over the Internet to a terminal, and the terminal plays back the multimedia data while receiving the multimedia data from the server.
2. Description of the Background Art
Description of Encoding and Compressing Scheme for Multimedia Data, and Buffer Model
Multimedia data that is transmitted over the Internet varies in type, including moving pictures, still pictures, audio, text, and data having these types of data multiplexed thereon. To encode and compress the moving pictures, H.263, MPEG-1, MPEG-2, and MPEG-4 are well known. For the still pictures, JPEG is well known, and for the audio, MPEG audio, G.729, and many others are well known.
In the present invention, the main concern is streaming playback. Thus, moving pictures and audio are mainly transmitted. Described herein is MPEG video, which is popularly applied to compress moving pictures, especially MPEG-1 (ISO/IEC 11172) video and MPEG-2 (ISO/IEC 13818) video, whose processing is relatively simple.
The MPEG video has the following two main characteristics to realize highly efficient data compression. The first characteristic is a compression scheme utilizing inter-frame temporal correlation, which is applied together with a conventional compression scheme utilizing spatial frequency to compress the moving picture data. In data compression by MPEG, frames (pictures) structuring one stream are classified into three types of frames called I, P, and B frames. In more detail, the I frame is an Intra-Picture, the P frame is a Predictive-Picture which is predicted from information presented in the nearest preceding I or P frame, and the B frame is a Bidirectionally predictive-picture which is predicted from information presented in both the nearest preceding I or P frame and the nearest following I or P frame. Among these three types of frames, the I frame is the largest, that is, it carries the most information, the P frame is the second-largest, and the B frame is the smallest. Although the ratio depends on the compression algorithm, the information ratio among these frames is about I:P:B=4:2:1. Generally in the MPEG video stream, out of every GOP (group of pictures) of 15 frames, the I frame occurs once, the P frame occurs four times, and the B frame occurs ten times.
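The GOP structure and information ratio described above can be sketched as follows. This is an illustrative calculation only; the bit rate and frame rate used in the example are assumptions, not values taken from the MPEG specifications.

```python
# Sketch: distributing the bits of one GOP among I, P, and B frames using
# the approximate information ratio I:P:B = 4:2:1 and the typical
# 15-frame GOP pattern (1 I frame, 4 P frames, 10 B frames).

GOP_PATTERN = {"I": 1, "P": 4, "B": 10}   # frames per 15-frame GOP
INFO_RATIO = {"I": 4, "P": 2, "B": 1}     # relative size of one frame

def bits_per_frame(gop_bits: float) -> dict:
    """Split a GOP bit budget across frame types by the 4:2:1 ratio."""
    # Total weight: 1*4 + 4*2 + 10*1 = 22 weighted units per GOP.
    total_units = sum(GOP_PATTERN[t] * INFO_RATIO[t] for t in GOP_PATTERN)
    unit = gop_bits / total_units
    return {t: INFO_RATIO[t] * unit for t in GOP_PATTERN}

# Example (assumed figures): a 1.5 Mbit/s stream at about 30 frames/s
# spends roughly 750 kbit per 15-frame GOP.
sizes = bits_per_frame(750_000)
```

Under this split, each I frame receives about four times the bits of each B frame, consistent with the ratio stated above.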
The second characteristic of the MPEG video is to dynamically allocate information on a picture basis according to the complexity of a target image. An MPEG decoder is provided with a decoder buffer, and data is temporarily stored therein before decoding. In this manner, any complex image which is difficult to compress can be allocated a large amount of information. Not restricted to MPEG, in other compression schemes for moving pictures as well, the capacity of the decoder buffer is often defined by standards. In MPEG-1 and MPEG-2, the capacity of the standardized decoder buffer is 224 KByte. An MPEG encoder thus needs to generate picture data so that the occupancy of the decoder buffer remains within this capacity.
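The encoder constraint just described can be modeled as a simple simulation. The constant bit rate and frame interval below are assumptions for illustration; only the 224-KByte capacity comes from the description above.

```python
# Hypothetical simulation of the decoder-buffer constraint: data arrives
# at a constant bit rate, one picture's worth of data is removed at each
# decode instant, and the encoder must size every picture so that the
# occupancy never exceeds the 224-KByte capacity nor drops below zero.

BUFFER_CAPACITY = 224 * 1024 * 8   # 224 KByte expressed in bits
BITRATE = 1_500_000                # assumed constant input rate, bit/s
FRAME_INTERVAL = 1 / 29.97         # assumed time between decode instants, s

def occupancy_stays_valid(picture_bits, initial_occupancy):
    """Return True if buffer occupancy stays within [0, capacity]."""
    occupancy = initial_occupancy
    for bits in picture_bits:
        occupancy += BITRATE * FRAME_INTERVAL   # constant-rate input
        if occupancy > BUFFER_CAPACITY:
            return False                        # overflow
        occupancy -= bits                       # decoding removes a picture
        if occupancy < 0:
            return False                        # underflow
    return True
```

A stream of average-sized pictures keeps the occupancy steady, whereas a single oversized picture drains the buffer and causes underflow.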
FIGS. 19A to 19C are diagrams for illustrating a conventional streaming method. Specifically, FIG. 19A shows video frames, FIG. 19B is a diagram schematically showing the change of buffer occupancy, and FIG. 19C is a diagram exemplarily showing the structure of a conventional terminal. In FIG. 19C, the terminal includes a video buffer, a video decoder, an I/P re-order buffer, and a switch. Herein, the video buffer corresponds to the above-described decoder buffer. Any incoming data is temporarily stored in the video buffer, and is then decoded by the video decoder. The decoded data then goes through the I/P re-order buffer and the switch, and is arranged in a temporal order of playback.
In FIG. 19B, the vertical axis denotes the buffer occupancy, that is, the amount of data stored in the video buffer, and the horizontal axis denotes the time. In FIG. 19B, the thick line denotes the temporal change of the buffer occupancy. Further, the slope of the thick line corresponds to the bit rate of the video, and indicates that the data is inputted to the buffer at a constant rate. FIG. 19B also shows that the buffer occupancy is decreased at constant intervals (e.g., 33.3667 msec). This is because the data in each video frame is continuously decoded in a constant cycle. Also, in FIG. 19B, every intersection point of the diagonal dotted line and the time axis denotes a time when the data in each video frame starts being input to the video buffer. Accordingly, it can be seen that a frame X in FIG. 19A starts being input to the video buffer at t1, and a frame Y starts being input to the video buffer at t2.
In FIGS. 19A and 19B, the length of time from t1 to the time when decoding is first performed (in FIG. 19B, the point at which the thick line first drops) is generally referred to as a time vbv_delay. Decoding is performed immediately after the video buffer is filled. Therefore, the time vbv_delay usually denotes the length of time taken for the 224-KByte video buffer to fill with input video data. That is, it denotes an initial delay time (the latency to access a specific frame) from when video starts being input to when the decoder starts playing back the video.
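The relationship between buffer size, bit rate, and the initial delay described above reduces to a simple division. The 1.5 Mbit/s rate below is an assumption for illustration; only the 224-KByte buffer size comes from the description above.

```python
# Rough sketch: with data entering the buffer at a constant bit rate, the
# initial delay (vbv_delay) is the time needed to fill the buffer to the
# occupancy required before the first decode.

def vbv_delay_seconds(fill_bytes: int, bitrate_bps: int) -> float:
    """Time to accumulate `fill_bytes` in the buffer at `bitrate_bps`."""
    return fill_bytes * 8 / bitrate_bps

# Filling a full 224-KByte buffer at an assumed 1.5 Mbit/s:
delay = vbv_delay_seconds(224 * 1024, 1_500_000)   # roughly 1.22 s
```

This also shows why a lower bit rate, with the same buffer size, lengthens the latency to access a specific frame.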
In the case that the frame Y in FIG. 19A is a complex image, the frame Y includes a large amount of information. Thus, as shown in FIG. 19B, data transfer to the video buffer needs to be started earlier (t2 in the drawing) than the decoding time for the frame Y (t3). Note that, no matter how complex the image of the frame Y is, the available buffer occupancy remains within 224 KByte.
If data transfer to the video buffer is performed so as to maintain such a change of buffer occupancy as shown in FIG. 19B, the MPEG standard assures that streaming is not disturbed due to underflow and overflow of the video buffer.
Description of Reception Buffer for Transfer Jitter Absorption on a Network
As shown in FIG. 20, in a system where a server 201 and a terminal 202 are connected to each other through a network 203, a transfer rate fluctuates when MPEG data in a storage 210 is distributed. This fluctuation is due to a time for packet assembly in a generation module 211, a time for a transfer procedure in network devices 204 and 205, and a transfer delay time due to congestion on the network 203, for example. Thus, the change of buffer occupancy shown in FIG. 19B actually cannot be maintained. As a method for reducing and absorbing such fluctuation of the transfer rate (transfer jitter), content encoded at a rate sufficiently lower than the bandwidth of the network may be transferred. However, from a viewpoint of efficiently utilizing the network resources to provide high-quality video and audio, this method is not considered appropriate. Therefore, a method is generally applied in which data is always transferred a little ahead of time so that, if data transfer is delayed, the data shortage is compensated for. In this case, the network devices 204 and 205 are provided with transmission and reception buffers 206 and 207, respectively.
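The send-ahead method described above can be modeled minimally as follows. The function name and the jitter values are assumptions chosen for illustration.

```python
# Illustrative model: each packet is sent `lead` seconds earlier than the
# decoder needs it, so an arrival that is late by up to `lead` seconds
# still reaches the reception buffer in time for decoding.

def playback_uninterrupted(arrival_delays, lead):
    """True if every packet, sent `lead` s early, arrives before needed."""
    return all(delay <= lead for delay in arrival_delays)

# Assumed per-packet transfer delays (jitter) of up to 0.4 s are absorbed
# by a 0.5 s lead, but not by a 0.2 s lead.
jitter = [0.0, 0.1, 0.4, 0.05]
```

The lead time corresponds directly to the capacity of the reception buffer 207: a larger buffer permits a longer lead and therefore tolerates larger jitter.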
Here, providing the reception buffer 207 on the terminal 202 side is approximately equivalent to increasing the capacity of a decoder buffer 208 beyond the standardized 224 KByte by the capacity of the reception buffer 207. For comparison, FIGS. 21A and 21B show the change of buffer occupancy before and after the reception buffer 207 is included. Here, FIG. 21A is the same as FIG. 19B.
By adding the reception buffer 207, the buffer capacity is increased, and the change of buffer occupancy becomes as shown in FIG. 21B. Accordingly, even if the transfer rate of the network decreases, the buffer will not underflow. On the other hand, the time vbv_delay is lengthened by a time corresponding to the capacity of the reception buffer 207. As a result, the starting time for decoding in a decoder 209 and the starting time for playback in a playback device 212 are both delayed. That is, the time to access a specific frame is increased by the time taken for data storage in the reception buffer 207.
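The trade-off just described, larger effective buffer versus longer startup delay, can be sketched numerically. The 1.5 Mbit/s rate and the 112-KByte reception buffer size are assumptions for illustration.

```python
# Sketch: adding a reception buffer increases the effective buffer
# capacity (so transfer-rate dips no longer cause underflow) but
# lengthens the initial delay by the time needed to fill that extra
# capacity at the input bit rate.

def initial_delay(decoder_buffer_bytes, reception_buffer_bytes, bitrate_bps):
    """Total startup delay: filling both buffers at the input bit rate."""
    total_bits = (decoder_buffer_bytes + reception_buffer_bytes) * 8
    return total_bits / bitrate_bps

# A 224-KByte decoder buffer alone, vs. with an assumed 112-KByte
# reception buffer, at an assumed 1.5 Mbit/s:
base = initial_delay(224 * 1024, 0, 1_500_000)               # ~1.22 s
extended = initial_delay(224 * 1024, 112 * 1024, 1_500_000)  # ~1.83 s
```

Every byte added for jitter absorption lengthens the latency to access a specific frame by the time needed to transfer that byte.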
As is known from the above, in a network environment such as a small-scale LAN where reliability and transmission speed are assured, streaming playback of multimedia data such as MPEG data is not disturbed by underflow or overflow of the decoder buffer. This holds as long as the system is designed to keep the initial delay time (vbv_delay) at playback and the change of decoder buffer occupancy as specified by the codec specifications.
However, in the wide area network such as the Internet, the transfer jitter resulting from fluctuation of transmission characteristics of the transmission path is too large to ignore. Therefore, together with the decoder buffer (vbv buffer) within the codec specifications, the conventional terminal 202 often includes another buffer as the reception buffer 207 of FIG. 20 for transfer jitter absorption. If this is the case, however, another problem arises.
The capacity of such a buffer included in the terminal for jitter absorption generally varies depending on the device type. Therefore, even if data is distributed under the same conditions, a device with a large buffer capacity can perform streaming playback with no problem, but a device with a small buffer capacity cannot sufficiently absorb the jitter and thus fails in streaming playback.
To solve this problem, for example, the buffer capacity for jitter absorption may be sufficiently increased by increasing the amount of memory in the terminal. However, memory is a main factor determining the price of the terminal, which is desirably kept as low as possible. Also, if the buffer capacity for jitter absorption is too large, the time to access a specific frame becomes correspondingly longer, which inevitably irritates the user.