Compression encoding is often applied when video data, which is a time-series collection of image frames, is stored or transmitted, in order to limit the required storage capacity or transmission path bandwidth. Variable length encoding is typically used for the compression encoding. Thus, the amount of data per frame in a compressed data string (which will be called a bit stream hereinafter) is not constant. In a system for transmitting and processing such bit streams in real-time, the amount of data exchanged per unit time varies independently of the properties of the transmission path, and thus there is a problem that stable transmission cannot be performed. In order to solve such a problem, there is generally employed a method of providing a buffer behind an encoder and in front of a decoder, thereby smoothing the amount of data flowing in the transmission path per unit time and matching it to the properties of the transmission path.
FIG. 7 is a block diagram illustrating an exemplary system including a video encoding/sending apparatus and a video decoding apparatus. In the system illustrated in FIG. 7, a video encoding/sending apparatus 710 compresses and encodes an input video in real-time. The video encoding/sending apparatus 710 then transmits data (bit streams) to a video decoding apparatus 720 via a buffer. The video decoding apparatus 720 decodes the data received from the video encoding/sending apparatus 710 in real-time to be output as a video.
The video encoding/sending apparatus 710 includes an encoder 711 and a transmission buffer 712. The encoder 711 sequentially compresses and encodes image frames configuring the input video thereby to generate bit streams, and supplies the generated bit streams to the transmission buffer 712. The transmission buffer 712 outputs the supplied bit streams at a predetermined transfer speed (bit rate) while accumulating the supplied bit streams. The bit streams output from the video encoding/sending apparatus 710 are supplied to the video decoding apparatus 720 via a network, for example.
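The smoothing behavior of the transmission buffer described above can be sketched as follows. This is a minimal illustration only, not the implementation of the transmission buffer 712: the function name `smooth_through_transmission_buffer`, the discretization into fixed intervals, and the assumption that one frame arrives per interval are all hypothetical simplifications introduced here.

```python
def smooth_through_transmission_buffer(frame_bits, bits_per_interval):
    """Toy model of a transmission buffer (hypothetical, for illustration).

    frame_bits        -- bits supplied by the encoder in each interval
                         (varies, because variable length encoding is used)
    bits_per_interval -- bits the buffer may output per interval, i.e. the
                         predetermined bit rate multiplied by the interval
    Returns the number of bits actually output in each interval.
    """
    occupancy = 0
    sent = []
    for bits in frame_bits:
        occupancy += bits                        # accumulate the supplied bit stream
        out = min(occupancy, bits_per_interval)  # cannot send more than is stored
        occupancy -= out
        sent.append(out)
    while occupancy > 0:                         # drain what is left after the last frame
        out = min(occupancy, bits_per_interval)
        occupancy -= out
        sent.append(out)
    return sent


if __name__ == "__main__":
    # A bursty input leaves the buffer at a flat rate.
    print(smooth_through_transmission_buffer([900, 100, 200, 400], 400))
```

In this toy model the output drops below the predetermined rate whenever the buffer runs empty; how an actual apparatus handles that case is outside what the sketch assumes.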
The video decoding apparatus 720 includes a reception buffer 721 and a decoder 722. The reception buffer 721 cuts out data per frame while accumulating the bit streams input at a predetermined bit rate. The reception buffer 721 then supplies the cut data to the decoder 722 at a predetermined timing. The decoder 722 sequentially decodes the supplied bit streams thereby to generate image frames, and outputs the generated image frames as a video.
In some cases, a video is selected from among many previously-accumulated video contents and is then transmitted and processed. In such a case, in order to reduce the capacity of the storage for accumulating the video contents or the computation cost of the compression encoding processing during transmission, there is employed a system in which input videos are compressed, encoded, and accumulated offline in advance, and the accumulated videos are transmitted in response to a request from the reception side. Also in this case, a buffer needs to be provided behind the encoder and in front of the decoder in order to perform stable transmission.
FIG. 8 is a block diagram illustrating an exemplary system including a video encoding apparatus, a video sending apparatus, and a video decoding apparatus. In the system illustrated in FIG. 8, a video encoding apparatus 810 compresses and encodes an input video, and accumulates it in a storage (storage device 820), and then a video sending apparatus 830 transmits data (bit streams) to a video decoding apparatus 840 via a buffer. The video decoding apparatus 840 decodes and outputs the received data as a video.
The video encoding apparatus 810 sequentially compresses and encodes image frames configuring the input video thereby to generate bit streams, and accumulates the generated bit streams in the storage device 820.
The video sending apparatus 830 includes a sender 831 and a transmission buffer 832. The sender 831 extracts the bit streams accumulated in the storage device 820, and supplies the extracted bit streams to the transmission buffer 832. The transmission buffer 832 outputs the supplied bit streams at a predetermined bit rate while accumulating the supplied bit streams. The bit streams output from the video sending apparatus 830 are supplied to the video decoding apparatus 840 via a network, for example.
The video decoding apparatus 840 includes a reception buffer 841 and a decoder 842. The reception buffer 841 cuts out data per frame while accumulating the bit streams input at a predetermined bit rate. The reception buffer 841 then supplies the cut data to the decoder 842 at a predetermined timing. The decoder 842 sequentially decodes the supplied bit streams thereby to generate image frames, and outputs the generated image frames as a video.
In this way, in the video transmission system including the transmission buffer and the reception buffer, the video encoding apparatus needs to perform encoding in order to prevent overflow or underflow in the buffer in the video decoding apparatus. Thus, the video encoding apparatus uses a virtual buffer simulating the operations of the reception buffer in the video decoding apparatus to perform the compression encoding processing while monitoring the virtual buffer. Specifically, when performing the compression encoding processing, the video encoding apparatus adjusts a compression rate of each image frame in order to prevent overflow and underflow in the virtual buffer.
The model of the virtual buffer is defined according to the video compression encoding system. For example, if the ISO/IEC 13818-2 MPEG-2 system or the ISO/IEC 14496-2 MPEG-4 Part 2 system is employed as the video compression encoding system, the VBV (Video Buffering Verifier) buffer model is defined as the virtual buffer model. Further, for example, if the ISO/IEC 14496-10 MPEG-4 AVC/H.264 system is employed as the video compression encoding system, the CPB (Coded Picture Buffer) model of the HRD (Hypothetical Reference Decoder), which is a virtual decoder, is defined as the virtual buffer model. The virtual buffer models are described in detail in NPL 1, for example.
The representative operations of the virtual buffer will be described below with reference to FIG. 9 and FIG. 10. The operations of the video encoding apparatus for controlling the virtual buffer based on previously-defined operation parameters of the virtual buffer will be specifically described.
FIG. 9 is an explanatory diagram illustrating, by way of example, the amount of generated codes per image frame in a bit stream obtained by compressing and encoding a video. FIG. 10 is an explanatory diagram illustrating an exemplary transition of the buffer occupancy amount in the virtual buffer simulating the reception buffer to which the bit stream is input.
The virtual buffer has a preset buffer size Bmax. The buffer occupancy amount in the virtual buffer keeps on increasing at a predetermined bit rate R until a predetermined time Dinit is reached. When the time Dinit is reached, the virtual buffer subtracts the amount of codes A(0) in a group of data corresponding to the image frame with the frame number 0 from the buffer occupancy amount. This processing corresponds to the processing of supplying the group of data of the frame from the reception buffer to the decoder in the actual video decoding apparatus. Denoting this time as t(0) and the reproduction time interval between image frames in the video as f, the buffer occupancy amount in the virtual buffer again keeps on increasing at the predetermined bit rate R until time t(1), defined by the following Equation (1), is reached. The amount of codes A(1) in the group of data corresponding to the image frame with the frame number 1 is then subtracted from the buffer occupancy amount at time t(1).
t(k+1) = t(k) + f    Equation (1)
Subsequently, the virtual buffer repeatedly performs the processing of subtracting the amount of codes A(k) in the group of data corresponding to an image frame with the frame number k from the buffer occupancy amount at time t(k) while increasing the buffer occupancy amount in the virtual buffer at the predetermined bit rate R.
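The operations of the virtual buffer described above can be traced with a short simulation. The following is a sketch under stated assumptions, not the model of any particular standard: the names `RemovalEvent` and `simulate_virtual_buffer` are hypothetical, t(0) is taken to be Dinit, integer time ticks are used so that the arithmetic is exact, and overflow and underflow are checked only at the removal times t(k).

```python
from typing import List, NamedTuple


class RemovalEvent(NamedTuple):
    frame: int    # frame number k
    time: int     # removal time t(k)
    before: int   # occupancy just before A(k) is subtracted
    after: int    # occupancy just after A(k) is subtracted
    status: str   # "ok", "overflow", or "underflow"


def simulate_virtual_buffer(code_amounts: List[int], bit_rate: int, d_init: int,
                            frame_interval: int, b_max: int) -> List[RemovalEvent]:
    """Trace the buffer occupancy amount at each removal time t(k).

    bit_rate is expressed in bits per time tick; d_init and frame_interval
    are expressed in the same ticks (assumption made for this sketch).
    """
    events = []
    occupancy = 0
    previous_time = 0
    for k, a_k in enumerate(code_amounts):
        t_k = d_init + k * frame_interval                # t(k+1) = t(k) + f, t(0) = Dinit
        occupancy += bit_rate * (t_k - previous_time)    # fill at the bit rate R
        before = occupancy
        occupancy -= a_k                                 # subtract A(k) at time t(k)
        if before > b_max:
            status = "overflow"
        elif occupancy < 0:
            status = "underflow"
        else:
            status = "ok"
        events.append(RemovalEvent(k, t_k, before, occupancy, status))
        previous_time = t_k
    return events


if __name__ == "__main__":
    # 1000 bits per ms, Dinit = 500 ms, f = 100 ms, Bmax = 600000 bits
    for event in simulate_virtual_buffer([400000, 100000, 300000],
                                         1000, 500, 100, 600000):
        print(event)
```

With these illustrative numbers, the third frame asks for more bits than have arrived by time t(2), which the sketch reports as an underflow.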
Among the parameters described above, the bit rate R, the buffer size Bmax, and the decoding start delay time Dinit are generally determined before the start of the encoding processing depending on an image quality request, a delay request, a property of the transmission path, or the like, and are transmitted to the video decoding apparatus together with the bit streams. An initial buffer occupancy amount Binit obtained by the following Equation (2) may be transmitted instead of the decoding start delay time Dinit. As the transmission method, there is, for example, a method of encoding this information as auxiliary information, and multiplexing the auxiliary information onto the bit streams for transmission, in conformance with the rules described in NPL 2.
Binit = Dinit × R    Equation (2)
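As a worked instance of Equation (2), the conversion between the decoding start delay time Dinit and the initial buffer occupancy amount Binit can be written as below. The function names and the numeric values (a delay of 0.5 seconds and a bit rate of 2,000,000 bits per second) are hypothetical and chosen only for illustration.

```python
def initial_occupancy(d_init_seconds: float, bit_rate_bps: float) -> float:
    """Binit = Dinit x R, per Equation (2). Result is in bits."""
    return d_init_seconds * bit_rate_bps


def decoding_start_delay(b_init_bits: float, bit_rate_bps: float) -> float:
    """Inverse of Equation (2): Dinit = Binit / R. Result is in seconds."""
    return b_init_bits / bit_rate_bps


if __name__ == "__main__":
    b_init = initial_occupancy(0.5, 2_000_000)
    print(b_init)                                   # 1000000.0 bits
    print(decoding_start_delay(b_init, 2_000_000))  # 0.5 seconds
```

Because the two quantities determine each other once R is fixed, transmitting either one conveys the same information to the video decoding apparatus.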
FIG. 11 is a block diagram illustrating an exemplary structure of a typical video encoding apparatus for encoding while monitoring the operations of the virtual buffer.
The video encoding apparatus illustrated in FIG. 11 includes a video encoder 911, a virtual buffer 912, and a multiplexer 913.
The video encoder 911 encodes each image frame configuring input data (input video) thereby to generate video bit streams while monitoring the operations of the virtual buffer 912. The video encoder 911 then supplies the generated video bit streams to the multiplexer 913, and supplies the amount of generated codes to the virtual buffer 912.
The virtual buffer 912 calculates the buffer occupancy amount at each instant of time based on buffer setting information including the buffer size or the delay amount, and the amount of generated codes supplied from the video encoder 911. The virtual buffer 912 feeds back the calculation result to the video encoder 911.
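The feedback between the virtual buffer and the video encoder can be sketched as follows. This is an illustrative sketch only, not the structure of the virtual buffer 912 or the video encoder 911: the class `VirtualBuffer`, its methods, and the function `encode_with_feedback` are hypothetical names. The bounds are derived here on the assumption that the buffer fills at a constant bit rate R between removals, and clamping the amount of generated codes stands in for the actual adjustment of the compression rate.

```python
class VirtualBuffer:
    """Hypothetical virtual buffer that reports admissible frame sizes."""

    def __init__(self, b_max, bit_rate, d_init, frame_interval):
        self.b_max = b_max
        self.fill_per_frame = bit_rate * frame_interval
        # Occupancy just before frame 0 is removed (Binit = Dinit x R).
        self.occupancy = bit_rate * d_init

    def frame_size_bounds(self):
        """Return (lower, upper) bounds on the next amount of generated codes.

        upper: more than the current occupancy would cause underflow.
        lower: fewer bits would let the next fill exceed Bmax (overflow).
        """
        upper = self.occupancy
        lower = max(0, self.occupancy + self.fill_per_frame - self.b_max)
        return lower, upper

    def commit(self, code_amount):
        """Remove the frame, then fill until the next removal time."""
        self.occupancy += self.fill_per_frame - code_amount


def encode_with_feedback(desired_sizes, buffer):
    """Clamp each desired frame size to the range fed back by the buffer."""
    actual_sizes = []
    for desired in desired_sizes:
        lower, upper = buffer.frame_size_bounds()
        actual = min(max(desired, lower), upper)
        buffer.commit(actual)
        actual_sizes.append(actual)
    return actual_sizes


if __name__ == "__main__":
    # 1000 bits per ms, Dinit = 500 ms, f = 100 ms, Bmax = 600000 bits
    vb = VirtualBuffer(b_max=600000, bit_rate=1000, d_init=500, frame_interval=100)
    print(encode_with_feedback([400000, 100000, 300000, 0], vb))
```

In the usage above, the third frame is reduced from 300000 bits to the 200000 bits available in the buffer, which is the kind of per-frame adjustment the feedback is meant to drive.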
The multiplexer 913 encodes the buffer setting information as auxiliary information, multiplexes the encoded buffer setting information on the video bit streams supplied from the video encoder 911, and outputs the multiplexed video bit streams as bit streams.
As described above, in the system for performing transmission via the transmission buffer and the reception buffer illustrated in FIG. 7 and FIG. 8, even if the amount of data per unit time in a bit stream varies, the variation can be absorbed by the buffer. Therefore, the video decoding apparatus can decode the bit streams transmitted at the predetermined bit rate without corruption, and can output the decoded bit streams as a video. When a video is transmitted in the system, however, a delay corresponding to Dinit indicated in FIG. 10 occurs. A delay in the video transmission causes a deterioration in interactive property, such as non-smooth switching of reception channels in TV broadcasting. Therefore, the transmission delay is required to be as small as possible in the system for performing transmission via the transmission buffer and the reception buffer.