1. Field of the Invention
The present invention relates to an apparatus and a method for compressing and transmitting picture information such as live picture information, a decoding apparatus for decoding picture information while reducing the processing load of decoding compressed video data, and a storage medium storing a control program for controlling the real-time compression and transmission of picture information and the decoding of the picture information.
2. Description of the Related Art
To compress picture information such as live picture information in real time and to transmit the picture information over a computer network, there are roughly two conventional methods, i.e., a direct transmission method and a FIFO (First-In-First-Out) method.
In the direct transmission method, the bit strings outputted by a live encoder, whose sizes differ from frame to frame, are outputted to a network as they are. In that case, even if the transmission bit rate averaged over one second is constant, the momentary transmission bit rate varies greatly depending on properties such as encode modes and the motion of pictures. As a result, burst data temporarily occupies the network band, packet collision and packet loss tend to occur, and transmission efficiency deteriorates. The deterioration is particularly conspicuous if the network has little spare bandwidth relative to the average stream transmission band.
The direct transmission method will be described in more detail with reference to FIG. 25. A picture signal inputted from a camera 10 is encoded in frame units (e.g., at intervals of about 1/30 seconds) by a real time encoder 11, which encodes data in real time, and written into a frame buffer 12. Next, the frame data, whose size differs from frame to frame, is outputted all at once to a network 16 on a best-effort basis by a network transmission section 15a, and fed to clients 171 to 17m connected to the network. In that case, the momentary transmission bit rate of the K-th frame (R no-control-K) is the gradient of a waveform W10 shown in FIG. 26 and is obtained from the following formula (1):
$$R_{\text{no-control-}K} = \frac{B_K}{T_{SK}}\ [\text{bit/sec}] > R_S \qquad (1)$$
Here, TSK: transmission time for transmitting the K-th frame data to the network,
BK: quantity of the K-th frame data accumulated in the frame buffer, and
R no-control-K: network transmission bit rate for the K-th frame.
Since this momentary transmission bit rate (R no-control-K) is far higher than the average stream bit rate (RS), burst packet data is more likely to be transmitted, resulting in the deterioration of transmission efficiency. In FIG. 26, TFK is the frame interval between the K-th frame and the (K+1)th frame, and TWK is the time taken by the encoder 11 to write the K-th frame data into the frame buffer 12.
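The relationship in formula (1) can be illustrated with a short numerical sketch. The frame sizes, the frame interval, and the per-frame transmission time below are hypothetical values chosen only for illustration; they do not come from the figures.

```python
def momentary_bit_rate(frame_bits, send_time_s):
    """Formula (1): B_K / T_SK, in bit/sec."""
    if send_time_s <= 0:
        raise ValueError("send_time_s must be positive")
    return frame_bits / send_time_s


def average_stream_rate(frame_sizes_bits, frame_interval_s):
    """R_S: total bits divided by the total duration of the frames."""
    return sum(frame_sizes_bits) / (len(frame_sizes_bits) * frame_interval_s)


# Hypothetical stream: one large frame followed by three small ones.
frame_sizes_bits = [120_000, 20_000, 20_000, 20_000]  # B_K for each frame
frame_interval_s = 1 / 30                             # T_FK, about 1/30 second
send_time_s = 0.002                                   # T_SK, assumed equal for every frame

rs = average_stream_rate(frame_sizes_bits, frame_interval_s)
rates = [momentary_bit_rate(b, send_time_s) for b in frame_sizes_bits]

for k, rate in enumerate(rates):
    print(f"frame {k}: momentary rate {rate:,.0f} bit/sec, R_S {rs:,.0f} bit/sec")
```

Under these assumed values every frame is sent at a momentary rate well above RS, which is the burst behavior described above.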
In the FIFO method, the bit strings outputted from the live encoder, whose sizes differ from frame to frame, are sequentially written into a FIFO, and stream data read from the FIFO at a constant speed by a process separate from the encoder process is outputted to the network. The FIFO method will be described in more detail with reference to FIG. 27. A picture signal inputted from the camera 10 is encoded in frame units (e.g., at intervals of about 1/30 seconds) by the real time encoder 11 and sequentially written into a FIFO 12a. Next, the data read from the FIFO 12a at a constant speed by the network transmission section 15a is outputted to the network 16 at a constant rate and fed to the clients 171 to 17m connected to the network. The momentary transmission bit rate (RFIFO) in that case is the gradient of a waveform W11 shown in FIG. 28 and is equal to the average stream bit rate (RS), as shown in formula (2) below:
$$R_{FIFO} = R_S\ [\text{bit/sec}] \qquad (2)$$
Here, RS: average stream bit rate; and
RFIFO: transmission bit rate for transmission to the network.
Due to this, it is possible to avoid the burst transmission of packets and to transmit data efficiently.
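The smoothing effect of the FIFO can be sketched as a simple per-frame-interval simulation. This is a minimal model under stated assumptions, not the conventional apparatus itself: the function name, the FIFO capacity, and the frame sizes are hypothetical, and the drain amount per interval is taken as RS multiplied by the frame interval.

```python
def simulate_fifo(frame_sizes_bits, drain_bits_per_interval, capacity_bits):
    """Write one frame per interval, then read at most a fixed number of bits.

    Returns the bits sent in each interval and the bits left in the FIFO.
    """
    occupancy = 0
    sent = []
    for size in frame_sizes_bits:
        if occupancy + size > capacity_bits:
            # Overflow: the encoder produced more than the FIFO can hold.
            raise OverflowError("FIFO overflow")
        occupancy += size
        # The sender reads at a constant rate; an empty FIFO (underflow)
        # simply yields fewer bits in this simplified model.
        out = min(occupancy, drain_bits_per_interval)
        occupancy -= out
        sent.append(out)
    return sent, occupancy


# Hypothetical stream: R_S = 1,350,000 bit/sec at 30 frames/sec,
# so 45,000 bits leave the FIFO in each frame interval.
frame_sizes_bits = [120_000, 20_000, 20_000, 20_000]
sent, leftover = simulate_fifo(frame_sizes_bits, 45_000, capacity_bits=200_000)
print(sent, leftover)
```

In this example the large first frame is spread over four intervals, so the amount sent per interval never exceeds the constant drain amount.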
However, the FIFO method requires a FIFO with overflow and underflow control, and a separate process must also be started for network output, which disadvantageously complicates implementation compared with the direct transmission method.
Next, a conventional decoding apparatus for decoding compressed video data is shown in FIG. 29. In FIG. 29, compressed video data is inputted into a variable length decoder 51 and subjected to variable length decoding. Decoded quantization coefficients a, i.e., quantized discrete cosine transform coefficients, are inputted into an inverse quantizer 52, and decoded motion vector information b is inputted into a motion compensation predictor 57. The quantization coefficients a are dequantized by the inverse quantizer 52, and the resulting discrete cosine transform coefficients F(u, v) are inputted into an inverse discrete cosine transformer 60. The motion compensation predictor 57 uses the motion vector information b to extract, from the pictures stored in a frame memory 58, the predictive picture data to be used for prediction.
Encode mode information c decoded by the variable length decoder 51 controls switching means 59. If the encode mode is an intra-plane encode mode, the switching means 59 is turned off and an adder 56 adds nothing to the outputs f(x, y) from the inverse discrete cosine transformer 60. Therefore, the outputs are outputted as they are as decoded picture outputs r(x, y) and are also stored in the frame memory 58.
On the other hand, if the encode mode is a mode other than the intra-plane encode mode, the switching means 59 is turned on and the adder 56 adds the outputs f(x, y) from the inverse discrete cosine transformer 60 to motion compensation predictive pictures c(x, y). The sums are outputted as decoded picture outputs r(x, y) and are also stored in the frame memory 58.
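The role of the switching means 59 and the adder 56 can be sketched as follows. This is an illustrative model only: the function name and the sample values are hypothetical, and the prediction is passed in directly rather than derived from motion vectors.

```python
def reconstruct(idct_out, prediction, intra_mode):
    """Return r(x, y) from f(x, y) and the predictive picture c(x, y).

    intra_mode=True  models the switch turned off: r = f.
    intra_mode=False models the switch turned on:  r = f + c.
    """
    if intra_mode:
        return [row[:] for row in idct_out]
    return [[f + c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(idct_out, prediction)]


# Hypothetical 2x2 sample values for f(x, y) and c(x, y).
f = [[10, -2], [3, 0]]
c = [[100, 100], [100, 100]]

intra_picture = reconstruct(f, c, intra_mode=True)   # prediction is ignored
inter_picture = reconstruct(f, c, intra_mode=False)  # prediction is added
print(intra_picture, inter_picture)
```

In either mode the returned picture is what would be both outputted and written back to the frame memory for use in predicting later pictures.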
In compressed video data decoding, the inverse discrete cosine transform imposes the largest processing load. For this reason, a high-speed inverse discrete cosine transform algorithm such as that described in B. G. Lee, “A new algorithm to compute the discrete cosine transform”, IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-32, pp. 1243-1245, December 1984 is employed.
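To indicate why this transform dominates the load, the following sketch evaluates the two-dimensional inverse discrete cosine transform directly from its textbook definition. It is a plain reference form with four nested loops, not the fast algorithm of Lee cited above; the 8x8 block size and the function name are assumptions made for illustration.

```python
import math

N = 8  # assumed block size


def idct_2d(F):
    """Direct 2-D inverse DCT of an N x N coefficient block F[u][v].

    f(x, y) = (2/N) * sum_u sum_v C(u) C(v) F(u, v)
              * cos((2x+1) u pi / 2N) * cos((2y+1) v pi / 2N),
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    f = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * F[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            f[x][y] = 2.0 / N * total
    return f


# A block containing only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
flat = idct_2d(dc_only)
print(flat[0][0], flat[7][7])
```

The direct form performs N x N multiply-accumulate steps for each of the N x N output samples, which is the cost that fast algorithms are intended to reduce.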
If still higher-speed processing is required, a method of reducing the decoding processing load by thinning out the decoded picture planes is employed. For example, a method is employed in which only pictures which have been subjected to intra-plane encoding (intra encoding) are decoded and pictures encoded in modes other than the intra-plane encode mode are not decoded.
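The thinning method described above can be sketched as a filter over the encode modes of the incoming pictures. The mode labels and the sample sequence are hypothetical and serve only to show how many pictures remain.

```python
def frames_to_decode(frame_modes, intra_only):
    """Return the indices of the pictures that are actually decoded.

    With intra_only=True, pictures encoded in any mode other than
    the intra-plane encode mode are skipped.
    """
    return [i for i, mode in enumerate(frame_modes)
            if not intra_only or mode == "intra"]


# Hypothetical sequence: an intra picture followed by predicted pictures.
frame_modes = ["intra", "inter", "inter", "inter", "intra", "inter"]

all_frames = frames_to_decode(frame_modes, intra_only=False)
thinned = frames_to_decode(frame_modes, intra_only=True)
print(f"decoded {len(thinned)} of {len(all_frames)} pictures: {thinned}")
```

In this assumed sequence only two of six pictures are played back, which shows how thinning lowers the load at the cost of the number of displayed picture planes.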
However, if the decoding processing is conducted by software using, for example, a personal computer and the processing performance of the personal computer is low, even such high-speed processing is insufficient, with the disadvantage that the number of played back picture planes greatly decreases.