The present invention relates to systems for MPEG-2 video bitstreams. More particularly, the present invention relates to a system and method for an ultra-low delay MPEG-syntax video codec.
The Moving Picture Experts Group (MPEG)-2 standard is commonly used for transmission of high quality video in many applications, especially for digital video broadcasting. This standard uses effective compression methods for reducing the temporal and spatial redundancy existing in video sequences. This is done with tools such as: motion estimation/compensation techniques, which use I, B, P frame types; Discrete Cosine Transform (DCT); quantization (Q); Variable Length Coding (VLC), Motion Compensation Prediction (MCP), etc. DCT and Q also have inverse operations, IDCT and IQ, respectively, as shown in FIGS. 2a and 2b below.
The human eye has a limited response to fine spatial detail, and is less sensitive to detail near object edges or around shot-changes. Consequently, controlled impairments introduced into the decoded picture by the bit rate reduction process should not be visible to a human observer.
Intra-frame compression is compression that reduces the amount of video information in each frame on a frame-by-frame basis. Inter-frame compression is a compression scheme, such as MPEG-2, that reduces the amount of video information by storing only the differences between a frame and those preceding it. An important application of digital signal processing (DSP) is in signal compression and decompression. The architecture of a DSP chip is designed to carry out such operations incredibly fast, processing up to tens of millions of samples per second, to provide real-time performance. That is, the ability is required to process a signal "live" as it is sampled, and then output the processed signal, for example, to a video display.
An I-frame (Intra-frame), in Inter-frame compression schemes, is the key frame, or reference video frame, that acts as a point of comparison to P-frames and B-frames, and is not reconstructed from another frame. A P-frame is the Predictive video frame that exhibits the change that occurred compared to the I-frame or P-frame before it. A B-frame is a highly compressed, Bi-directional frame that records the change that occurred between the I-frame or P-frame before and after it.
FIG. 1 is a schematic illustration of spatial and temporal redundancy 100. Spatial redundancy is seen in current frame 120. That is, pixels 125 are shown as identical. Thus, it suffices to transmit the details of the first of pixels 125, and the fact that the following three pixels are the same, without repeating their details.
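The idea of sending one pixel value together with a repeat count can be illustrated with a minimal run-length sketch. This is only an illustration of the principle shown in FIG. 1; MPEG-2 itself exploits spatial redundancy through the DCT rather than pixel-domain run-length coding, and the function names and sample pixel values below are hypothetical.

```python
from itertools import groupby


def run_length_encode(pixels):
    """Collapse each run of identical pixel values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original pixel row."""
    pixels = []
    for value, count in pairs:
        pixels.extend([value] * count)
    return pixels


# Hypothetical scan line: four identical pixels followed by other values.
row = [200, 200, 200, 200, 17, 17, 90]
encoded = run_length_encode(row)
print(encoded)
print(run_length_decode(encoded) == row)
```

The four identical pixels are carried as a single pair, which mirrors transmitting the first of pixels 125 plus the fact that the following three are the same.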
Temporal redundancy is shown by the relationship between pixel 112 of previous frame 110 and pixel 122 of current frame 120. Vector 114 represents the inverse of this "movement".
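A vector of this kind is typically found by block matching. The following is a minimal sketch, assuming an exhaustive search that minimizes the sum of absolute differences (SAD); the search strategy, frame size, block size and all names are assumptions for illustration and are not taken from the figures. The returned vector points from the block in the current frame back to its best match in the previous frame, i.e. the inverse of the movement.

```python
def make_frame(width, height, bx, by, size, value=255):
    """Build a black frame with one bright size x size block at (bx, by)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            frame[y][x] = value
    return frame


def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(prev[py + dy][px + dx] - cur[cy + dy][cx + dx])
    return total


def find_motion_vector(prev, cur, cx, cy, size, search):
    """Exhaustively search prev for the best match of the block at (cx, cy) in cur.

    Returns (mx, my, cost), where (mx, my) points from the current block
    to its best match in the previous frame.
    """
    height, width = len(prev), len(prev[0])
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            px, py = cx + mx, cy + my
            # Skip candidates that fall outside the previous frame.
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(prev, cur, px, py, cx, cy, size)
                if best is None or cost < best[0]:
                    best = (cost, mx, my)
    return best[1], best[2], best[0]


# The block moves 2 pixels right and 1 pixel down between the two frames.
previous_frame = make_frame(6, 6, 1, 1, 2)
current_frame = make_frame(6, 6, 3, 2, 2)
print(find_motion_vector(previous_frame, current_frame, 3, 2, 2, 2))
```

With a perfect match, only the vector needs to be sent for the block; otherwise the residual difference is coded as well.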
FIGS. 2a and 2b are prior art detailed schematic block diagrams of a typical encoder 210 and decoder 220, respectively.
Quantization (Q) is the function of coder 210 to transmit the DCT block to decoder 220, in a bit rate efficient manner, so that it can perform the inverse transform to reconstruct the image. It has been observed that the numerical precision of the DCT coefficients may be reduced while still maintaining good image quality at decoder 220. Quantization is used to reduce the number of possible values to be transmitted, reducing the required number of bits.
The degree of quantization applied to each coefficient is weighted according to the visibility of the resulting quantization noise to a human observer. In practice, this results in the high-frequency coefficients being more coarsely quantized than the low-frequency coefficients. Note that the quantization noise introduced by the coder is not reversible in the decoder, making the coding and decoding process 'lossy'.
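The effect of frequency-weighted quantization can be seen in a small sketch. The coefficient values and step sizes below are invented for illustration and are not the MPEG-2 default quantization matrices; the only property they are meant to show is that larger steps at higher frequencies drive small coefficients to zero and that the decoder cannot recover the lost precision.

```python
def quantize(coeffs, steps):
    """Encoder side (Q): divide each coefficient by its step and round."""
    return [round(c / s) for c, s in zip(coeffs, steps)]


def dequantize(levels, steps):
    """Decoder side (IQ): multiply each level by the same step size."""
    return [level * s for level, s in zip(levels, steps)]


# Hypothetical coefficients ordered from low to high frequency, with
# coarser (larger) steps assigned to the higher frequencies.
coeffs = [624.0, -45.0, 13.0, 6.0, -3.0]
steps = [8, 16, 24, 32, 40]

levels = quantize(coeffs, steps)
reconstructed = dequantize(levels, steps)
errors = [c - r for c, r in zip(coeffs, reconstructed)]

print(levels)
print(reconstructed)
print(errors)
```

The nonzero entries in the error list are the quantization noise; the zero-valued levels at the high-frequency end are what the subsequent scanning and entropy coding stage exploits.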
Coding is the serialization and coding of the quantized DCT coefficients used to exploit the likely clustering of energy into the low-frequency coefficients and the frequent occurrence of zero-value coefficients. The list of values produced by scanning is entropy coded using a variable-length code (VLC). The VLC allocates code words, which have different lengths depending upon the probability with which they are expected to occur.
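The serialization step can be sketched as a zigzag scan followed by conversion to (run, level) pairs, where run counts the zeros preceding each nonzero level. This is a simplified illustration under stated assumptions: a 4x4 block is used instead of the 8x8 blocks of MPEG-2, the DC coefficient is treated like any other coefficient, and the actual VLC tables are not reproduced. The block contents and function names are hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals are walked top to bottom, even ones bottom to top.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Serialize a square block of quantized coefficients into a list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_level_pairs(values):
    """Convert a scanned list into (zero_run, level) pairs plus an end marker."""
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


# Hypothetical quantized block: energy clustered in the low-frequency corner.
block = [
    [78, -3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = zigzag_scan(block)
print(scanned)
print(run_level_pairs(scanned))
```

A variable-length code would then assign shorter code words to the pairs that are expected to occur most often.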
These tools have been investigated and optimized in many systems, for example using digital signal processor (DSP) and Field Programmable Gate Array (FPGA) architectures, and have reached a very mature stage. Therefore, they can be implemented in real-time, give good perceptual quality, and are supported by a large installed base of equipment. However, these systems usually cause an inherent delay of several hundred msec. In some applications a much lower delay is required, e.g. a few msec.
MPEG operation is characterized by frame-by-frame processes. The encoder receives an entire frame, and then decides on how to process the frame. All the coded bits of the frame are put in a buffer, and are then transmitted to the decoder. The decoder performs a similar sequence. Each of these frame-by-frame steps is a source of delay.
Each frame is composed of two interlaced video alternating scan line fields. In field-frame operation, the first field must be transmitted completely before the second field is transmitted. This is an additional source of delay.
FIG. 2c is a prior art schematic block diagram illustrating the pixels, macroblocks and slices within a frame 230. Each slice can be decoded independently from any other slice, although macroblocks are interdependent. There are a few different possibilities for how macroblocks define a slice.
Another source of delay arises from predictive sequences. For example, in presentation display order, an I-frame may be followed by two B-frames and then a P-frame. Since the B-frames are constructed from information contained in the I-frame and P-frame, the B-frames cannot be transmitted until the I-frame and then the P-frame are received. Thus there is a difference between the display order and the transmission order, which introduces further delay. FIG. 2d is a prior art schematic block diagram of the prediction process for I, B and P frames 240.
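The difference between display order and transmission order can be sketched as follows. The frame labels (a type letter followed by the display index) and the function name are hypothetical, and trailing B-frames with no later reference frame are simply appended, which is a simplification.

```python
def display_to_transmission_order(display_order):
    """Reorder frames so each B-frame follows both of its reference frames.

    B-frames are held back until the next I- or P-frame has been emitted,
    because they are predicted from the reference frames on either side.
    """
    transmission, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            transmission.append(frame)      # reference frame goes first
            transmission.extend(pending_b)  # then the B-frames it unblocks
            pending_b = []
    transmission.extend(pending_b)  # simplification for trailing B-frames
    return transmission


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_transmission_order(display))
```

In this example frame B1 cannot be sent until P3, which is displayed two frames later, has been captured and coded, which is where the multi-frame delay comes from.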
A fourth source of delay is attributed to differing bit rates for I, B and P frames. The I-frames require considerable bits, B-frames have the fewest and P-frames are intermediary. The buffer is hardware, and therefore is of fixed size. Whichever frame is largest determines the minimum buffer size. Also more complex video scenes require more bits. However, the larger buffers have greater delay.
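The link between buffer size and delay can be made concrete with a short calculation: at a constant channel rate, a full buffer takes (buffer size / bit rate) seconds to drain. The buffer sizes and bit rate below are hypothetical values chosen only to show the proportionality.

```python
def buffer_delay_ms(buffer_bits, bit_rate_bps):
    """Time in milliseconds needed to drain a full buffer at a constant bit rate."""
    return buffer_bits * 1000 / bit_rate_bps


# Hypothetical channel rate of 4 Mbit/s with two candidate buffer sizes.
bit_rate = 4_000_000
for buffer_bits in (400_000, 1_600_000):
    print(buffer_bits, "bits ->", buffer_delay_ms(buffer_bits, bit_rate), "msec")
```

Quadrupling the buffer to accommodate the largest frame quadruples the worst-case buffering delay at the same channel rate.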
A fifth cause of delay emanates from the processing time for the encoder and decoder.
The available solutions remove several of the factors that contribute most to the coder/decoder (codec) delay. The most problematic issue is the IBP frame structure. The B-frames introduce an inherent delay of several frames. Each frame lasts 40/33 msec depending on the video format, Phase Alternating Line/National Television System Committee (PAL/NTSC), respectively. In addition, since each of the three frame types, I, B and P, requires a substantially different number of bits, a large Video Buffering Verifier (VBV) buffer must be used to achieve reasonable video quality.
Therefore, existing methods use P-only frame types, thereby eliminating large I-frames and small B-frames. The buffer can therefore be quite efficient. When there are frequent scene changes, this somewhat reduces the visual quality and the channel noise error-resilience. However, it enables lower delay, usually with an acceptable reduction in quality and resilience. This method reduces the required delay to a single frame at the encoder plus a single frame at the decoder, plus roughly a single frame to handle fluctuations in coding difficulty and real-time implementation. This amounts to 120/100 msec for PAL/NTSC, respectively. The theoretical limit, assuming zero VBV buffer size and infinitely fast processing, can go down to 80/66 msec. In practical applications it cannot go below 100 msec. This is the minimal limit since an MPEG codec requires receiving the entire frame and then encoding or decoding it with a processor or other programmable logic. This delay is good enough for some applications, such as remote interviews, but is still too large for applications that require remote control of processes. The MPEG2 standard also refers to a low-delay mode. However, in this mode, frames that are large are skipped and the last "small" frame is repeated until the next "small" frame appears.
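The delay figures quoted above follow from the frame period multiplied by the number of frames held in the pipeline, as the following sketch shows. It assumes frame rates of 25 frames/sec for PAL and 30000/1001 frames/sec for NTSC; with the exact NTSC rate the two-frame figure comes out slightly above the 66 msec obtained by doubling a frame period rounded down to 33 msec. The function names are hypothetical.

```python
def frame_period_ms(frames_per_second):
    """Duration of one frame in milliseconds."""
    return 1000 / frames_per_second


def pipeline_delay_ms(frames_per_second, frames_in_pipeline):
    """Delay of a frame-based codec that holds a number of whole frames."""
    return frames_in_pipeline * frame_period_ms(frames_per_second)


PAL_FPS = 25
NTSC_FPS = 30000 / 1001  # assumed exact NTSC frame rate, about 29.97

for name, fps in (("PAL", PAL_FPS), ("NTSC", NTSC_FPS)):
    # Three frames: encoder + decoder + margin for coding fluctuations.
    # Two frames: theoretical limit with no buffer and no processing time.
    print(name,
          round(pipeline_delay_ms(fps, 3), 1), "msec (3 frames),",
          round(pipeline_delay_ms(fps, 2), 1), "msec (2 frames)")
```

Because every term is a whole number of frame periods, a frame-based codec cannot go below these values regardless of processing speed.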
Therefore, there is a need for a method that overcomes the limitations of prior art video encoders and decoders, and provides for true low-level delay in the encoding and decoding of digital video bitstreams.
Accordingly, it is a principal object of the present invention to overcome the limitations of prior art devices and provide a method that solves the need for reducing end-to-end delay in the encoding and decoding of digital video bitstreams.
It is another principal object of the present invention to process frames on a field-by-field basis.
It is yet another object of the present invention to process fields on a slice-by-slice basis.
A system is disclosed for compressed digital video bitstreams, in which a plurality of P-field slices are processed, wherein the system includes an encoder to encode the bits for each successive P-field slice. The system also includes a decoder buffer, where the bits enter at a fixed rate, and a decoder, which uses the extracted bits to decode each field of each frame and display each frame. The delay is reduced to under 10 msec and the buffer stays within the frame boundaries.
The present invention describes a method that can be displayed by any standard MPEG2 decoder and can reach the ultra-low-delay requirements in specifically designed MPEG2-like decoders.
Additional features and advantages of the invention will become apparent from the following drawings and description.