The present invention relates generally to decoding/decompression of compressed video in the discrete cosine transform ("DCT") domain, and more particularly to methods of and systems for detection and proper filtering/scaling of interlaced moving areas in MPEG-2 encoded video.
MPEG is an abbreviation for Moving Picture Experts Group. MPEG was formed in 1988, originally to create a standard for compressing motion video. The need for compression of motion video for digital transmission becomes apparent with even a cursory look at uncompressed bitrates in contrast with bandwidths available. Full-motion video requires a large amount of storage and data transfer bandwidth. The standard U.S. broadcast television signal, sometimes referred to as an "NTSC" signal, has a bitrate of 168 Megabits ("Mbits") per second. By comparison, transmission of five-channel stereo uncompressed audio, for example, requires a bitrate of 3.5 Mbits per second. A single-speed CD-ROM delivers data at a bitrate of 1.5 Mbits per second.
The first goal for the MPEG group was to compress video together with audio into a CD-ROM-sized bandwidth of 1.5 Mbits per second. The result of their work is the MPEG-1 encoding standard, completed in 1991. At that time video compression technology embodied in MPEG-1 had still not attained the ability to encode broadcast-quality interlaced video. This became a primary goal for the MPEG-2 standard.
Interlaced video describes a scanning system for video devised in the early days of television to allow video frames to be repainted at a high enough rate to prevent visible flickering of the video screen. This is accomplished by scanning alternate lines of a frame in half a frame interval. For example, in a 525-line system, all 525 lines are drawn in one thirtieth of a second, the frame time. The screen is actually repainted every one sixtieth of a second. First every other scan line is drawn (the "top" field), followed by the missed scan lines (the "bottom" field). A screen repaint of one sixtieth of a second is fast enough to prevent visible flickering.
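The division of a frame into its two fields can be sketched as follows. This is a minimal illustration only, assuming a frame is held as a list of scan lines and that the top field holds the even-indexed lines when counting from zero; the name `split_fields` is hypothetical and does not come from the MPEG-2 standard.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its top and bottom fields."""
    top = frame[0::2]     # every other line, starting with the first line
    bottom = frame[1::2]  # the lines skipped by the top field
    return top, bottom


# A toy six-line frame; each entry stands in for one scan line.
frame = [f"line{i}" for i in range(6)]
top, bottom = split_fields(frame)
print(top)
print(bottom)
```

Each field holds half of the lines of the frame, which is why a field can be drawn in half the frame time.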
Video encoding in MPEG-2 may be either frame-based DCT or field-based DCT. A frame-based DCT encoded video block contains information from both top and bottom fields. A field-based DCT encoded block contains information from either the top field or the bottom field, but not both.
Frame-based DCTs are usually used in stationary areas because of the local progressive feature of stationary areas, that is, there is high spatial correlation between the two fields. Field-based DCTs, on the other hand, are often used in areas of motion, where there are significant differences between the two fields. Frame-based DCTs, if used in this situation, will result in significant energy in the high vertical frequency DCT coefficients, reducing the compression efficiency.
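The effect of a frame-based DCT on an interlaced moving area can be illustrated numerically. The sketch below is an assumption-laden toy, not an MPEG-2 implementation: it uses a plain orthonormal 8x8 DCT-II written for clarity, and the two sample blocks (a smooth vertical ramp for a stationary area, alternating line values for a moving area) are invented for illustration. The names `dct_1d` and `dct_2d` are hypothetical.

```python
import math

N = 8  # DCT block size used in this sketch


def dct_1d(samples):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        total = sum(samples[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * total)
    return out


def dct_2d(block):
    """2-D DCT of an 8x8 block; result[v][u] has vertical frequency v."""
    rows = [dct_1d(row) for row in block]  # horizontal pass
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]  # vertical pass
    return [[cols[u][v] for u in range(N)] for v in range(N)]


# Stationary area (assumed example): a smooth vertical ramp shared by both fields.
stationary_block = [[10 * y] * N for y in range(N)]
# Moving area (assumed example): top-field lines (even y) differ from bottom-field lines (odd y).
moving_block = [[0 if y % 2 == 0 else 100] * N for y in range(N)]

stationary = dct_2d(stationary_block)
moving = dct_2d(moving_block)

print("highest vertical frequency, stationary:", round(abs(stationary[7][0]), 2))
print("highest vertical frequency, moving:    ", round(abs(moving[7][0]), 2))
```

In the moving block the line-to-line alternation between the two fields concentrates energy in the highest vertical frequency coefficient of the first column, whereas the stationary ramp leaves that coefficient comparatively small.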
An MPEG-2 decoder with embedded resizing is a concept generally known to those in the motion video and related industries. A decoder with embedded resizing allows one encoded video source to be decoded onto any supported display format, such as standard NTSC (United States), PAL (Europe), or other display device, using one standard decoder. Such decoders have been of great interest because of their relatively low cost in such applications as Standard-Definition ("SD") display of High-Definition ("HD") video streams. Decoders with embedded resizing take advantage of the smaller output format by embedding scalers in the decoding loop. To avoid aliasing, filtering is needed either before or combined with scaling. The filtering/scaling can be done in either the spatial domain or the DCT domain. The embedded scaling, preferably done in the DCT domain due to its simplicity, reduces the amount of data to be processed in the Inverse Discrete Cosine Transform ("IDCT") and Motion Compensation ("MC") decoding steps.
Additionally, embedded filtering and scaling of DCT encoded video is more useful than non-embedded implementations for other reasons. First, filtering and scaling after full decompression is wasteful of system resources, as larger areas of memory are required and longer sets of calculations must be performed. Second, advantage can be taken of special properties of interlaced video for proper scaling and filtering by allowing DCT encoded blocks to be filtered and scaled dynamically according to their local features.
There are two options for filtering/scaling as well: frame-based and field-based. Frame-based filtering/scaling tends to keep spatial resolution but lose temporal resolution. Field-based filtering/scaling, conversely, tends to keep temporal resolution but lose spatial resolution. Therefore, to get the best results, frame-based methods should be used in stationary areas and field-based methods should be used in moving areas.
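The difference between the two options can be seen in a simple spatial-domain sketch of 2:1 vertical downscaling. It is illustrative only: the two-tap average stands in for whatever filter a real decoder would use, the processing is shown in the spatial domain rather than the DCT domain, and all function names are hypothetical.

```python
def average_pairs(lines):
    """Halve the line count by averaging each pair of consecutive lines."""
    return [[(a + b) / 2 for a, b in zip(lines[i], lines[i + 1])]
            for i in range(0, len(lines) - 1, 2)]


def frame_based_downscale(frame):
    # Adjacent frame lines belong to different fields, so this mixes the fields.
    return average_pairs(frame)


def field_based_downscale(frame):
    # Filter each field on its own, then re-interleave the two results.
    top = average_pairs(frame[0::2])
    bottom = average_pairs(frame[1::2])
    out = []
    for t, b in zip(top, bottom):
        out.extend([t, b])
    return out


# Assumed moving area, one pixel wide: top-field lines are 0, bottom-field lines are 100.
moving_area = [[0] if y % 2 == 0 else [100] for y in range(8)]

frame_result = frame_based_downscale(moving_area)
field_result = field_based_downscale(moving_area)
print(frame_result)
print(field_result)
```

On this moving area the frame-based result blends the two fields into a single intermediate value, which is the loss of temporal resolution described above, while the field-based result keeps the two fields distinct at the cost of filtering across lines that are two frame lines apart.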
Note that frame- or field-based DCTs are chosen by the encoder, whereas decisions of frame- or field-based filtering/scaling are made by the decoder. The decoder, in the present state of the art, resorts to one of the following two approaches for deciding whether to use frame- or field-based filtering/scaling:
1. Assume the encoder makes the appropriate choices, i.e., frame-DCTs for stationary areas and field-DCTs for moving areas. The decoder simply selects frame- or field-based filtering/scaling based on the DCT type selected by the encoder;
2. Do not trust the encoder at all and always apply the same filtering/scaling mode regardless of the DCT type. Using this approach, field-based filtering/scaling is usually applied to both frame-DCTs and field-DCTs.
The first approach delivers better spatial resolution when the encoder does make the right choice, so that the picture is generally sharper. This approach is, however, vulnerable to bad encoder decisions, such as using frame-DCTs in a moving area, which may lead to some visibly annoying blocks.
The second approach does not risk mixing the two fields, but its picture quality is not as good due to its loss of spatial resolution.
The present invention provides methods of and systems for addressing the needs of the prior art. These methods and systems provide the ability to determine whether the local area subject to filtering/scaling is a stationary area or an interlaced moving area, and, given such information, dynamically switch between the frame- or field-based operations in a smart way, thereby optimizing the output picture quality.
There is also a need for applying field-based operations on frame DCT encoded blocks to overcome the problems of the prior art. Thus, another object of the invention is the proper filtering/scaling of DCT encoded compressed interlaced video. A DCT-domain-filtering scheme for field-based filtering/scaling of frame-DCT data is provided herein.
When the DCT encoded video block of the compressed video stream is a field-based DCT encoded block, the method includes determining that field-based decoding and filtering/scaling methods are to be used to process the DCT encoded block.
Alternatively, when the DCT encoded block of the compressed video stream is a frame-based DCT encoded block, the method includes obtaining a first absolute value which represents the energy of vertical high frequency of the DCT encoded block of the compressed video stream. Thereafter, this first absolute value is compared to a predetermined first reference value. When the first absolute value is less than or equal to the predetermined first reference value, the method includes determining that frame-based decoding and filtering/scaling methods are to be used to process the DCT encoded block.
Alternatively, when the first absolute value is greater than the predetermined first reference value, the method includes obtaining a second absolute value which represents the energy of vertical mid frequency of the DCT encoded block of the compressed video stream. Advantageously, a second comparison can be conducted in which the second absolute value which represents the energy of vertical mid frequency is compared to a second predetermined reference value. When the second absolute value which represents the energy of vertical mid frequency is less than the second predetermined reference value, the method includes determining that frame-based decoding and field-based filtering/scaling methods are to be used to process the DCT encoded block.
Alternatively, when the second absolute value which represents the energy of vertical mid frequency is greater than or equal to the second predetermined reference value, the method includes determining that frame-based decoding and filtering/scaling methods are to be used to process the DCT encoded block.
Preferably, the video stream includes a plurality of DCT encoded blocks, in which the absolute value of a left-bottom area of a DCT encoded block is used as the first absolute value representing the energy of vertical high frequency of the DCT encoded video block of the compressed video stream, and the absolute value of a left-middle area of a DCT encoded block is used as the second absolute value representing the energy of vertical mid frequency of the DCT encoded block of the compressed video stream.
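The selection logic described in the preceding paragraphs can be summarized in a short sketch. It assumes the two absolute values have already been obtained for the block; the function name `select_modes`, the string labels, and the parameter names are hypothetical, and no particular reference values are implied.

```python
def select_modes(dct_type, high_energy, mid_energy, high_ref, mid_ref):
    """Return (decoding_mode, filtering_scaling_mode) for one DCT encoded block.

    dct_type    -- "field" or "frame", as chosen by the encoder
    high_energy -- first absolute value (vertical high frequency energy)
    mid_energy  -- second absolute value (vertical mid frequency energy)
    high_ref    -- predetermined first reference value
    mid_ref     -- predetermined second reference value
    """
    if dct_type == "field":
        return ("field", "field")
    if high_energy <= high_ref:
        # Little vertical high frequency energy: treat as a stationary area.
        return ("frame", "frame")
    if mid_energy < mid_ref:
        # High frequency energy without mid frequency energy: the fields differ.
        return ("frame", "field")
    # High and mid frequency energy together: handled on a frame basis.
    return ("frame", "frame")


# Example calls with made-up energies and reference values.
print(select_modes("field", 0, 0, 100, 50))
print(select_modes("frame", 300, 10, 100, 50))
```

Only a frame-DCT block with high vertical high frequency energy and low vertical mid frequency energy is switched to field-based filtering/scaling; every other frame-DCT block stays on the frame-based path.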
In another embodiment of the invention, the processing step includes embedded resizing that dynamically chooses frame- or field-based scaling performed within a decoding loop.
In another embodiment of the invention, the processing step includes filtering and scaling of the frame DCT blocks on a field basis.
Since it is possible for interlaced video to be encoded using MPEG-2 frame-based DCT encoded video blocks, it is another object of the present invention to detect interlaced moving areas in frame-based DCT encoded blocks. In this embodiment, the invention relates to a method of detecting whether an area of a compressed video stream is an interlaced moving area, where the area of the compressed video stream is represented by a plurality of frame DCT encoded blocks. The method includes obtaining a DCT encoded video block of the compressed video stream and thereafter obtaining a first absolute value which represents the energy of vertical high frequency of the DCT encoded block of the compressed video stream. Next, the first absolute value is compared to a predetermined first reference value.
When the first absolute value is less than or equal to the predetermined first reference value, the method includes determining that the area of the compressed video stream represented by the DCT encoded block is not an interlaced moving area.
Alternatively, when the first absolute value is greater than the predetermined first reference value, the method includes obtaining a second absolute value which represents the energy of vertical mid frequency of the DCT encoded block of the compressed video stream. Thereafter, a second comparison is conducted in which the second absolute value which represents the energy of vertical mid frequency is compared to a second predetermined reference value.
When the second absolute value which represents the energy of vertical mid frequency is less than the second predetermined reference value, the method includes determining that the area of the compressed video stream represented by the DCT encoded block is an interlaced moving area.
Alternatively, when the second absolute value which represents the energy of vertical mid frequency is greater than or equal to the second predetermined reference value, the method includes determining that the area of the compressed video stream represented by the DCT encoded block is not an interlaced moving area.
Again, preferably, the video stream includes a plurality of DCT encoded blocks, in which the absolute value of a left-bottom area of a DCT encoded block is used as the first absolute value representing the energy of vertical high frequency of the DCT encoded video block of the compressed video stream, and the absolute value of a left-middle area of a DCT encoded block is used as the second absolute value representing the energy of vertical mid frequency of the DCT encoded block of the compressed video stream.
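The detection method can be sketched end to end as follows. Several details are assumptions made purely for illustration and are not stated in the text above: the "absolute value of an area" is taken as the sum of absolute coefficient values, the left-bottom area is taken as rows 6 to 7 and the left-middle area as rows 2 to 5 of the two leftmost columns of an 8x8 block indexed as `coeffs[vertical][horizontal]`, and the reference values in the example are arbitrary. The function names are hypothetical.

```python
def area_energy(coeffs, rows, cols):
    """Sum of absolute coefficient values over a rectangular area (assumed measure)."""
    return sum(abs(coeffs[v][u]) for v in rows for u in cols)


def is_interlaced_moving(coeffs, high_ref, mid_ref,
                         high_rows=range(6, 8),   # assumed left-bottom rows
                         mid_rows=range(2, 6),    # assumed left-middle rows
                         left_cols=range(0, 2)):  # assumed leftmost columns
    """Decide whether a frame-DCT block represents an interlaced moving area."""
    high = area_energy(coeffs, high_rows, left_cols)
    if high <= high_ref:
        return False  # first comparison: not enough vertical high frequency energy
    mid = area_energy(coeffs, mid_rows, left_cols)
    return mid < mid_ref  # second comparison: mid frequency energy must be low


def empty_block():
    return [[0.0] * 8 for _ in range(8)]


# Made-up coefficient blocks for illustration.
flat = empty_block()

interlaced = empty_block()
interlaced[7][0] = 300.0  # strong energy at the highest vertical frequency only

detailed = empty_block()
detailed[7][0] = 300.0
detailed[3][0] = 200.0    # mid frequency energy is also present

for name, block in [("flat", flat), ("interlaced", interlaced), ("detailed", detailed)]:
    print(name, is_interlaced_moving(block, high_ref=100.0, mid_ref=50.0))
```

The second comparison is what separates a block whose high vertical frequency energy comes from differences between the two fields from a block that also carries substantial mid frequency energy.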
The invention also relates to a system for processing a compressed video stream represented by a plurality of DCT encoded blocks. This system includes a video signal source of the compressed video stream, a processor operatively coupled to the video signal source, and a video output.
The processor is configured to conduct the method described herein.
In another embodiment of the invention, the system includes computer-readable memory as the video signal source.
In another embodiment of the invention, the system includes computer-readable memory as the video output.
Other improvements which the present invention provides over the prior art will be identified as a result of the following description which sets forth the preferred embodiments of the present invention. The description is not in any way intended to limit the scope of the present invention, but rather only to provide a working example of the present preferred embodiments. The scope of the present invention will be pointed out in the appended claims.