1. Field of the Invention
The present invention generally relates to apparatus for encoding visual images, including spatial (intra-picture) and temporal (inter-picture) compression, that is, the removal of redundancy within a picture and redundancy between pictures. Specifically, this invention relates to an encoder which combines the functionality of a three-chip encoder chip set into a single chip while maintaining the original picture quality associated with the three-chip encoder.
2. Discussion of the Prior Art
Within the past decade, the advent of world-wide electronic communications systems has enhanced the way in which people can send and receive information. In particular, the capabilities of real-time video and audio systems have greatly improved in recent years. Providing services such as video-on-demand and video conferencing to subscribers requires an enormous amount of network bandwidth. In fact, network bandwidth is often the main factor limiting the effectiveness of such systems.
In order to overcome the constraints imposed by networks, compression systems have emerged. These systems reduce the amount of video and audio data which must be transmitted by removing redundancy in the picture sequence. At the receiving end, the picture sequence is decompressed and may be displayed in real-time.
One example of an emerging video compression standard is the Moving Picture Experts Group ("MPEG") standard. The MPEG committee was formed in 1988 to establish standards for coding moving pictures and associated audio information on digital media. The first phase of its work, completed in 1991, produced ISO standard 11172, which defines MPEG-1. MPEG-1 was defined to cover a continuous bitrate of about 1.5 Mbit/sec. The second phase has defined ISO standard 13818-2, called MPEG-2. The syntax for MPEG-2 is flexible and permits various types of compression options, such as the handling of interlaced video sources and frame or field based coding. The MPEG data stream consists of a number of structured layers that are defined in the MPEG-2 standard. The video layer contains the coded information required to represent the video portion of the data stream. The IBM Encoder chip set produces only the data for the video layer, in compliance with the MPEG-2 video standard (ISO 13818-2).
Within the MPEG standard, video compression is defined both within a given picture and between pictures. Video compression within a picture is accomplished by conversion of the digital image from the spatial domain to the frequency domain by a discrete cosine transform, followed by quantization and variable length coding. Video compression between pictures is accomplished via a process referred to as motion estimation and compensation. Motion estimation covers a set of techniques used to extract the motion information from a video sequence and is well known in the art. The process of motion estimation effectively reduces the temporal redundancy in successive video frames by exploiting the temporal correlation (similarities) that often exists between successive frames. Higher compression ratios are achievable by virtue of exploiting the temporal redundancy between pictures. Motion compensated compression is achieved through the use of P and B pictures.
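By way of illustration only, the intra-picture steps described above (discrete cosine transform followed by quantization) can be sketched as follows. This is a minimal sketch and not the encoder's actual implementation: the 8×8 block size is the one used by MPEG-2, but the function names dct_2d and quantize and the single uniform step size are assumptions made for the example. A real encoder uses quantization matrices and a quantizer scale, and follows quantization with variable length coding, which is omitted here.

```python
import math

N = 8  # MPEG-2 applies the DCT to 8x8 blocks of samples


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization with one step size (an illustrative assumption)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat block carries no spatial detail, so only the DC coefficient survives.
flat = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
```

For the flat block, all of the energy lands in the single DC coefficient and every other quantized level is zero, which is the property that makes the subsequent variable length coding effective.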
In general, each frame in a video sequence is one of three picture types: I, P, or B. "Intra frames" or "I" pictures are encoded and transmitted whole, and do not require motion vectors to be defined. These "I" pictures serve as a source of motion vectors. There are also "Predicted pictures" or "P" pictures which are formed by motion vectors from a previous picture and can serve as a source of motion vectors for further pictures. There are still further "Bidirectional frames" or "B" pictures which are formed by motion vectors from two other pictures, one past and one future, and cannot serve as a source of motion vectors. Motion vectors are generated from "I" and "P" pictures, and are used to form "P" and "B" pictures.
In the motion estimation process for P-pictures, the luminance portion of each 16×16 pixel current macroblock (CMB) contained in the picture currently being encoded is compared against the luminance portions of a set of 16×16 pixel reference macroblocks (RMBs) contained within a search window in a previously encoded reference picture (e.g. past reference frame) to determine a best match macroblock. Once a best match macroblock is selected a difference result is computed which represents the difference between the current macroblock (CMB) value and the chosen "best match" macroblock (RMB).
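The P-picture search described above can be sketched as an exhaustive block-matching search. This is a minimal sketch under stated assumptions: the text does not specify the matching criterion or search strategy, so the sum of absolute differences (SAD) and a full search over a square window are assumed here, and the names sad and full_search, the 4×4 block size, and the tiny 8×8 reference picture are hypothetical choices made to keep the example small.

```python
def sad(cur, ref, rx, ry, size):
    """Sum of absolute differences between cur and the ref block at (rx, ry)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def full_search(cur, ref, cx, cy, size, radius):
    """Search +/- radius pixels around (cx, cy); return (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference picture.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(cur, ref, rx, ry, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


# Synthetic luminance data: the current block is the reference content
# located one pixel to the right of the current block's own position (2, 2).
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur_block = [row[3:7] for row in ref[2:6]]
mv = full_search(cur_block, ref, cx=2, cy=2, size=4, radius=2)
```

Here mv holds the horizontal and vertical displacement of the best match together with its cost; for this synthetic data the search finds an exact match one pixel to the right.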
For B-pictures, the luminance portion of each CMB is compared against the luminance portions of a set of RMBs contained in each of the previously encoded past and future pictures. In addition, the luminance portion of each CMB is compared against the luminance portion of the RMB formed by averaging the best past and future picture search results. These three B-picture search results (past, future and bidirectional) are compared to determine the overall best match. As was true in the P-picture case, once a best match macroblock is selected a difference result is computed which represents the difference between the current macroblock (CMB) value and the chosen "best match" macroblock (RMB).
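The three-way B-picture comparison described above can be sketched as follows. This is an illustrative sketch only: it assumes the best past and best future blocks have already been found by separate searches, uses the sum of absolute differences as the (assumed) matching cost, and forms the bidirectional candidate with a rounded integer average. The names block_sad, average, and best_b_mode are hypothetical.

```python
def block_sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def average(past, future):
    """Pixel-wise average of two blocks, rounded to the nearest integer."""
    return [[(p + f + 1) // 2 for p, f in zip(pr, fr)]
            for pr, fr in zip(past, future)]


def best_b_mode(cur, best_past, best_future):
    """Compare past, future, and bidirectional candidates; return the winner."""
    candidates = {
        "past": best_past,
        "future": best_future,
        "bidirectional": average(best_past, best_future),
    }
    costs = {name: block_sad(cur, blk) for name, blk in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, costs


# The current block lies midway between the past and future candidates,
# so the averaged (bidirectional) prediction matches it exactly.
cur = [[50, 60], [70, 80]]
past = [[40, 50], [60, 70]]
future = [[60, 70], [80, 90]]
mode, costs = best_b_mode(cur, past, future)
```

In this example the averaged block reproduces the current block, so the bidirectional candidate is selected over either single-direction prediction.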
Two types of information result from the motion estimation and compensation process: motion vector(s) and a motion compensated prediction error (MCPE) macroblock. The motion vectors pinpoint the location of the luminance portion of the best match RMB. The motion vectors predict where a macroblock of pixels will be in a prior and/or subsequent picture. The MCPE macroblock consists of pixel difference data which results from subtracting both the luminance and chrominance components of the best match RMB from the CMB. This information is encoded and inserted into the compressed bitstream in cases where the encoder determines that coding of the inter (motion compensated) macroblock is more efficient than coding the intra (non-motion compensated) macroblock.
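The formation of the MCPE macroblock and the inter/intra choice described above can be sketched as follows. This is a minimal sketch, not the encoder's actual decision rule: the text does not state how coding efficiency is measured, so the comparison used here (sum of absolute prediction errors against the sum of absolute deviations of the current block from its own mean) is an assumed heuristic, and the names residual, activity, and choose_mode are hypothetical. A single 2×2 component is used for brevity.

```python
def residual(cur, pred):
    """MCPE: the best-match block subtracted from the current block."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur, pred)]


def activity(block):
    """Sum of absolute deviations from the block mean (assumed intra cost)."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values)


def choose_mode(cur, pred):
    """Return ("inter", error block) when the prediction error is cheaper."""
    err = residual(cur, pred)
    inter_cost = sum(abs(v) for row in err for v in row)
    if inter_cost < activity(cur):
        return "inter", err
    return "intra", None


cur = [[10, 20], [30, 40]]
pred = [[9, 21], [30, 38]]
mode, err = choose_mode(cur, pred)
```

With a good prediction the error block contains only small values, so the inter path is chosen and the error block, rather than the original pixels, is passed on for transform coding.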
The IBM encoder chip set utilizes standard motion compensation and estimation techniques as part of an overall process of compressing real-time digital video input into MPEG-2 compliant bitstreams. The encoder consists of three chips: an Intra chip (I), a refine chip (R), and a search chip (S). The architecture is scalable in that one, two or all three chips can be used for different types of MPEG-2 compression applications. The I chip is the base encoding chip and is required for all three configurations. All communication between the encoder and the external system is through the I chip. In a three chip configuration, the I, R, and S chips will produce IPB encoded pictures. This results in a lower number of bits to transport/store compared to I or IP encoded pictures. IPB encoding fits into many applications, especially those that require the lowest bit rate/smallest bandwidth, such as DVD and DBS. Other applications include video distribution, and PC applications such as desktop publishing.
Given the diversity of applications served by the three chip configuration, it becomes increasingly desirable, from the point of view of the application card designer, to combine the three chip chip set into a single chip to conserve circuit board space while maintaining the functionality of the original chip set.
A need therefore exists for an encoder which combines the functionality of encoder chip sets of the prior art into a single chip while maintaining the picture quality achieved by the prior art chip set.