Field of the Invention
The present invention generally relates to video encoders. More specifically, the present invention provides the grouping of coded pixel blocks having similar prediction dependencies to more quickly decode a compressed video data sequence.
Background Art
FIG. 1 is a functional block diagram of an encoder-decoder system 100. The encoder-decoder system 100 includes an encoder 102 and a decoder 104. The encoder 102 receives source video data from a video source 106. The encoder 102 codes the source video data into a compressed bit stream for transmission to the decoder 104 over a communication channel 108. The communication channel 108 can be a real-time delivery system such as a communication network (e.g., a wireless communication network) or a computer network (e.g., the Internet). Alternatively, the communication channel 108 can be a storage medium (e.g., an electrical, optical or magnetic storage device) that can be physically distributed. Overall, the topology, architecture and protocol governing operation of the communication channel 108 are immaterial to the present discussion unless specifically identified herein.
The decoder 104 receives and decodes the compressed bit stream to reproduce the source video data. The decoder 104 can then provide the reproduced source video data to a video display device 110. FIG. 1 shows a single decoder 104 but is not limited as such. That is, replicas or copies of the compressed bit stream can be provided to multiple decoders located at different locations. In this way, the source video data can be encoded once and distributed to the decoders for decoding at different times, as is conventional or well known in the art.
The encoder 102 and the decoder 104 can be implemented in hardware, software or some combination thereof. For example, the encoder 102 and/or the decoder 104 can be implemented using a computer system. FIG. 2A is a simplified functional block diagram of a computer system 200. The computer system 200 can be used to implement the encoder 102 or the decoder 104 depicted in FIG. 1.
As shown in FIG. 2A, the computer system 200 includes a processor 202, a memory system 204 and one or more input/output (I/O) devices 206 in communication by a communication ‘fabric.’ The communication fabric can be implemented in a variety of ways and may include one or more computer buses 208, 210 and/or bridge devices 212 as shown in FIG. 2A. The I/O devices 206 can include network adapters and/or mass storage devices from which the computer system 200 can receive compressed video data for decoding by the processor 202 when the computer system 200 operates as a decoder. Alternatively, the computer system 200 can receive source video data for encoding by the processor 202 when the computer system 200 operates as an encoder.
The computer system 200 can implement a variety of video coding protocols such as, for example, any one of the Moving Picture Experts Group (MPEG) standards (e.g., MPEG-1, MPEG-2, or MPEG-4) and/or the International Telecommunication Union (ITU) H.264 standard. Most coding standards are designed to operate across a variety of computing platforms. Accordingly, many coding standards find application in feature rich computing devices (e.g., personal computers or gaming devices) and also in feature poor computing devices (e.g., single digital signal processing (DSP) devices).
To accommodate the broad variety of computing devices, most coding standards are designed with unsophisticated computer systems in mind. Specifically, many coding standards are designed to be implemented the same way on a feature rich system as they are on a feature poor system. Feature poor systems typically have limited memory and processor capabilities. Accordingly, due to the design of many coding standards, the improved memory and processor capabilities of a feature rich system are not fully exploited. For example, during the coding of an Intra coded frame (I-frame), macroblock data may be coded with reference to other macroblocks in the same frame. A prediction reference (e.g., a prediction vector) can be generated to specify a location from within previously coded macroblocks from which a prediction will be made. This technique can create a long chain of predictions, requiring macroblocks to be retrieved from memory and then decoded in a serial fashion (e.g., one macroblock at a time).
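The chain of predictions described above can be illustrated with a minimal Python sketch. The representation is an assumption made purely for illustration: each macroblock is identified by its position in coding order, and a hypothetical `pred_ref` mapping gives the earlier macroblock it is predicted from (or `None`). Neither the function name nor the data layout comes from any coding standard.

```python
from typing import Dict, Optional


def prediction_chain_depths(pred_ref: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Return, for each macroblock, the length of the prediction chain behind it.

    pred_ref maps a macroblock index (its position in coding order) to the
    index of the previously coded macroblock it is predicted from, or None
    when it has no prediction reference. This layout is hypothetical.
    """
    depths: Dict[int, int] = {}
    # Visit macroblocks in coding order, so every reference is already resolved.
    for mb in sorted(pred_ref):
        ref = pred_ref[mb]
        depths[mb] = 0 if ref is None else depths[ref] + 1
    return depths


# Each macroblock predicts from its immediate predecessor: one long chain.
chained = {0: None, 1: 0, 2: 1, 3: 2}
# Every macroblock predicts from macroblock 0: a shallow set of dependencies.
shallow = {0: None, 1: 0, 2: 0, 3: 0}

chained_depths = prediction_chain_depths(chained)
shallow_depths = prediction_chain_depths(shallow)
```

In the chained case the last macroblock cannot be decoded until three earlier macroblocks have been decoded, which is the serial behavior noted above. In the shallow case, macroblocks 1 through 3 depend only on macroblock 0.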
FIG. 2B illustrates the contents of the memory system 204 depicted in FIG. 2A. As shown in FIG. 2B, the memory system 204 can include coded video data 220, decoded reference frames 222 and a currently decoded frame 224 that is partially decoded. The decoded reference frames 222 can include previous reference frames 226 and future reference frames 228. Previous reference frames 226 are frames that occur earlier in time than the current frame being decoded by the processor 202. Future reference frames 228 are frames that occur later in time than the current frame being decoded by the processor 202. A currently decoded frame 224 can depend on previous reference frames 226 and/or future reference frames 228.
In feature poor computing devices, the decoded reference frames 222 needed by the processor 202 to decode a current frame 224 may take several memory cycles to retrieve since the bandwidth of a memory system 204 in a feature poor computing device is generally limited. Further, due to the serial nature of decoding the coded video data 220, different sets of decoded reference frames 222 may be retrieved for each coded video frame of the coded video data 220. This can result in the same decoded reference frames 222 being retrieved several times as the coded video data 220 is decoded.
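The repeated retrieval of reference frames can be made concrete with a small Python sketch. It assumes a simplified model in which each coded unit is decoded in turn and every reference frame it needs is fetched again, with nothing retained between units. The frame labels and the function name are hypothetical.

```python
from collections import Counter
from typing import List, Set


def count_reference_fetches(needs: List[Set[str]]) -> Counter:
    """Count how often each reference frame is fetched under serial decoding.

    needs[i] is the set of reference frames required by the i-th coded unit.
    Assumption: nothing is kept between units, so every need causes a fetch.
    """
    fetches: Counter = Counter()
    for refs in needs:
        for frame in refs:
            fetches[frame] += 1
    return fetches


# Four coded units decoded in order; labels are illustrative only.
needs = [{"prev0"}, {"prev0", "next0"}, {"prev0"}, {"next0"}]
fetches = count_reference_fetches(needs)
total_fetches = sum(fetches.values())
distinct_frames = len(fetches)
```

Under these assumptions, two distinct reference frames account for five fetches, since the same frames are retrieved again for later coded units.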
As previously mentioned, feature rich computing devices typically possess much greater memory capacity and memory bandwidth in comparison to feature poor devices. Accordingly, these devices are able to access, retrieve and process data in quantities much larger than a single macroblock at a time. Theoretically, the performance of a feature rich computing device implementing a coding standard is much greater than what is currently achieved. Performance is limited due to the serial nature of decoding. In particular, performance is mainly limited by (a) the processor being idle when macroblock data is being read in from a memory device or I/O device because the processor needs prediction data contained in data yet to be retrieved; and (b) the memory device or I/O device being idle because the processor is busy decoding data that requires large amounts of processing.
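The two sources of idle time can be illustrated with a toy timing model in Python. This is a sketch under stated assumptions, not a model of any particular device: each unit of coded data has a hypothetical read time and decode time, strictly serial operation never overlaps reading with decoding, and the overlapped case assumes the units have no prediction dependencies on data yet to be read and that buffering is unlimited.

```python
from typing import List, Tuple

Unit = Tuple[int, int]  # (read_time, decode_time), arbitrary time units


def serial_time(units: List[Unit]) -> int:
    """Total time when each unit is read, then decoded, with no overlap."""
    return sum(read + decode for read, decode in units)


def overlapped_time(units: List[Unit]) -> int:
    """Total time when reading continues while earlier units are decoded.

    Assumes units are independent, so a decode may start as soon as that
    unit has been read and the previous decode has finished.
    """
    read_done = 0
    decode_done = 0
    for read, decode in units:
        read_done += read
        decode_done = max(decode_done, read_done) + decode
    return decode_done


# The middle unit needs a large amount of processing.
units = [(2, 1), (2, 5), (2, 1)]
t_serial = serial_time(units)
t_overlapped = overlapped_time(units)
```

In the serial case the processor waits during every read and the memory or I/O device waits during every decode. When the units are independent, some of that waiting is hidden.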
Accordingly, what is needed is a processing system capable of exploiting the improved processor and memory capabilities of feature rich computing devices to more quickly decode video data compressed according to conventional coding techniques. In particular, the processing system should be capable of reading and decoding multiple chunks of video data (e.g., multiple macroblocks or multiple blocks of pixels) whenever possible.