During the compression of video data, for example using the MPEG2, MPEG4 or H.263 video compression standards, a video data compressor stores current video frame data and reconstructed, i.e. previous, video frame data to, and retrieves them from, an external memory device. In one example, the external memory device is referred to as a “frame memory” and takes the form of an SDRAM device. Transferring data to and from the external frame memory in this manner consumes a relatively large amount of power in mobile systems.
A conventional video data compressor system 2 is shown in the block diagram of FIG. 1. As an input, the system 2 receives an input image frame in the form of data, referred to herein as a “current frame” or “current video data”. The current frame is stored in a frame memory 4.
The system 2 processes video frames according to the mode of operation. When a drastic change between two sequential images is detected, the system operates in an “intra-mode”. When in intra-mode, the operation of motion compensation is not performed. When a subtle change between two sequential images is detected, the system operates in an “inter-mode”. When in inter-mode, the operations of motion compensation and motion estimation are performed.
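One common way to make the intra/inter decision described above (assumed here purely for illustration; the compression standards leave the actual criterion to the encoder implementation) is to compare a block-difference measure, such as the sum of absolute differences, against a threshold. The threshold value below is a hypothetical placeholder:

```python
import numpy as np

# Hypothetical mode decision: compare the sum of absolute
# differences (SAD) between two sequential images against a
# threshold. The threshold value is illustrative, not standardized.
INTRA_THRESHOLD = 20_000

def select_mode(current, previous, threshold=INTRA_THRESHOLD):
    # Widen to a signed type before subtracting to avoid
    # unsigned-integer wraparound.
    sad = int(np.abs(current.astype(np.int32) - previous.astype(np.int32)).sum())
    # Drastic change -> intra-mode (no motion compensation);
    # subtle change -> inter-mode (motion estimation/compensation).
    return "intra" if sad > threshold else "inter"

same = np.full((16, 16), 128, dtype=np.uint8)
changed = np.zeros((16, 16), dtype=np.uint8)
print(select_mode(same, same))      # inter
print(select_mode(same, changed))   # intra
```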
Assuming the inter-mode of operation, a motion estimation block ME 28 compares the current frame stored in the frame memory 4 to a reconstructed previous video frame 27a, referred to herein as a “reference video frame”, stored in a frame memory 24, and, as a result of the comparison, generates and outputs a motion vector 29 to a motion compensation block 26. The motion compensation block 26 applies the motion vector 29 to the reference frame 27b and generates a compensated video frame 25. A subtraction circuit 6 calculates the difference in value between the current video frame stored in the frame memory 4 and the compensated video frame 25. The difference is applied to a discrete cosine transform circuit DCT 8, where it is converted from the spatial domain to the frequency domain, and the output of the DCT 8 is quantized at a quantization block Q 10. The quantized output 11 is coded at a variable length coding circuit VLC 14 in order to statistically reduce the amount of output data. The coded bit stream output from the VLC 14 is stored in an output buffer FIFO 16, from which it is output as an output stream to a receiving apparatus or channel. A rate control circuit 12 provides a quantization rate control signal to the quantization block Q 10, applied during the quantization of the following video frame, based on the amount of bit stream data in the FIFO 16, in order to prevent the FIFO 16 from overflowing or underflowing.
At the same time, the quantized output 11 of the quantization block Q 10 enters a decoding procedure. The quantized output 11, in the form of quantized coefficients, is inversely quantized at an inverse quantization block IQ 18 and inverse discrete cosine transformed at an inverse discrete cosine transform block IDCT 20, and thus converted back to the spatial domain. The output 21 of the IDCT 20 takes the form of differential image signals representing, with a quantization loss, the difference between the current video frame and the reference video frame. The output 21 is added to the compensated video frame 25 at a composer 22. The output of the composer 22, i.e. the reference video frame 27a, 27b, is stored in the frame memory 24. The reference video frame is used for the compression of the next received current video frame.
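The encode/decode loop described above can be sketched as follows. This is a deliberately simplified model: the DCT/IDCT pair is omitted (an ideal transform round-trip is lossless) and a plain uniform scalar quantizer stands in for the quantization block, so the only point illustrated is how the reconstructed reference frame differs from the current frame by a bounded quantization loss:

```python
import numpy as np

def quantize(block, step=8):
    # Stand-in for the quantization block Q 10: coarse integer
    # representation of the residual.
    return np.round(block / step).astype(np.int32)

def dequantize(coeffs, step=8):
    # Stand-in for the inverse quantization block IQ 18.
    return coeffs * step

rng = np.random.default_rng(0)
current = rng.integers(0, 256, size=(8, 8)).astype(np.int32)
compensated = rng.integers(0, 256, size=(8, 8)).astype(np.int32)

# Encoder side: difference (subtraction circuit 6) -> quantize (Q 10).
residual_q = quantize(current - compensated)

# Decoder side: dequantize (IQ 18) and add the result back to the
# compensated frame (composer 22) to form the reference frame.
reference = compensated + dequantize(residual_q)

# The reference differs from the current frame only by the
# quantization loss, bounded by half a quantizer step.
print(np.max(np.abs(reference - current)))  # at most 4 for step=8
```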
Note that while the above description refers to the compression of video data in the form of video “frames”, the systems described herein apply equally to entire frames of video data and to segments, or “blocks” or “macro blocks”, of video frames. The terms “video data” and “video frames” as used herein are therefore applicable to, and include, both entire frames of video data and segments, blocks, or macro blocks of video data frames.
As an example, the motion estimator ME 28, in its operation to determine the best match of the current frame with the previous frame, operates exclusively on the luminance macro block of the video frame. The motion compensation function MC 26 operates on both the luminance macro block and the chrominance macro blocks of the video frame.
FIG. 2 illustrates a conventional mobile system 30 including a conventional video data compressor 40. The video data compressor 40 is constructed in a single integrated circuit, referred to as a system on a chip (SOC) circuit. The video data compressor 40 comprises a central processing unit CPU 42, a memory controller 44, a motion estimation/compensation unit ME/MC 46, and a discrete cosine transform/quantization unit DCT/Q 48. The respective units 42, 44, 46, 48 are each connected to a local bus 49. Each of the processing units 42, 46, 48 sends data to, and retrieves data from, an external frame memory SDRAM 32. The data exchange is controlled by the memory controller 44, which is connected to the local bus 49 and operates under the control of the CPU 42.
A conventional design for the video data compressor 40 in the mobile system commonly takes the form of hardwired circuits and software programs running on an operating system. For example, referring back to FIG. 1, the functions of the rate control circuit 12 and the VLC 14 can be performed by a software program hosted on the CPU 42, while the ME/MC 46 and the DCT/Q 48 of FIG. 2 can be constructed as specialized hardwired circuits.
The operational frequency of the local bus 49 in the mobile system 30 is determined according to the memory bandwidth required by each of the various processing units 42, 46, 48, wherein the memory bandwidth refers to the data rate, in bits per second, required for each of the units 42, 46, 48 to communicate with the memory 32, and further by the operational frequency of the CPU 42. The power consumption of the video data compressor 40 is in turn a function of the operational frequency of the local bus 49.
The conventional mobile system 30 includes an external frame memory SDRAM 32 connected to the local bus 49. One way to reduce power consumption in the frame memory is to embed the frame memory into the circuit of the video data compressor as a single integrated circuit; however, it is difficult to integrate such a large amount of memory into a single circuit. Since each processing unit 46, 48 exchanges data with the external frame memory SDRAM 32 via the local bus 49, the operational frequency of the local bus 49 is necessarily high in the video data compressor 40.
Table 1 shows the number of bytes of external memory accessed by each processing block. In this example, the search window for a motion vector is assumed to be fcode=1 (−16 to +15.5). The fcode parameter is defined in the MPEG standard for motion compensation and defines the maximum size of the search range.
TABLE 1

Processing block            Function requiring video data              Amount of data (bytes)
Motion Estimation (ME)      (1) Current macro block read               16 × 16 = 256
                            (2) Search window read                     48 × 48 = 2304
Motion Compensation (MC)    (3) Current Cb block read                  8 × 8 = 64
                            (4) Current Cr block read                  8 × 8 = 64
                            (5) Previous Cb block read                 9 × 9 = 81
                            (6) Previous Cr block read                 9 × 9 = 81
                            (7) Motion compensated macro block write   8 × 8 × 6 = 384
Discrete Cosine             (8) Motion compensated macro block read    8 × 8 × 6 = 384
Transform (DCT)             (9) Quantized coefficient write            8 × 8 × 6 × 1.5 = 576
Inverse Quantization/       (10) Quantized coefficient read            8 × 8 × 6 × 1.5 = 576
Inverse DCT                 (11) Reconstructed error image write       8 × 8 × 6 = 384
Motion Compensation (MC)    (12) Previous Y blocks read                17 × 17 = 289
                            (13) Previous Cb blocks read               9 × 9 = 81
                            (14) Previous Cr blocks read               9 × 9 = 81
Reconstruction              (15) Reconstructed error image read        8 × 8 × 6 = 384
                            (16) Reconstructed image write             8 × 8 × 6 = 384
Total                                                                  6373
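The per-macro-block byte counts of Table 1 can be recomputed directly; the dictionary keys below mirror the step numbers (1) through (16) of the table:

```python
# Per-macro-block external memory accesses, mirroring Table 1.
accesses = {
    1:  16 * 16,               # current macro block read (ME)
    2:  48 * 48,               # search window read (ME, fcode=1)
    3:  8 * 8,                 # current Cb block read (MC)
    4:  8 * 8,                 # current Cr block read (MC)
    5:  9 * 9,                 # previous Cb block read (MC)
    6:  9 * 9,                 # previous Cr block read (MC)
    7:  8 * 8 * 6,             # motion compensated macro block write
    8:  8 * 8 * 6,             # motion compensated macro block read (DCT)
    9:  int(8 * 8 * 6 * 1.5),  # quantized coefficient write
    10: int(8 * 8 * 6 * 1.5),  # quantized coefficient read (IQ/IDCT)
    11: 8 * 8 * 6,             # reconstructed error image write
    12: 17 * 17,               # previous Y blocks read (MC)
    13: 9 * 9,                 # previous Cb blocks read (MC)
    14: 9 * 9,                 # previous Cr blocks read (MC)
    15: 8 * 8 * 6,             # reconstructed error image read
    16: 8 * 8 * 6,             # reconstructed image write
}
total = sum(accesses.values())
print(total)  # 6373
```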
Each data frame includes a number of macro blocks. Each macro block includes four luminance blocks Y arranged 2×2, each luminance block Y comprising 8×8 pixels, and two chrominance blocks: one for chrominance blue Cb and one for chrominance red Cr. Each chrominance block comprises 8×8 pixels.
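At one byte per pixel, this 4:2:0 macro block layout yields the 8 × 8 × 6 = 384 byte figure that recurs throughout Table 1:

```python
# 4:2:0 macro block layout described above: a 16 x 16 luminance
# area stored as four 8 x 8 Y blocks, plus one 8 x 8 Cb block and
# one 8 x 8 Cr block, at one byte per pixel.
Y_BLOCKS = 4
CHROMA_BLOCKS = 2
BLOCK_BYTES = 8 * 8

macro_block_bytes = (Y_BLOCKS + CHROMA_BLOCKS) * BLOCK_BYTES
print(macro_block_bytes)  # 384
```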
When motion estimation is performed by the motion estimation ME unit 46 of FIG. 2, only the luminance blocks are used, therefore the amount of the data read from memory 32 during the retrieval of a current luminance macro block is 256 bytes, i.e., 16*16=256, as shown in step (1) of Table 1. At step (2), the search window for the motion vector is next read from the memory 32, and the amount of data read is 48*48=2304 bytes (assuming fcode=1).
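The 48 × 48 window size follows from the fcode=1 search range: the 16 × 16 current macro block may be matched anywhere within 16 pixels on each side, extending the window by 16 pixels in every direction:

```python
# Search window dimensions for fcode=1 (motion vector range
# -16 to +15.5 pixels), as described above.
MACRO_BLOCK_SIDE = 16
SEARCH_RANGE = 16  # fcode=1

window_side = MACRO_BLOCK_SIDE + 2 * SEARCH_RANGE
window_bytes = window_side * window_side
print(window_side, window_bytes)  # 48 2304
```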
After the motion vector is determined by the motion estimation ME unit, the motion compensation unit MC reads from memory 32 the two previous blocks (chrominance blue Cb and chrominance red Cr) which are best matched with the current blocks, each read block of the previous blocks including 9*9=81 bytes of pixel data, as shown in steps (5) and (6) of Table 1. In addition, the current chrominance block blue Cb and current chrominance block red Cr are also read from memory 32, each including 8*8=64 bytes of data, as shown in steps (3) and (4) of Table 1. A difference macro block (referred to as the “motion compensated macro block”) between the current macro block (4 blocks for luminance and 2 blocks for chrominance) and previous macro block (4 blocks for luminance and 2 blocks for chrominance) is then computed by the subtraction circuit 6 (see FIG. 1) and is written to memory 32, as 8*8*6=384 bytes of data, as shown in step (7) of Table 1.
Following computation of the difference macro block, the DCT/quantization unit 48 reads the motion compensated macro block from memory 32 as 8*8*6=384 bytes of data, as shown in step (8) of Table 1, and performs transformation and quantization of the data, as explained above. Following the DCT operation, the amount of data, or data bandwidth, increases by a factor of one and one-half; for example, if the input data is 8 bits wide, then the output data is 12 bits wide. The output of the DCT is quantized by the quantization Q unit (see unit 10 of FIG. 1), and the quantized coefficients are written to memory 32 as 8*8*6*1.5=576 bytes of data, as shown in step (9) of Table 1.
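The 1.5× expansion is simply the ratio of coefficient width to pixel width; applied to the 384-sample macro block it gives the 576-byte figure of steps (9) and (10):

```python
# Coefficient-width expansion after the DCT, as described above:
# 8-bit pixels become 12-bit coefficients, a 12/8 = 1.5x growth.
SAMPLES_PER_MACRO_BLOCK = 8 * 8 * 6   # 384
PIXEL_BITS = 8
COEFF_BITS = 12

coeff_bytes = SAMPLES_PER_MACRO_BLOCK * COEFF_BITS // PIXEL_BITS
print(coeff_bytes)  # 576
```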
In addition, generation of the reference macro block for the next frame image is required. Accordingly, the IQ/IDCT unit (see units 18 and 20 of FIG. 1), reads the quantized coefficients from memory 32 as 8*8*6*1.5=576 bytes of data, as shown in step (10) of Table 1, and reconstructs a difference macro block. The reconstructed difference macro block is stored in memory 32 as 8*8*6=384 bytes of data, as shown in step (11) of Table 1.
The motion compensation MC unit 46 (see also unit 26 of FIG. 1) next reads the previous macro block from memory 32, the previous macro block including luminance blocks of 17*17=289 bytes of data, as shown in step (12) of Table 1, and two chrominance blocks, each of 9*9=81 bytes of data, as shown in steps (13) and (14) of Table 1. The previous macro block is added to the reconstructed error image macro block, which is read from memory 32 as 8*8*6=384 bytes of data, as shown in step (15) of Table 1. The reconstructed image macro block, which is used as a “previous” block for the following frame, is then stored in memory 32 as shown in step (16) of Table 1.
As described above, the conventional video compressor relies heavily on the common local bus 49 and the external frame memory 32 and requires, in inter-mode operation, two motion compensation processes per iteration: one for data compression and the other for reconstruction. The required operational frequency of the local bus 49 is therefore high, because all procedures for data compression are performed in a pipelined system, with each step consuming local bus 49 bandwidth, as shown above in Table 1. The amount of power consumed by the frequent reading from and writing to external memory, as shown in this example, makes this arrangement unsuitable for efficient operation in mobile systems.
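To illustrate the scale involved, the 6373-byte per-macro-block total of Table 1 can be extrapolated to a full sequence. The CIF resolution (352×288) and 30 frames-per-second rate below are assumed for illustration only, and arbitration or refresh overhead on the bus is ignored:

```python
# Hypothetical aggregate external memory traffic, extrapolating the
# Table 1 per-macro-block total. Resolution and frame rate are
# illustrative assumptions, not values from the description above.
BYTES_PER_MACRO_BLOCK = 6373                    # Table 1 total
MACRO_BLOCKS_PER_FRAME = (352 // 16) * (288 // 16)  # CIF: 22 x 18 = 396
FRAMES_PER_SECOND = 30

bytes_per_second = (BYTES_PER_MACRO_BLOCK
                    * MACRO_BLOCKS_PER_FRAME
                    * FRAMES_PER_SECOND)
print(bytes_per_second)  # 75711240, roughly 75.7 MB/s over the local bus
```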