A vertical filter is used in, for example, an MPEG-2 video decoder to scale a video picture and reduce the height of the picture. After each input video frame is written into a framestore memory, the video frame is scaled to a smaller size using the vertical filter. The filter reads a number of input video display lines from a linestore. As each new output line is calculated, the filter needs new input lines to be loaded into the linestore from the framestore memory. With the output picture at a quarter of the input size, the linestore loading requires four new input lines to be loaded from the framestore memory for each output line calculated.
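The relationship between the vertical scale ratio and the linestore loading can be illustrated with a short calculation. The following is a minimal sketch only; the function name and the example picture heights are assumptions chosen for illustration and do not come from the circuit described here.

```python
from fractions import Fraction


def input_lines_per_output_line(input_height, output_height):
    """Average number of new input lines fetched from the framestore
    for each output line produced by the vertical filter.

    Hypothetical helper: the ratio is simply input height / output height.
    """
    return Fraction(input_height, output_height)


# Example heights are illustrative assumptions (576-line source picture).
quarter = input_lines_per_output_line(576, 144)        # quarter-height output
letterbox_75 = input_lines_per_output_line(576, 432)   # 75% height output

print(quarter)       # new input lines per output line at quarter height
print(letterbox_75)  # new input lines per output line at 75% height
```

The smaller the output picture, the more input lines must be read per output line, which is why strong downscaling increases the framestore memory bandwidth.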
Referring to FIG. 1, a block diagram of a circuit 10 illustrating a conventional MPEG-2 video display controller is shown. The circuit 10 includes a circuit 12 and a circuit 14. The circuit 12 is a post-processing display filter circuit. The circuit 14 is a filter controller circuit.
The circuit 12 includes a luma line buffer 20 that receives a 64-bit wide video data signal at an input and is serially connected to a 4-tap luma vertical filter 22. The luma vertical filter 22 receives an address signal that is presented by the filter controller 14. The luma vertical filter 22 is connected in series with a decimation filter 23. The circuit 12 also includes a chroma line buffer 26 that receives the 64-bit wide video data signal at an input and is serially connected to a 2-tap chroma vertical filter 28. The chroma vertical filter 28 receives an address signal that is presented by the filter controller 14. The chroma vertical filter 28 is connected in series with a decimation filter 29. The luma vertical filter 22 and the chroma vertical filter 28 present vertically scaled video display pixels (pels) to the 2:1 horizontal decimation filters 23 and 29. The horizontal decimation filters 23 and 29 present scaled pels to the luma buffer 24 and chroma buffer 30. The vertical filters 22 and 28 include finite impulse response (FIR) filters and multiply-accumulate cells (described below in connection with FIGS. 2 and 3, respectively). The horizontal filter 32 includes a horizontal interpolating filter and a phase accumulator (described below in connection with FIGS. 4 and 5, respectively).
The filter controller 14 receives video display control signals generated by an SDRAM controller and a host interface. The filter controller 14 includes an address generator 40 and display register 42. The circuit 10 can interpolate and reposition luma and chroma pels to improve picture quality. The circuit 10 can also perform vertical letterbox filtering in fixed 75% and 50% values. For horizontal filtering the display controller 10 includes two separate filters. These filters are the simple 2:1 decimation filters 23 and 29 using bilinear averaging and an 8-tap polyphase interpolation filter 32.
Referring to FIG. 2, a block diagram of a circuit 50 illustrating an exemplary 4-tap FIR filter is shown. The 4-tap FIR filter 50 is used for the luma vertical filter 22.
Referring to FIG. 3, a block diagram of a circuit 60 illustrating a multiply-accumulate cell and luma linestore circuit of the vertical filter 22 is shown. The filter area of the circuit 10 is reduced by sharing a single multiply-accumulate cell among the 4 lines to be filtered. The input lines to the multiply-accumulate cell 60 are multiplexed to the multiplier. The accumulator adds each successive multiplier output to the result from the previous line. A 4-tap filter result is therefore produced after 4 clock cycles.
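The time-multiplexed operation of the multiply-accumulate cell can be modeled in software as one multiply and one add per clock cycle. The following is a minimal sketch under stated assumptions: the coefficient values (1, 3, 3, 1), the normalizing shift, and the rounding are hypothetical and are not taken from the circuit 60.

```python
def mac_vertical_filter_pel(line_pels, coeffs, shift):
    """Model one multiply-accumulate cell producing one output pel.

    line_pels: the co-sited pel from each input line (one per tap).
    coeffs:    one integer coefficient per tap (hypothetical values).
    shift:     right shift that normalizes the sum (assumes the
               coefficients sum to 1 << shift).
    """
    acc = 0
    for cycle in range(len(coeffs)):
        selected = line_pels[cycle]       # multiplexer selects one input line
        acc += selected * coeffs[cycle]   # multiply, then accumulate
    # Round to nearest and normalize (rounding is an assumption).
    return (acc + (1 << (shift - 1))) >> shift


# Four taps, so the loop body runs four times (four modeled clock cycles).
COEFFS = [1, 3, 3, 1]  # illustrative only; sums to 8, hence shift = 3
result = mac_vertical_filter_pel([10, 20, 30, 40], COEFFS, 3)
print(result)
```

Sharing one multiplier across the taps trades throughput (four cycles per output pel) for area, which is the trade-off described above.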
The line buffer memory is 64 bits wide. The circuit 10 is improved by filtering the whole word at once (i.e., filtering 8 pels with the 8 multiply-accumulate cells 60). Filtering the whole word is implemented by writing words into the memory in an interleaved order and reading out each successive word containing 8 pels from the next required line. Circuitry similar to the circuit 60 is implemented for the chroma 2-tap FIR filter 28. In the chroma filter 28, the linestore is 192×64 bits and interleaves 2 lines for the filter taps.
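One way to picture the interleaved word order is an address mapping in which consecutive linestore addresses hold the same word position from successive lines. The following is a minimal sketch; the mapping `word * num_lines + line` and all names are assumptions for illustration, since the exact address order is not given here.

```python
PELS_PER_WORD = 8  # a 64-bit word holds 8 pels of 8 bits each


def interleaved_address(line, word, num_lines):
    """Hypothetical mapping: words at the same horizontal position
    from successive lines sit at consecutive addresses."""
    return word * num_lines + line


def write_interleaved(lines):
    """Store equal-length lines of pels as interleaved 8-pel words."""
    num_lines = len(lines)
    words_per_line = len(lines[0]) // PELS_PER_WORD
    store = [None] * (num_lines * words_per_line)
    for line_idx, line in enumerate(lines):
        for word_idx in range(words_per_line):
            start = word_idx * PELS_PER_WORD
            addr = interleaved_address(line_idx, word_idx, num_lines)
            store[addr] = line[start:start + PELS_PER_WORD]
    return store


# Four lines of 16 pels each; pel values encode (line, position).
lines = [[100 * l + p for p in range(16)] for l in range(4)]
store = write_interleaved(lines)
# Addresses 0..3 now hold the first word of lines 0..3 in order,
# so a sequential read supplies one word per tap to the 8 parallel cells.
print(store[0], store[1])
```

With such an ordering, the filter can read sequential addresses and receive, cycle by cycle, the word from the next required line.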
The loading of the vertical filter linestores is controlled by separate state-machines for luma filter 22 and chroma filter 28. The state-machines directly control the decimation from 4 lines to 3 lines for 75% scaling or from 2 lines to 1 line for 50% scaling. The 2:1 horizontal decimation filters 23 and 29 are bilinear averaging filters. The horizontal filters average adjacent pels from the vertical filter circuits 22 and 28 (i.e., 8 pels input) to provide an output of 4 pels.
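The 2:1 bilinear averaging performed by the horizontal decimation filters can be sketched as follows. This is an illustration only; the function name is hypothetical, and the round-half-up behavior is an assumption, since the rounding used by the filters 23 and 29 is not stated.

```python
def decimate_2_to_1(pels):
    """Average each adjacent pair of pels, halving the pel count.

    Rounding half up (adding 1 before the shift) is an assumption.
    """
    if len(pels) % 2 != 0:
        raise ValueError("expected an even number of pels")
    return [(pels[i] + pels[i + 1] + 1) >> 1 for i in range(0, len(pels), 2)]


# One 8-pel word from a vertical filter becomes 4 output pels.
word = [10, 20, 30, 50, 0, 0, 255, 255]
print(decimate_2_to_1(word))
```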
Referring to FIG. 4, a block diagram of a circuit 70 illustrating a horizontal interpolating filter section of the horizontal filter 32 is shown. The circuit 70 receives the pels presented by the luma buffer 24 and the chroma buffer 30. The interpolating filter circuit 70 is an 8-tap, 8-phase polyphase FIR filter. The architecture of the circuit 70 is implemented using a Wallace Tree multiplier to reduce the design area. The Wallace Tree multiplier reduces design area by using shift and add combinations to provide the multiplications in the filter taps. The design of the circuit 70 is compact. However, the circuit 70 has the disadvantages that (i) the filter is fixed to two sets of coefficients, and (ii) the coefficients cannot be changed without a major redesign of the whole filter circuit 10. When the filter circuit 70 is disabled, the output is taken from the center tap position (i.e., position tap4).
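The idea of replacing a multiplication by a fixed coefficient with shifts and adds can be sketched in software. This is an illustration of the shift-and-add principle only, not a model of the Wallace Tree structure of the circuit 70; the function names and the example coefficient are hypothetical, and only non-negative coefficients are handled.

```python
def shift_add_terms(coefficient):
    """Return the left-shift amounts whose shifted copies of the input
    sum to input * coefficient (non-negative constant only)."""
    return [bit for bit in range(coefficient.bit_length())
            if (coefficient >> bit) & 1]


def multiply_by_constant(pel, coefficient):
    """Multiply using only shifts and adds, as fixed hardware wiring would."""
    return sum(pel << shift for shift in shift_add_terms(coefficient))


# Example: a coefficient of 20 is binary 10100, so x * 20 = (x << 4) + (x << 2).
print(shift_add_terms(20))
print(multiply_by_constant(7, 20))
```

Because the shift amounts are determined by the coefficient value, a hardware implementation of this kind wires them in, which is consistent with the limitation that the coefficients cannot be changed without redesign.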
Referring to FIG. 5, a block diagram of a circuit 80 illustrating a horizontal phase accumulator section of the horizontal filter circuit 32 is shown. The horizontal filter scaling is programmed by an 8-bit scale factor. The 8-bit scale factor is used with a phase accumulator 80 to determine which of the 8 phases to use in the filter taps. Separate phase accumulators 80 are included for luma (i.e., Y), and both chroma components (i.e., Cb and Cr). When the circuit 10 is scaling 1:1, the scale factor for the phase accumulators 80 is set to 256.
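The role of the phase accumulator can be sketched as a fixed-point counter that advances by the scale factor for each output pel. The following is a minimal sketch under stated assumptions: the split of the accumulator into an integer pel position and an 8-bit fraction, and the use of the top 3 fraction bits to select one of the 8 phases, are assumptions consistent with a scale factor of 256 meaning 1:1, not details taken from the circuit 80.

```python
def phase_sequence(scale_factor, num_outputs, frac_bits=8, num_phases=8):
    """Return (input_position, phase) for each output pel.

    Assumed fixed-point layout: the low `frac_bits` bits of the accumulator
    are the fractional position, and the upper bits are the input pel index.
    """
    phase_bits = num_phases.bit_length() - 1   # 3 bits for 8 phases
    phase_shift = frac_bits - phase_bits       # keep the top fraction bits
    acc = 0
    sequence = []
    for _ in range(num_outputs):
        position = acc >> frac_bits
        phase = (acc >> phase_shift) & (num_phases - 1)
        sequence.append((position, phase))
        acc += scale_factor
    return sequence


print(phase_sequence(256, 3))  # 1:1 scaling: one input pel per output, phase 0
print(phase_sequence(128, 4))  # illustrative 2x interpolation
```

Under these assumptions, a scale factor of 256 advances exactly one input pel per output pel and always selects phase 0, while smaller scale factors step through intermediate phases between input pels.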
It would be desirable to have a video horizontal and vertical scaling filter with variable scaling, flexible scaling factors, and/or reduced memory bandwidth.