1. Field of the Invention
The invention relates generally to the field of processor chips and specifically to the field of single-instruction multiple-data (SIMD) processors. More particularly, the present invention relates to vector Look-Up Table (LUT) and histogram operations in a SIMD processor.
2. Description of the Background Art
Histogram calculation for video and images has typically been performed as a scalar operation. For example, the Imaging and Compression Engine (ICE) chip used in the Silicon Graphics O2 workstation featured an 8-wide SIMD architecture and was used to implement video and image library operations. However, histograms were calculated one component at a time, even though the ICE could issue two instructions per clock cycle for concurrent I/O and could process 8 data elements in parallel. This resulted in on the order of N*M clock cycles to calculate the histogram of N data entries with M components, where the components are different colors such as Red, Green, Blue, and Alpha (RGBA).
It is also conceivable that image and video data could be partitioned into K groups, with each group passed to a different processor, or to a different part of a VLIW processor, for concurrent calculation. For example, TI's TMS320C62X Image/Video Processing Library uses this technique on a VLIW processor that has 8 separate execution units. Its histogram code operates on four interleaved copies of the histogram bins, which are summed together at the end. The benchmark given for this library is 1158 cycles for L=512 data entries, or (9/8)*L+512 cycles in general. This result is still on the order of L operations, despite the 8-wide parallel VLIW architecture of the 'C62X.
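The interleaved-bins idea can be sketched in plain C as follows. This is a minimal illustration under stated assumptions, not TI's library code: it assumes 8-bit input data with 256 bins, and the function name `histogram_interleaved` and the `i % 4` assignment of entries to sub-histograms are hypothetical. Each entry updates one of four partial histograms, so adjacent entries never increment the same array, and the partial counts are summed in a final pass. The main loop still performs one bin update per data entry, which is consistent with the cost remaining on the order of L.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NUM_BINS 256    /* assumption: 8-bit data, one bin per value */
#define NUM_COPIES 4    /* four interleaved copies of the bins */

/* Hypothetical sketch: fills hist[] for len 8-bit samples using
 * NUM_COPIES partial histograms that are summed at the end. */
void histogram_interleaved(const uint8_t *data, size_t len,
                           uint32_t hist[NUM_BINS])
{
    uint32_t partial[NUM_COPIES][NUM_BINS];
    memset(partial, 0, sizeof partial);

    /* Entry i updates copy (i % NUM_COPIES), so back-to-back entries
     * with the same value increment different arrays and do not depend
     * on each other. This is still one scalar update per entry. */
    for (size_t i = 0; i < len; i++) {
        partial[i % NUM_COPIES][data[i]]++;
    }

    /* Final pass: sum the interleaved copies into one histogram. */
    for (size_t b = 0; b < NUM_BINS; b++) {
        uint32_t sum = 0;
        for (size_t c = 0; c < NUM_COPIES; c++) {
            sum += partial[c][b];
        }
        hist[b] = sum;
    }
}
```

The extra summation pass over all bins is a fixed cost independent of the number of data entries, which is in line with a benchmark formula that has a constant term in addition to a term proportional to L.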
Image and video analysis, histogram equalization, auto focus, and other image enhancement algorithms use histogram calculation. If an image has multiple components such as RGBA or YUV (luma and two chroma), a separate set of histogram bins is used for each component.
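As a rough illustration of the scalar, one-component-at-a-time approach, the sketch below keeps a separate 256-bin histogram for each component of interleaved 8-bit RGBA pixels. The function name `histogram_rgba_scalar`, the interleaved R, G, B, A memory layout, and the 8-bit component size are assumptions chosen for illustration. The function counts its own bin updates to show that N pixels with M = 4 components require N*M scalar updates.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NUM_COMPONENTS 4   /* assumption: interleaved R, G, B, A */
#define NUM_BINS 256       /* assumption: 8-bit components */

/* Hypothetical sketch: one histogram per component, filled one
 * component at a time. Returns the number of bin updates performed. */
size_t histogram_rgba_scalar(const uint8_t *pixels, size_t num_pixels,
                             uint32_t hist[NUM_COMPONENTS][NUM_BINS])
{
    size_t updates = 0;
    memset(hist, 0, sizeof(uint32_t) * NUM_COMPONENTS * NUM_BINS);

    for (size_t c = 0; c < NUM_COMPONENTS; c++) {   /* M passes */
        for (size_t i = 0; i < num_pixels; i++) {   /* N entries per pass */
            uint8_t value = pixels[i * NUM_COMPONENTS + c];
            hist[c][value]++;                       /* one scalar update */
            updates++;
        }
    }
    return updates;   /* equals num_pixels * NUM_COMPONENTS */
}
```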
Part of the reason histogram calculation has been limited to a scalar operation is that there has been no support for its parallel implementation in SIMD or VLIW processors.