1. Field of the Invention
The invention relates to a stream processing system, more particularly to a stream processing system having a reconfigurable memory module.
2. Description of the Related Art
FIG. 1 illustrates a conventional pipelined stream processing system 10 that includes a number (N) of stream processing units 11, and a number (N+1) of first-in first-out (FIFO) stream fetching units 13. However, the conventional pipelined stream processing system 10 requires a large external bandwidth and may encounter pipeline imbalance, thereby adversely affecting its performance and hardware utilization.
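The arrangement above, N processing stages chained through N+1 FIFO buffers, can be sketched as follows. This is a minimal illustrative model only; the FIFO depth, the per-stage transform, and all structure names are assumptions for the sketch and are not taken from the reference design.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_DEPTH 8   /* illustrative depth, not from the reference design */

/* A simple ring-buffer FIFO modeling one stream fetching unit 13. */
typedef struct {
    int data[FIFO_DEPTH];
    int head, tail, count;
} Fifo;

static bool fifo_push(Fifo *f, int v) {
    if (f->count == FIFO_DEPTH) return false;   /* full: upstream stage stalls */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

static bool fifo_pop(Fifo *f, int *v) {
    if (f->count == 0) return false;            /* empty: downstream stage idles */
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* N processing stages chained through N+1 FIFOs: FIFO[i] feeds stage i,
 * which writes its result into FIFO[i+1]. A stage that finds its input
 * FIFO empty simply idles for that cycle, which is the pipeline-imbalance
 * problem described above. */
#define N_STAGES 3

static void run_one_cycle(Fifo fifos[N_STAGES + 1]) {
    for (int i = N_STAGES - 1; i >= 0; i--) {   /* drain from the back first */
        int v;
        if (fifo_pop(&fifos[i], &v))
            fifo_push(&fifos[i + 1], v + 1);    /* placeholder per-stage work */
    }
}
```

Note how a slow producer starves every downstream stage: once fifos[0] runs empty, each stage in turn finds nothing to pop and the whole chain idles, exactly as described for the system 10.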
FIG. 2 illustrates a graphics application embodied in the conventional pipelined stream processing system 10, wherein N=3 and the stream processing units 11 are used to perform geometry stream processing, geometry-to-pixel processing, and pixel stream processing, respectively. It is noted that, when the input speed of vertex data into an input vertex buffer 20 is slower than the processing speed of the stream processing unit 11 for geometry stream processing, the whole system 10 idles while waiting for stream data. In addition, two stream processing units 11 are required for processing from the geometry stage to the pixel stage. Therefore, it is difficult to find an optimized stream fetching scheme between the geometry stage and the pixel stage.
A conventional vertex cache has been proposed to reduce the memory bandwidth of a 3D graphics processor, wherein a pre-TnL cache can prevent the transfer of vertex data that has already been stored therein, and processed vertex results can be reused from a post-TnL cache. The pre-TnL cache needs to prefetch a number of consecutive vertex data using a burst mode. Conventionally, the pre-TnL cache organizes 32 entries into 8 slots that are replaced with new data in a FIFO manner upon a cache miss. The post-TnL cache holds 16 entries, which are divided into 4 slots replaced in the same FIFO manner. Both the pre-TnL and post-TnL caches use a 16-bit index to identify whether corresponding data has been fetched or processed. The size of the input/output vertex data can be changed as required. However, each buffer in the conventional vertex cache is designed to be dedicated, and its memory size is determined based on the worst case. As a result, the conventional vertex cache may waste a substantial amount of memory space when used in other applications.
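The slot-based FIFO replacement described above can be sketched as follows. The sketch models only the post-TnL case (16 entries divided into 4 slots) at slot granularity; the payload layout, the grouping of entries within a slot, and all names are simplified assumptions for illustration, not details of the conventional design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS        4   /* post-TnL: 16 entries divided into 4 slots */
#define ENTRIES_PER_SLOT 4

/* One slot holds several consecutive processed-vertex entries and is
 * tagged by the 16-bit index of its first vertex. */
typedef struct {
    uint16_t base_index;                 /* 16-bit vertex index tag */
    float    entries[ENTRIES_PER_SLOT];  /* simplified payload */
    bool     valid;
} Slot;

typedef struct {
    Slot slots[NUM_SLOTS];
    int  next_victim;   /* slots are replaced in FIFO (round-robin) order */
} VertexCache;

static void cache_init(VertexCache *c) {
    memset(c, 0, sizeof *c);
}

/* Returns true on a hit. On a miss, the oldest slot is overwritten with
 * the requested index (FIFO replacement), mirroring the behavior
 * described above for both the pre-TnL and post-TnL caches. */
static bool cache_lookup(VertexCache *c, uint16_t index) {
    for (int i = 0; i < NUM_SLOTS; i++)
        if (c->slots[i].valid && c->slots[i].base_index == index)
            return true;                        /* hit: reuse stored result */

    Slot *victim = &c->slots[c->next_victim];   /* miss: evict in FIFO order */
    victim->base_index = index;
    victim->valid = true;
    c->next_victim = (c->next_victim + 1) % NUM_SLOTS;
    return false;
}
```

The fixed NUM_SLOTS and ENTRIES_PER_SLOT constants make the dedicated, worst-case sizing concrete: the arrays are allocated for the largest expected workload and cannot be repartitioned for a different application, which is the waste the reconfigurable memory module of the present invention is meant to avoid.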