Traditionally, integrated circuit processors are designed either as general purpose microprocessors or as application specific integrated circuits (ASICs). Such processors transfer data from memory through a tightly coupled memory interface. A general purpose microprocessor transfers data by following arbitrary sequences of microprocessor instructions defined by a user-written program. This provides flexibility but decreases performance because the circuitry is not optimized for any specific application. An ASIC is designed by describing its structure in terms of circuit primitives such as Boolean gates and registers. The circuit designer arranges the circuit primitives so as to optimize performance for a specific application (such as video compression or audio decoding). While an ASIC provides high performance, its fixed architecture cannot be changed after fabrication to adapt to new algorithms or changing standards. Additionally, the high development costs and lengthy design cycle of an ASIC are poorly suited to rapidly developing markets.
The memory interface for a processing architecture can be advantageously designed for certain applications in which a large amount of ordered data is processed. Architectures designed in this way are known as streaming architectures. Typically, the ordered data is stored in a regular memory pattern (such as a vector, a two-dimensional shape, or a linked list) or transferred in real time from a peripheral. Processing such ordered data streams is common in media applications, such as digital audio and video, and in data communication applications (such as data compression or decompression). In many applications, relatively little processing of each data item is required, but high computation rates are required because of the large amount of data. Processors and their associated memory interfaces are conventionally designed with complex circuits that attempt to dynamically predict the data access patterns and pre-fetch required data. This approach is typically limited in performance because data access patterns are difficult to predict correctly in many cases. In addition, the associated circuits consume power and chip area that could otherwise be allocated to actual data processing.
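As an illustration only, the following Python sketch shows why ordered data stored in a regular memory pattern does not need to be predicted: every address follows from a few parameters known before the transfer begins. The function names, parameter names, and the row-major layout are assumptions introduced for this example and are not taken from the architecture discussed here.

```python
from typing import Iterator


def vector_addresses(base: int, count: int, elem_size: int) -> Iterator[int]:
    """Yield the byte address of each element of a contiguous vector.

    All names here are hypothetical; `base` is the address of element 0.
    """
    for i in range(count):
        yield base + i * elem_size


def block_addresses(base: int, row_pitch: int, rows: int, cols: int,
                    elem_size: int) -> Iterator[int]:
    """Yield byte addresses of a rows x cols region of a two-dimensional array.

    Assumes a row-major layout in which consecutive rows start
    `row_pitch` bytes apart (an assumption made for this sketch).
    """
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_pitch + c * elem_size
```

For example, `list(vector_addresses(0x1000, 4, 2))` gives `[4096, 4098, 4100, 4102]`, and `list(block_addresses(0, 16, 2, 3, 1))` gives `[0, 1, 2, 16, 17, 18]`. In both cases the full address sequence is determined by the parameters alone, whereas a conventional pre-fetch circuit must infer it from the accesses it observes.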
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve understanding of embodiments of the present invention.