In classical bus-based architectures, communication between on-chip cores uses a blocking protocol. Specifically, while a transfer is underway between an initiator and a target, the bus resources are unavailable to any other transfer.
Additionally, most conventional bus architectures mainly support burst transactions with incrementing address sequences. Many applications naturally require bursts with non-incrementing address sequences for improved efficiency and reduced latency. A burst request may contain a series of one or more read requests.
For example, cache replacement procedures in various computational devices, including CPUs, Digital Signal Processors, and Media Processors, are often optimized to fetch the critical word first (CWF) from the memory sub-system. WRAP and XOR bursts are common algorithms for implementing CWF and are supported by many CPUs and DRAM devices.
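The two CWF address orderings can be illustrated with a minimal sketch. It assumes a power-of-two burst length and word-granular addresses; the function names wrap_burst and xor_burst are hypothetical and chosen only for illustration. Both orderings begin at the critical word and stay within the burst-aligned block that contains it. WRAP increments and wraps around at the block boundary, while XOR combines the starting offset with the beat count using exclusive-or.

```python
def _split(start_word, burst_len):
    """Split a word address into (aligned block base, offset within block)."""
    if burst_len <= 0 or burst_len & (burst_len - 1):
        raise ValueError("burst_len is assumed to be a power of two")
    mask = burst_len - 1
    return start_word & ~mask, start_word & mask


def wrap_burst(start_word, burst_len):
    """WRAP order: increment from the critical word, wrapping at the block end."""
    base, offset = _split(start_word, burst_len)
    return [base + ((offset + beat) % burst_len) for beat in range(burst_len)]


def xor_burst(start_word, burst_len):
    """XOR order: offset of each beat is (starting offset XOR beat number)."""
    base, offset = _split(start_word, burst_len)
    return [base + (offset ^ beat) for beat in range(burst_len)]


if __name__ == "__main__":
    # Critical word at word address 5, four-word burst (block covers words 4..7).
    print(wrap_burst(5, 4))  # [5, 6, 7, 4]
    print(xor_burst(5, 4))   # [5, 4, 7, 6]
```

In both cases the requested word is returned first and every word of the aligned block is visited exactly once; only the order of the remaining words differs.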
Digital Multimedia applications, which include video compression and decompression, graphics rendering, image manipulation and rasterization, are characterized by extensive access to shared memory devices, requiring high bandwidth and efficiency in the on-chip communication fabric connecting the application IP cores to the memory system. They are also often characterized by organizing their data in memory as rectangular blocks that are processed by an application as a single unit.
For example, MPEG-2 compression and decompression algorithms operate on rectangular units called macro-blocks. Modern graphics devices displaying text manipulate rectangular units called fonts. Communicating these blocks across traditional bus structures as a series of multiple independent short incrementing bursts can degrade system performance for two reasons: the request bandwidth overhead is high, and the accesses to a data block, which may be stored in spatial proximity within the same page of the memory device, arrive with undetermined temporal proximity.
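The request overhead described above can be made concrete with a minimal sketch. It assumes a row-major frame buffer in which consecutive rows of a block are separated by a fixed stride; the function name block_as_row_bursts and its parameters are hypothetical, and the 720-byte stride in the usage example is an illustrative assumption rather than a value taken from any particular system.

```python
def block_as_row_bursts(base_addr, width_bytes, height_rows, stride_bytes):
    """Decompose a rectangular block into one incrementing burst per row.

    Returns a list of (start_address, length_in_bytes) tuples, one for each
    independent request that a traditional bus would have to carry.
    """
    if width_bytes > stride_bytes:
        raise ValueError("a block row cannot be wider than the row stride")
    return [
        (base_addr + row * stride_bytes, width_bytes)
        for row in range(height_rows)
    ]


if __name__ == "__main__":
    # A 16x16 block at one byte per sample, in a buffer assumed to be
    # 720 bytes wide: the block becomes 16 separate short bursts.
    bursts = block_as_row_bursts(0, 16, 16, 720)
    print(len(bursts))   # 16
    print(bursts[:3])    # [(0, 16), (720, 16), (1440, 16)]
```

Each row becomes its own request, so a single 16-row block costs sixteen request phases on the bus, and nothing in the individual requests tells the memory system that they belong together or will arrive close together in time.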