Currently, a processing unit within a chip may be a processor core or a hardware accelerator (accelerator for short). Using accelerators to improve the processing capability of processor cores is one of the trends in manufacturing high-performance CPUs. An accelerator can assist a processor core with specialized tasks such as encryption and compression. Offloading such tasks to an accelerator relieves the burden on the processor core, which can then mainly perform general-purpose tasks that have little structural regularity. A chip with an accelerator generally has enhanced computing capability, because it combines the flexibility of a general-purpose processor with the computing advantage of special-purpose hardware.
Referring to FIG. 1A or FIG. 1B, a processing unit usually broadcasts a data request on a bus, and the requested data may be stored in a cache coupled to a processor core or only in memory. Normally, the requested data is first searched for in the cache, and the corresponding data is read from the relatively low-speed memory only if the desired data is not in the cache. Data supplied from the memory may also be loaded into the cache at the same time, so that subsequent reads of the same data are served from the cache without accessing the memory again.
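The cache-first lookup with a memory fallback described above can be sketched as a small read-through model. This is only an illustrative sketch: the class name `MemorySystem`, its fields, and the `read` method are hypothetical, and the sketch assumes that every block read from memory is also loaded into the cache (the text above says this only "may" happen).

```python
class MemorySystem:
    """Hypothetical model of a cache backed by slower memory (read-through with fill)."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> data block (slow backing store)
        self.cache = {}             # address -> data block (fast local copy)
        self.memory_reads = 0       # number of times the slow memory was accessed

    def read(self, address):
        # Search the cache first.
        if address in self.cache:
            return self.cache[address]
        # Cache miss: read the block from memory and, by assumption,
        # load it into the cache so later reads of this address hit there.
        self.memory_reads += 1
        block = self.memory[address]
        self.cache[address] = block
        return block
```

For example, reading the same address twice from `MemorySystem({0x40: b"A"})` touches memory only once; the second read is served from the cache.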
Referring to FIG. 2A, after the processing unit broadcasts a data request on the bus, both the cache and the memory query their local storage and send a reply signal to the processing unit indicating whether they hold any hit data block(s), and the processing unit prepares accordingly to receive the hit data block(s). The data request and the reply to it constitute a handshake. If there is a hit data block, one data block is transmitted from the cache or memory to the processing unit after each handshake. After obtaining that data block, the processing unit initiates the next handshake to request the next data block. In other words, the processing unit must initiate one handshake for every data block it requests, so sixteen handshakes are required to request sixteen data blocks.
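The one-handshake-per-block behavior can be illustrated with a simple counting model. This is a hypothetical sketch, not the bus protocol itself: `Responder`, `Reply`, and `fetch_blocks` are illustrative names, and for brevity a single responder stands in for the cache and memory that both reply on the bus.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    hit: bool  # whether the responder holds the requested block


class Responder:
    """Hypothetical stand-in for the cache/memory answering requests on the bus."""

    def __init__(self, blocks):
        self.blocks = blocks  # address -> data block

    def query(self, address):
        # Answer a broadcast request: report whether the block is a hit.
        return Reply(hit=address in self.blocks)

    def transfer(self, address):
        # Send one data block to the requesting processing unit.
        return self.blocks[address]


def fetch_blocks(responder, addresses):
    """Request blocks one at a time; return (received_blocks, handshake_count)."""
    received = []
    handshakes = 0
    for address in addresses:
        # Each block needs its own request/reply handshake.
        reply = responder.query(address)
        handshakes += 1
        if reply.hit:
            # At most one data block is transferred per handshake.
            received.append(responder.transfer(address))
    return received, handshakes
```

Requesting sixteen 64-byte-aligned addresses, for instance, yields sixteen handshakes, matching the count given above.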