The invention relates to a device to spatially and temporally reorder data transmitted between a processor, memory and peripheral devices. In particular, a simple, efficient and fast device is provided to order data for processor manipulation in linear and critical chunk format, taking both temporal and spatial requirements into account.
Microprocessor performance has increased dramatically over the short history of computers. This increase, seen in the growing number of processor cycles per second, has brought with it the need for a comparable increase in the speed of access to data and instructions. A very fast processor provides little benefit if it spends most of its time waiting for data and instructions to be retrieved from memory. One method used to improve access speed to data and instructions is cache memory, which cycles at the same speed as the processor. However, cache memory is expensive, and the amount available to a processor is thus limited. Therefore, a need exists to facilitate memory access to data and instructions.
In order to overcome this problem, computer manufacturers have employed separate devices or chips to handle memory addressing, access, transfer, and retrieval when requested by a processor or other device. The use of these devices has improved performance, since they are specifically designed to handle only memory access, but all too often they have proven to be complex, difficult to implement and still relatively slow. Therefore, in some cases these devices actually form a bottleneck to maximum processor utilization. For example, when a read operation is executed, the processor may specify a specific order in which data is to be presented to it. Prior methods and devices have often not attempted any reordering of data until all data in a cache line has been received. Therefore, a processor will frequently have to wait for the arrival of an entire cache line of data before any attempt is made to process the data. This slows processing for an otherwise very fast processor. In addition, in prior approaches, when reordering was done, it was done on a single cache line basis of 64 bits or less.
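By way of illustration only, the difference between linear ordering and critical chunk ordering of a cache line may be sketched as follows. The sketch assumes a simple sequential wraparound rule, in which the requested (critical) chunk is delivered first and the remaining chunks follow in ascending order, wrapping to the start of the line. This rule, the chunk count, and the function names `linear_order`, `critical_first_order` and `to_linear` are assumptions made for the example and are not taken from the description above; actual processors may specify a different burst order.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def linear_order(num_chunks: int) -> List[int]:
    """Chunk indices in linear order: 0, 1, 2, ..."""
    return list(range(num_chunks))


def critical_first_order(num_chunks: int, critical: int) -> List[int]:
    """Chunk indices with the critical chunk first.

    Assumed rule (hypothetical): sequential wraparound starting at
    the critical chunk.
    """
    if not 0 <= critical < num_chunks:
        raise ValueError("critical chunk index out of range")
    return [(critical + i) % num_chunks for i in range(num_chunks)]


def to_linear(arrived: Sequence[T], critical: int) -> List[T]:
    """Place chunks that arrived critical-first back into linear order."""
    n = len(arrived)
    line: List[T] = [arrived[0]] * n  # placeholder values, overwritten below
    for position, index in enumerate(critical_first_order(n, critical)):
        line[index] = arrived[position]
    return line


if __name__ == "__main__":
    print(linear_order(4))
    print(critical_first_order(4, critical=2))
    print(to_linear(["c", "d", "a", "b"], critical=2))
```

In this sketch each chunk can be forwarded as soon as it arrives, because its position is known from the ordering rule alone; waiting for the entire cache line, as in the prior approaches described above, is not required to determine where a chunk belongs.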
Further, processors and other input/output (I/O) devices may have specific requirements as to how data is to be ordered for presentation. Any device that accesses memory at the request of a processor or other I/O device must be able to translate from one form of desired presentation to another. It must do so while keeping latency and the space used on the chip to a minimum and throughput to a maximum, without unduly increasing the complexity of the logic required.
Therefore, what is needed is a device to transmit and receive data between a processor, memory and peripherals that can handle temporal and spatial reordering of data in a quick, efficient and simple manner that minimizes the logic and space required on a chipset. This device should be able to simultaneously manipulate as much data as possible to maximize processor efficiency, and do so for both read and write operations. Using this method and device, the space and power used in the chipset, as well as the heat generated, are reduced while processor performance and throughput are improved.