This disclosure generally relates to processor operation handling, and more specifically to a system and method for reducing the latency of delivering data from a processor to devices or agents in the system.
In a typical processor core, the largest store available is a sixteen-byte store. It therefore takes eight stores to push a 128-byte line from the processor to the memory-mapped input/output (“MMIO”) space of an input/output (“IO”) card. A noncacheable unit (“NCU”) gathers these stores so that a full line is output on the main bus and delivered to the IO card through the processor host bridge. This gathering adds to the latency of noncacheable operations because the NCU must wait for all eight stores to complete before pushing the line to the MMIO space of the IO card. In addition, the data must be transferred within the processor through caches and processor registers before being sent to the NCU for execution and output to the IO card.
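The store gathering described above can be illustrated with a minimal software sketch. It is a simplified model only, not the hardware design of the disclosure: the names `GatherBuffer` and `push_line` are hypothetical, and the model assumes each store is aligned to a sixteen-byte boundary within a single line. It shows why the line cannot leave the buffer until the eighth store has arrived.

```python
STORE_BYTES = 16    # largest single store, per the text
LINE_BYTES = 128    # line size, per the text
STORES_PER_LINE = LINE_BYTES // STORE_BYTES  # 128 / 16 = 8


class GatherBuffer:
    """Hypothetical model of a buffer that gathers stores for one line."""

    def __init__(self):
        self.line = bytearray(LINE_BYTES)
        self.filled = [False] * STORES_PER_LINE

    def store(self, offset, data):
        """Accept one 16-byte store; return the full line only when complete."""
        if (len(data) != STORE_BYTES or offset % STORE_BYTES
                or not 0 <= offset < LINE_BYTES):
            raise ValueError("expected an aligned 16-byte store within the line")
        self.line[offset:offset + STORE_BYTES] = data
        self.filled[offset // STORE_BYTES] = True
        if all(self.filled):
            out = bytes(self.line)
            self.filled = [False] * STORES_PER_LINE  # ready for the next line
            return out
        return None  # still waiting on the remaining stores


def push_line(payload):
    """Split a 128-byte payload into stores; return (line, stores issued)."""
    buf = GatherBuffer()
    line = None
    issued = 0
    for offset in range(0, LINE_BYTES, STORE_BYTES):
        line = buf.store(offset, payload[offset:offset + STORE_BYTES])
        issued += 1
    return line, issued
```

In this model, every call to `store` before the last one returns `None`, which mirrors the wait that the text identifies as a source of latency for noncacheable operations.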