A typical computer system 100, such as that illustrated in FIG. 1, includes a processor 102, a memory controller 104, and main memory storage 106. Main memory storage 106 includes one or more memory chips, such as Dynamic Random Access Memory (DRAM) chips.
For the processor 102 to obtain data from main memory storage 106, the processor 102 sends a data request to the memory controller 104 over a communications bus 108. Memory controller 104 processes and reformats the request, and sends one or more reformatted request messages to main memory storage 106 over a main memory storage bus 110. Main memory storage 106 then returns the requested data to memory controller 104 over the main memory storage bus 110. After receiving the requested data, memory controller 104 sends the data to processor 102 over the communications bus 108.
The information and data associated with a particular request are often referred to as a “transaction.” At times, memory controller 104 may be processing multiple transactions simultaneously. This can result in a situation where data from multiple sources (e.g., multiple DRAMs within main memory storage 106) are simultaneously available to be returned from main memory storage 106 to the memory controller 104 over the main memory storage bus 110. When this occurs, memory controller 104 performs an arbitration process to determine which source (e.g., which DRAM) will be granted access to the main memory storage bus 110.
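The arbitration step described above can be illustrated with a minimal sketch. The text does not state which arbitration policy memory controller 104 uses, so the round-robin policy below is an assumption chosen only for illustration, and the names `RoundRobinArbiter` and `grant` are hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Illustrative arbiter; round-robin is an ASSUMED policy, not one from the text."""

    def __init__(self, num_sources: int) -> None:
        if num_sources <= 0:
            raise ValueError("num_sources must be positive")
        self.num_sources = num_sources
        # Start so that source 0 has the highest priority on the first grant.
        self._last_granted = num_sources - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the source granted the bus, or None if none has data."""
        if len(requests) != self.num_sources:
            raise ValueError("one request flag is required per source")
        # Search sources in circular order, starting just after the last winner.
        for offset in range(1, self.num_sources + 1):
            candidate = (self._last_granted + offset) % self.num_sources
            if requests[candidate]:
                self._last_granted = candidate
                return candidate
        return None


# Sources 0 and 2 (e.g., two DRAMs) have data available at the same time.
arbiter = RoundRobinArbiter(num_sources=3)
first = arbiter.grant([True, False, True])   # source 0 is granted
second = arbiter.grant([True, False, True])  # source 2 is granted next
```

In this sketch each source with pending data is granted the bus in turn, so no single source can hold the bus indefinitely while others wait. Other policies, such as fixed priority, would fit the description equally well.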
Once access is granted, main memory storage 106 places the data associated with a transaction on the main memory storage bus 110 for one or more bus clock cycles, depending on the size of the transaction and the width of the main memory storage bus 110 (e.g., the number of parallel bits). For example, if a transaction includes 52 data bits, and the bus width is 32 bits, two clock cycles would be necessary to transfer the data on the bus 110. Assuming, for simplicity, that no header information is included, the first 32 bits could be transferred during a first clock cycle, and the last 20 bits could be transferred during a second clock cycle.
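The arithmetic in the above example can be reproduced with a short sketch. It assumes, as the example does, that no header information is included; the function names `cycles_needed` and `bits_per_cycle` are hypothetical and introduced only for illustration.

```python
from typing import List


def cycles_needed(transaction_bits: int, bus_width_bits: int) -> int:
    """Bus clock cycles needed to transfer a transaction with no header."""
    if transaction_bits <= 0 or bus_width_bits <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partially filled final cycle still occupies a cycle.
    return -(-transaction_bits // bus_width_bits)


def bits_per_cycle(transaction_bits: int, bus_width_bits: int) -> List[int]:
    """Number of data bits carried on the bus during each cycle."""
    if transaction_bits <= 0 or bus_width_bits <= 0:
        raise ValueError("sizes must be positive")
    full_cycles, remainder = divmod(transaction_bits, bus_width_bits)
    return [bus_width_bits] * full_cycles + ([remainder] if remainder else [])


print(cycles_needed(52, 32))   # prints 2
print(bits_per_cycle(52, 32))  # prints [32, 20]
```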
The above example illustrates that, during the last clock cycle in which a transaction's data is being transferred on the main memory storage bus 110, the bus 110 often is not completely filled. In the present example, only 20 of the 32 available bits are filled during the second clock cycle, leaving 12 bits empty. In current systems, if the main memory storage bus 110 is to be granted to another source (e.g., another DRAM) upon completion of the transaction, these 12 bits are left empty, and the data for the next transaction starts on the next clock cycle.
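The unused bits described above can be tallied with a small sketch. It models only the behavior stated in the text, in which each transaction starts on a fresh clock cycle and no header is included; the function names and the choice of three back-to-back 52-bit transactions are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def unused_bits_in_last_cycle(transaction_bits: int, bus_width_bits: int) -> int:
    """Bits left empty in the final cycle of a single transaction."""
    if transaction_bits <= 0 or bus_width_bits <= 0:
        raise ValueError("sizes must be positive")
    remainder = transaction_bits % bus_width_bits
    return 0 if remainder == 0 else bus_width_bits - remainder


def tally_gaps(transaction_sizes: Sequence[int], bus_width_bits: int) -> Tuple[int, int]:
    """Return (total_cycles, total_unused_bits) when every transaction
    begins on a new clock cycle, as in the systems described above."""
    total_cycles = 0
    total_unused = 0
    for bits in transaction_sizes:
        cycles = -(-bits // bus_width_bits)  # ceiling division
        total_cycles += cycles
        total_unused += unused_bits_in_last_cycle(bits, bus_width_bits)
    return total_cycles, total_unused


print(unused_bits_in_last_cycle(52, 32))  # prints 12
print(tally_gaps([52, 52, 52], 32))       # prints (6, 36)
```

In this hypothetical case, three 52-bit transactions occupy six cycles, and 36 of the 192 bit positions available during those cycles carry no data.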
The example illustrates that, using prior art techniques, gaps inherently exist on the main memory storage bus 110. These gaps increase system latency and decrease effective bandwidth. Accordingly, what are needed are methods and apparatuses that more efficiently assemble data from multiple sources for transmission on a bus.