1. Field of the Invention
The present invention relates to interconnects between a processor and memory in a computer system, and more particularly, to a system and method providing reduced energy consumption by interconnects between a processor and memory in a computer system.
2. Brief Description of the Related Art
Modern embedded networking, video, and image processing systems are typically implemented as systems-on-a-chip (SoC) in order to reduce manufacturing costs and overall power and energy consumption. By integrating all of the peripheral functionality directly onto the same chip with the core microprocessor, both chip manufacturing and system integration costs can be lowered dramatically. In addition to cost, managing power and energy is a first order constraint that drives the design of embedded systems based on SoCs. However, most modern SoC-based embedded systems require more memory capacity than can reasonably be embedded into a single core. In such systems, the interconnects between the processor and external memory can consume as much or more power than the core itself. Even though the external memory and its associated interconnect are major contributors to the overall power dissipation in SoC-based embedded systems, such systems will continue to require the memory capacity afforded by external memory into the foreseeable future. Therefore, it is essential to develop advanced memory controller architectures that reduce the power dissipation of external memories and the interconnects between the embedded SoC core and the external memories.
Encoding data that is stored in memory can minimize the power consumed by the processor-memory interconnect. Dynamic power is consumed by the interconnect drivers when there are bit transitions. To minimize this power consumption, double-ended, context-dependent codes such as the bus-invert code have previously been proposed. Double-ended codes encode data at the transmitter and decode it at the receiver. For a processor-memory interconnect, this implies that the SDRAM also needs to participate in such codes. Context-dependent codes use the value last transmitted on the interconnect as well as the current data in order to encode the data to minimize transitions. For example, bus-invert coding transmits either the data value unchanged or its inverse, depending on which value minimizes transitions on the interconnect. If the SDRAM were modified to support such coding, bus-invert coding could reduce transitions on the interconnect by 22% on average.
The architecture of a modern SoC-based embedded system is presented in FIG. 1. The SoC core 110 has one or more simple processors 112 designed to provide enough computational capability for the application and integrated with some embedded memory, a variety of on-chip peripherals 114 for data acquisition and connectivity, a scratch pad 116 and memory controller 120. These systems also integrate SDRAM, since they frequently require more memory capacity—to buffer large data streams before either processing or forwarding them—than can reasonably be embedded into the SoC core.
Managing power dissipation and providing sufficient on-chip memory capacity are two major challenges in the design of such SoC-based embedded systems. The International Technology Roadmap for Semiconductors (ITRS) predicts that without significant architectural and design technology improvements, the power consumption of both high performance and low power SoC-based embedded systems will grow exponentially, easily exceeding power budgets (Edenfeld, D., Kahng, A. B., Rodgers, M., Zorian, Y., “2003 technology roadmap for semiconductors,” IEEE Computer (2004); International technology roadmap for semiconductors (2003)). Tethered embedded systems frequently have limited power budgets because of constraints on power delivery and cooling area available on peripheral buses. Mobile systems, in addition to requiring low power dissipation, are also constrained by battery life, making energy consumption an important factor.
Currently, the power dissipation of representative low power and high performance embedded systems is divided roughly equally among the SoC 110, the memory interconnect 130, and the external memory 140. Furthermore, while high performance embedded systems can dissipate an order of magnitude more power than low power systems, the relative power dissipation of the SoC core 110, the interconnect 130, and the memory 140 remains similar. It is clear that in such systems, the external memory 140 and interconnect 130 can dissipate as much or more power than the SoC core 110. Thus, the memory system and the interconnect are candidates for techniques to reduce and manage power and energy.
Dynamic power is dissipated on a signal line of a bus whenever there is a transition on that line. A signal transition causes the drivers to actively change the value of the bus, which acts as a large capacitance. It is also possible for the drivers to dissipate static power when they hold the bus at either logical 0 or logical 1, depending on the design of the drivers. It is possible to limit this leakage power for the low frequencies of operation commonly found in embedded systems, so static power dissipation is typically dwarfed by the dynamic power dissipation of the bus drivers. However, there are still situations in which static power dissipation cannot be ignored, including higher frequencies of operation and when there are voltage mismatches between the core and the memory.
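As a minimal sketch of the transition-based view of dynamic power described above, the following Python fragment counts how many bus lines toggle between successive words. It assumes, as a simplification, that dynamic energy is proportional to the number of toggling lines; it ignores static power and coupling effects. The function names and the idle bus value of 0 are illustrative assumptions and do not come from any cited work.

```python
def transitions(prev: int, curr: int) -> int:
    """Number of bus lines that toggle when the bus goes from prev to curr."""
    # A line toggles exactly where the two words differ, so count the 1 bits
    # of the XOR (the Hamming distance between the two words).
    return bin(prev ^ curr).count("1")


def total_transitions(words, initial: int = 0) -> int:
    """Total line toggles when a sequence of words is driven onto the bus.

    `initial` is the assumed value held on the bus before the first word.
    """
    total = 0
    prev = initial
    for word in words:
        total += transitions(prev, word)
        prev = word
    return total
```

Under this simplified model, any encoding that lowers the total returned by `total_transitions` for a given data stream lowers the dynamic energy spent by the bus drivers.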
The techniques used to reduce power dissipation in external memory systems fall roughly into three categories: low-power memory modes, external memory access reduction, and double-ended techniques. Most modern commodity memories have one or more low power modes of operation. It may be expensive to enter and exit these modes, but frequently the memory dissipates an order of magnitude less power when it is in these modes. Several techniques, such as those proposed in Delaluz, V., Kandemir, M., Vijaykrishnan, N., Sivasubramaniam, A., Irwin, M., “DRAM energy management using software and hardware directed power mode control,” Proceedings of the International Symposium on High-Performance Computer Architecture (2001) and Fan, X., Ellis, C., Lebeck, A., “Memory controller policies for dram power management,” Proceedings of the International Symposium on Low Power Electronics and Design (2001), can be used to determine when external memory should be powered down to minimize power dissipation without disrupting performance.
Another way to reduce the power dissipation of external memories is to access them less frequently. These techniques use some combination of on-chip memory, caching, and code reorganization to allow the processing core to reduce the number of external memory accesses. See Catthoor, F., Wuytack, S., DeGreef, E., Balasa, F., Nachtergaele, L., Vandecappelle, A., “Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design,” Kluwer Academic Publishers (1998); Kulkarni, C., Catthoor, F., DeMan, H., “Code transformations for low power caching in embedded multimedia processors,” Proceedings of the International Parallel Processing Symposium (1998); Kulkarni, C., Miranda, M., Ghez, C., Catthoor, F., Man, H. D., “Cache conscious data layout organization for embedded multimedia applications,” Proceedings of the Design and Test in Europe Conference (2001); and Panda, P. R., Dutt, N. D., Nicolau, A., “On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems,” ACM Transactions on Design Automation of Electronic Systems 5 (2000) 682-704. In turn, this reduces the power demands of the external memory when it is active and can also allow it to be put to sleep more frequently.
The final set of techniques for reducing the power dissipation of external memories requires cooperation between the memory controller 120 and the memory 140. These techniques either encode data to minimize power dissipation across the interconnect or transmit additional information to the memory to enable it to access the memory array more efficiently. See Benini, L., Macii, A., Poncino, M., Scarsi, R., “Architectures and synthesis algorithms for power-efficient bus interfaces,” IEEE Trans. Computer-aided Design, vol. 19, No. 9, pp. 969-980 (2000); Ramprasad, S., Shanbhag, N. R., Hajj, I. N., “A coding framework for low power address and data buses,” IEEE Transactions on VLSI Systems vol. 7 pp. 212-221 (June 1999); Sotiriadis, P., Chandrakasan, A., “A bus energy model for deep sub-micron technology,” IEEE Transactions on VLSI Systems vol. 10 pp. 341-350 (2002); Sotiriadis, P., Tarokh, V., Chandrakasan, A. P., “Energy reduction in VLSI computation modules: An information-theoretic approach,” IEEE Transactions on Information Theory, vol. 49, pp. 790-808; and Stan, M. R. and Burleson, W. P., “Low-power encodings for global communication in CMOS VLSI,” IEEE Transactions on VLSI Systems, vol. 5, no. 4 (1997).
The majority of data encoding schemes proposed in the literature are not applicable to the off-chip interconnect 130 between an SoC 110 and external memory 140 because they are double-ended, context-dependent codes. Double-ended codes require collaboration between the transmitter and receiver to transfer encoded data. In such state-of-the-art codes, the transmitter (i.e., the memory controller on the SoC) uses a potentially complex handshaking protocol to communicate with the receiver (i.e., a decoder in the memory), which has the ability to interpret these handshakes to decode the transmitted data. The roles of the coder and the decoder are reversed when communicating in the opposite direction (i.e., a memory read). Thus, as shown in FIG. 2, a potentially complex codec (coder-decoder) has to be present on both ends to successfully use these schemes. However, commodity SDRAMs do not have built-in codecs that are capable of communicating with the SoC core in this fashion.
Context-dependent coding schemes rely on inter-symbol correlation on successive data transfers to reduce power consumption. However, such schemes are not effective with commodity memory, as the memory cannot participate in the scheme. Therefore, any coding scheme using commodity memory must be able to unambiguously decode data read from the memory that was encoded when it was written to the memory. If inter-symbol correlation information is used when writing the data, then that information is not available upon reading the data, since there is no guarantee that data will be read in exactly the same order it is written. Some context-dependent coding schemes, such as those that use an XOR decorrelator, do not include enough information in the codeword to unambiguously recover the original data without the context information. However, other context-dependent schemes, such as bus-invert coding, produce code-words that can be decoded without context information. Even then, such schemes will only minimize power when writing to the memory, as the data will be read in a different context than it was written. Therefore, context-dependent codes are almost exclusively used in situations where both the transmitter and receiver can participate in the code. That way, the context information can be used to decode the transferred data before it is stored. If the data is retrieved later, it is re-encoded and transferred based on the context at that time.
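The loss of context described above can be illustrated with a small Python sketch of an XOR decorrelator used with a memory that stores codewords verbatim. The sketch assumes that each codeword is the data XORed with the previous codeword on the bus and that the bus idles at 0 before each burst; both are illustrative assumptions, and the function names are hypothetical.

```python
def write_stream(values, initial_bus: int = 0):
    """Encode values with an XOR decorrelator; return the codewords that a
    non-participating (commodity) memory would store as-is."""
    stored = []
    prev = initial_bus
    for data in values:
        code = data ^ prev      # context = previous codeword on the bus
        stored.append(code)
        prev = code
    return stored


def read_back(stored, order, initial_bus: int = 0):
    """Read stored codewords in the given order and decode each one using the
    context available at read time (the previously read codeword)."""
    decoded = []
    prev = initial_bus
    for index in order:
        code = stored[index]
        decoded.append(code ^ prev)
        prev = code
    return decoded


data = [0b0011, 0b0101, 0b1111]
stored = write_stream(data)
in_order = read_back(stored, [0, 1, 2])   # same order as written
reversed_order = read_back(stored, [2, 1, 0])
```

In this sketch, `in_order` equals the original data because the read context happens to match the write context, whereas `reversed_order` does not match the data read in reverse, since the write-time context is no longer available.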
The most popular and easy-to-implement double-ended, context-dependent code reported in the literature is the bus-invert code. See Stan, M. R. and Burleson, W. P., “Bus invert coding for low power I/O,” IEEE Transactions on VLSI Systems vol. 3, no. 1 pp. 49-58 (1995). The bus-invert code is a context-dependent, double-ended code since it computes the Hamming distance between the currently encoded data value on the bus and the next data value. If the Hamming distance exceeds $\lceil n/2 \rceil$, then the transmitter inverts the next value transmitted on the bus. An additional line on the bus indicates whether the data is inverted or not, allowing the receiver to unambiguously decode the transmitted data. In this manner, an n-bit value can be transmitted over an (n+1)-bit bus with at most $\lceil (n+1)/2 \rceil$ transitions. Without such coding, an n-bit value could cause as many as n transitions over an n-bit bus. For example, if the current value on the bus is 0000, and the next value to be transferred is 0001, then the Hamming distance between the values is 1. Therefore, 0001 will be transmitted over the bus with the invert bit set to 0, indicating the data is not inverted. However, if the current value on the bus is 1111 instead, the Hamming distance between the values is 3 and hence 1110 is transmitted with the invert bit set to 1. In this manner, each information symbol in the n-bit input space maps to two codewords. The codeword that minimizes switching activity on the interconnect is chosen for transmission to reduce power consumption. Bus-invert is thus not a one-to-one mapping, i.e., it is not a context-independent code. The bus-invert codewords for all the information symbols on a 4-bit wide data bus are shown in column 2 of Table 1.
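The bus-invert rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: it assumes the Hamming distance is taken over the n data lines only (the invert line is tracked separately), and the function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two words."""
    return bin(a ^ b).count("1")


def bus_invert_encode(data: int, prev_bus: int, n: int = 4):
    """Return (invert_bit, value) to drive onto an n-bit bus.

    prev_bus is the value currently held on the n data lines.
    """
    mask = (1 << n) - 1
    threshold = -(-n // 2)                 # ceil(n / 2)
    if hamming(prev_bus & mask, data & mask) > threshold:
        return 1, ~data & mask             # send the inverse, flag it
    return 0, data & mask                  # send the data unchanged


def bus_invert_decode(invert: int, value: int, n: int = 4) -> int:
    """Recover the data from the invert bit and bus value; no context needed."""
    mask = (1 << n) - 1
    return ~value & mask if invert else value & mask
```

With the values from the text, `bus_invert_encode(0b0001, 0b0000)` yields `(0, 0b0001)` and `bus_invert_encode(0b0001, 0b1111)` yields `(1, 0b1110)`. Note that `bus_invert_decode` takes no previous bus value, which reflects the observation that bus-invert codewords can be decoded without context even though they are chosen using it.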
TABLE I
Comparison of Different Codes

Information  Bus-invert       2-LWC      Frequency  Freq.-based  Freq.-based  Freq. value
Symbols      Coding [14]      Code [13]  (%)        remapping    2-LWC        coding [17]
0000         0 0000 / 1 1111  0 0000     6.7        1001         0 0110       0 0000
0001         0 0001 / 1 1110  0 0001     5.6        0101         0 1010       0 0001
0010         0 0010 / 1 1101  0 0010     4.7        1011         1 1000       0 0010
0011         0 0011 / 1 1100  0 0011     6.9        1010         0 0011       0 0011
0100         0 0100 / 1 1011  0 0100     7.6        0100         0 0100       1 1000
0101         0 0101 / 1 1010  0 0101     7.0        1100         1 0000       0 0101
0110         0 0110 / 1 1001  0 0110     4.0        1101         1 0010       0 0110
0111         0 0111 / 1 1000  1 1000     8.1        0010         0 0010       1 0100
1000         0 1000 / 1 0111  0 1000     4.8        1110         0 1001       0 1000
1001         0 1001 / 1 0110  0 1001     8.4        0001         0 0001       1 0010
1010         0 1010 / 1 0101  0 1010     5.9        0011         0 1100       0 1010
1011         0 1011 / 1 0100  1 0100     4.0        0111         1 0100       0 1011
1100         0 1100 / 1 0011  0 1100     6.6        0110         0 0101       0 1100
1101         0 1101 / 1 0010  1 0010     8.5        0000         0 0000       1 0001
1110         0 1110 / 1 0001  1 0001     3.7        1111         1 0001       0 1110
1111         0 1111 / 1 0000  1 0000     7.5        1000         0 1000       0 1111
Many other context-dependent, double-ended codes have been proposed. One such code is based on the use of a decorrelator, which XORs the data to be transmitted with the previous value transmitted across the bus. See Benini, L., DeMicheli, G., Macii, E., Sciuto, D. and Silvano, C., “Asymptotic zero-transition activity encoding for address busses in low-power microprocessor-based systems,” Proceedings of the Great Lakes Symposium on VLSI, pp. 77-82 (1997) and Musoll, E., Lang, T. and Cortadella, J., “Exploiting locality of memory references to reduce the address bus energy,” Proceedings of the International Symposium on Low Power Electronics Design, pp. 202-207 (1997). The receiver must then recover the actual value by undoing the XOR operation. Further reductions can be achieved by exploiting information about the frequency of occurrence of particular data values on the bus. In Yang, J., Gupta, R. and Zhang, C., “Frequent value encoding for low power data buses,” ACM Trans. Des. Automation Electronic Systems, vol. 9, 354-384 (2004), a decorrelator was combined with a one-hot encoding of the 32 most frequently occurring values. Like bus-invert, such frequent value encoding is still a context-dependent, double-ended code because of the use of the decorrelator. The transmitter first decides if the data value is one of the 32 most frequently occurring values. If so, it is one-hot encoded. A one-hot code on an n-bit wide bus is a coding scheme where exactly one out of n bits is set to one. At the word level, 32 codewords are available and hence 32 frequently occurring values can be encoded, leaving the remaining values unencoded. (The reported code also ignores values 1-16 and performs equality tests before transmission, the details of which are excluded for brevity. Nevertheless, the table reflects the best scheme reported that includes some of these features.) Note that an additional bit is needed to indicate to the receiver whether or not the data is one-hot encoded.
The result of one-hot encoding is then passed through the decorrelator prior to transmission across the bus. The receiver must recover the actual value by undoing the XOR and one-hot encoding transformations. The final column of Table 1 shows the one-hot codeword assignments, based on the frequency distribution in column 4, for frequent value coding on a 4-bit wide data bus. Note that only the four most frequently occurring values (1101, 1001, 0111, 0100) are one-hot encoded. In practice, these values would also be XORed with the previous value transmitted across the bus by the decorrelator. Like bus-invert, frequent value encoding does not use a one-to-one mapping, as a particular data value can be mapped to many encoded values depending on the previous data transmitted across the bus.
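The two-stage structure of frequent value coding on a 4-bit bus can be sketched as follows, using the one-hot assignments from the final column of Table 1. The sketch assumes the decorrelator XORs the full 5-bit word (flag bit plus four data bits) with the previous 5-bit bus value; the source does not specify this detail, and the function names are hypothetical.

```python
# One-hot assignments for the four most frequent values (Table 1, last column).
ONE_HOT = {0b0100: 0b1000, 0b0111: 0b0100, 0b1001: 0b0010, 0b1101: 0b0001}
ONE_HOT_INV = {code: value for value, code in ONE_HOT.items()}


def fv_encode(data: int, prev_bus: int) -> int:
    """Return the 5-bit word driven onto the bus (flag bit is bit 4)."""
    if data in ONE_HOT:
        word = (1 << 4) | ONE_HOT[data]   # flag = 1: one-hot encoded
    else:
        word = data & 0xF                 # flag = 0: sent unencoded
    return word ^ prev_bus                # decorrelator (assumed 5 bits wide)


def fv_decode(bus: int, prev_bus: int) -> int:
    """Undo the XOR, then undo the one-hot mapping if the flag is set."""
    word = bus ^ prev_bus
    if word >> 4:
        return ONE_HOT_INV[word & 0xF]
    return word & 0xF
```

For instance, `fv_encode(0b1101, 0b00000)` gives `0b10001`, while `fv_encode(0b1101, 0b00110)` gives `0b10111`: the same data value maps to different bus values depending on the previous transfer, so decoding requires the same context that was used for encoding.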