1. Introduction
The advent of enabling technologies allows the integration of entire systems on silicon. This has led to a proliferation of embedded systems in a variety of application domains that impose different design constraints. Examples of such design constraints include, but are not limited to, low cost, high performance, and low power consumption. There is a simultaneous trend towards shorter system design cycles due to time-to-market pressures, and towards system customization to meet stringent cost, performance, and power constraints. Because of these demands, it is critical to develop system-level components and architectures that reduce design time while providing sufficient flexibility to be customized to the needs of a wide variety of applications.
Several dimensions must be considered while designing a single-chip system that meets goals of performance, power consumption, cost, and size. An essential requirement is to efficiently and optimally map an application's functionality to a set of high-performance components for computation and storage. These components include, but are not limited to, CPUs, DSPs, application-specific cores, memories, and custom logic. However, the increasing number and heterogeneity of such components, combined with the large volume of data that they may need to exchange, necessitate a second, equally important requirement: the design should provide a communication architecture that offers mechanisms for high-speed on-chip (or on-circuit) communication between system components.
2. References
The following papers provide useful background information, for which they are incorporated herein by reference in their entirety, and are selectively referred to in the remainder of this disclosure by their accompanying reference numbers in square brackets (e.g., [3] for the third numbered paper, by J. Turner and N. Yamanaka):
[1] “Peripheral Interconnect Bus Architecture.” http://www.omimo.be.
[2] “Sonics Integration Architecture, Sonics Inc.” http://www.sonicsinc.com.
[3] J. Turner and N. Yamanaka, “Architectural choices in large scale ATM switches,” IEICE Trans. on Communications, vol. E-81B, February 1998.
[4] “On chip bus attributes specification 1 OCB 1 1.0, On-chip bus DWG.” http://www.vsi.org/library/specs/summary.htm.
[5] “Open Core Protocol Specification, version 1.0.” http://www.sonics.com, October 1999.
[6] T. Yen and W. Wolf, “Communication synthesis for distributed embedded systems,” in Proc. Int. Conf. Computer-Aided Design, pp. 288-294, November 1995.
[7] J. Daveau, T. B. Ismail, and A. A. Jerraya, “Synthesis of system-level communication by an allocation based approach,” in Proc. Int. Symp. System Level Synthesis, pp. 150-155, September 1995.
[8] M. Gasteier and M. Glesner, “Bus-based communication synthesis on system level,” in ACM Trans. Design Automation Electronic Systems, pp. 1-11, January 1999.
[9] R. B. Ortega and G. Borriello, “Communication synthesis for distributed embedded systems,” in Proc. Int. Conf. Computer-Aided Design, pp. 437-444, November 1998.
[10] K. Lahiri, C. Lakhshminarayana, A. Raghunathan, and S. Dey, “Communication Architecture Tuners: a methodology for the design of high performance communication architectures for system-on-chips,” in Proc. Design Automation Conf., June 2000.
[11] N. McKeown, M. Izzard, A. Mekkitikul, W. Ellersick, and M. Horowitz, “The Tiny Tera: A packet switch core,” IEEE Micro, vol. 17, pp. 26-33, January 1997.
[12] A. Smiljanic, “Flexible bandwidth allocation in terabit packet switches,” in Proc. of Intl. Conf. on Telecommunication, (Heidelberg, Germany), June 2000.
[13] M. Shreedhar and G. Varghese, “Efficient fair queueing using deficit round robin,” in Proc. of SIGCOMM, (Boston, Mass.), pp. 231-243, September 1995.
[14] L. Zhang, “Virtual clock: A new traffic control algorithm for packet switching networks,” in Proc. of SIGCOMM, 1990.
[15] H. Zhang, “Service disciplines for guaranteed performance service in packet-switching networks,” Proc. of IEEE, vol. 83, October 1995.
[16] A. C. Waldspurger and W. E. Weihl, “Lottery scheduling: Flexible proportional-share resource management,” in Proc. Symp. on Operating Systems Design and Implementation, (Monterey, Calif., USA), pp. 1-12, 1994.
[17] D. Wingard and A. Kurosawa, “Integration architecture for system-on-a-chip design,” in Proc. Custom Integrated Circuits Conf., pp. 85-88, 1998.
[18] “IBM On-chip CoreConnect Bus Architecture.” http://www.chips.ibm.com/products/coreconnect/index.html.
[19] F. Balarin, M. Chiodo, H. Hsieh, A. Jureska, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, E. Sentovich, K. Suzuki, and B. Tabbara, Hardware-Software Co-Design of Embedded Systems: The POLIS Approach. Kluwer Academic Publishers, Norwell, Mass., 1997.
[20] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt, “Ptolemy: A framework for simulating and prototyping heterogeneous systems,” International Journal on Computer Simulation, Special Issue on Simulation Software Management, vol. 4, pp. 155-182, April 1994.
3. Related Work
Recognizing the importance of high-performance communication as a key to successful system design, recent work has addressed several issues pertaining to communication architectures, and to on-chip communication architectures in particular.
One body of work addresses the development of on-chip integration and communication architectures. Several system design and semiconductor companies employ proprietary bus architectures. While many of them differ in their detailed implementation, they can be classified into a few categories based on their structural topologies and on the protocols and policies they use to manage access to the shared bus. For example, bus architectures may be flat (a single shared bus) or may consist of multiple buses grouped in a hierarchy (interconnected by bridges) in order to achieve higher degrees of parallelism in communication. The protocols commonly employed in SoC bus architectures include priority-based arbitration [1], time division multiplexing [2], and token-ring mechanisms [3].
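The three protocol families named above can be contrasted with a minimal sketch. It is illustrative only and does not reproduce the schemes of references [1], [2], or [3]: the function and class names, the tie-breaking rule, the unused-slot behavior of the TDM variant, and the collapsing of token passing into a single step are all assumptions made here for brevity.

```python
from typing import List, Optional, Sequence


def priority_arbiter(requests: Sequence[bool],
                     priorities: Sequence[int]) -> Optional[int]:
    """Static-priority arbitration: grant the requesting master whose
    priority value is largest. Ties go to the lower index (assumption)."""
    winner: Optional[int] = None
    for master, requesting in enumerate(requests):
        if requesting and (winner is None
                           or priorities[master] > priorities[winner]):
            winner = master
    return winner


def tdma_arbiter(requests: Sequence[bool],
                 slot_owners: Sequence[int],
                 cycle: int) -> Optional[int]:
    """Time division multiplexing: each slot of a repeating frame is
    reserved for one master. In this single-level variant (assumption),
    a slot whose owner is idle simply goes unused."""
    owner = slot_owners[cycle % len(slot_owners)]
    return owner if requests[owner] else None


class TokenRingArbiter:
    """Token-ring style arbitration: the token travels around the ring and
    the first requesting master it reaches is granted the bus. The hop-by-hop
    token passing is collapsed into one call (simplifying assumption)."""

    def __init__(self, num_masters: int) -> None:
        self.num_masters = num_masters
        self.token = 0  # index of the master currently holding the token

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        for offset in range(self.num_masters):
            master = (self.token + offset) % self.num_masters
            if requests[master]:
                # After the transfer, the token moves on to the next master.
                self.token = (master + 1) % self.num_masters
                return master
        return None


if __name__ == "__main__":
    requests: List[bool] = [True, False, True]  # masters 0 and 2 want the bus
    print("priority:", priority_arbiter(requests, priorities=[1, 3, 2]))
    print("tdma:", [tdma_arbiter(requests, [0, 0, 1, 2], c) for c in range(4)])
    ring = TokenRingArbiter(3)
    print("token ring:", [ring.grant(requests) for _ in range(3)])
```

With the same pending requests, the priority arbiter always favors the same master, the TDM arbiter follows the slot table regardless of demand, and the token-ring arbiter rotates the grant among the requesters.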
Another body of work is aimed at facilitating a plug-and-play design methodology for HW/SW SoCs by promoting the use of a consistent communication interface, so that predesigned components or cores can be easily integrated with other system components. Several on-chip bus standards are evolving to realize this goal, most notably that put forward by VSIA (Virtual Socket Interface Alliance) [4], and more recently, the Open Core Protocol made available by Sonics Inc. [5]. Using standard interfaces is advantageous because (i) it frees the core developer from having to make any assumptions about the system in which the core will be used, (ii) it paves the way for developing a variety of novel communication architectures that are not constrained by the specific requirements of each SoC component they need to serve, and (iii) it facilitates a plug-and-play design methodology, which gives system designers access to a library of candidate cores from which to choose the one that best suits the system's design goals.
Research on system-level synthesis of communication architectures [6, 7, 8, 9, 10] deals with the synthesis of custom communication architecture topologies or protocols that are optimized for a specific application. These techniques typically assume an underlying architectural template that is customized to the application at hand.
It bears mentioning that some of the performance issues mentioned earlier have been studied in the networking and telecommunications literature, specifically in the context of traffic scheduling algorithms for switch fabrics [11, 12] and output queues [13, 14] of switches in high speed networks. A survey on scheduling techniques for output queued switches may be found in [15].
While the disclosed techniques have some relationship to work on packet switch architectures for large-scale networks, previous research in that context cannot be directly applied to the architectures in question, including those in the system-on-chip design context, for at least the following reasons. First, traffic scheduling algorithms need to take care of several issues at the same time, many of which may not be relevant for an application-specific system-on-chip. Second, traffic scheduling algorithms can afford to be more complex, and higher hardware implementation costs can be tolerated, since the time available to make a scheduling decision is a cell time, as opposed to a bus cycle in the case of on-chip communication. For example, many traffic scheduling techniques employ complex hardware to profile the history of communication behavior in order to determine the currently allocated bandwidth [14]. Third, traffic scheduling techniques are designed to be scalable in the number of flows or ports they can support, whereas in an SoC the number of communication components or data flows is typically small. This can lead to significantly different design decisions, such as choosing a centralized arbitration algorithm over a distributed one. Other relevant conventional techniques include those used for scheduling multiple threads of computation in a multi-threaded operating system [16]. However, in that domain, while hardware implementation considerations are irrelevant, the software architecture needs to provide for security and insulation between competing applications.
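To give a sense of scale for the cell-time versus bus-cycle comparison made above, the following sketch computes both durations under assumed figures: a 53-byte ATM cell on a 622.08 Mb/s (OC-12) link, with framing overhead ignored, and a 100 MHz on-chip bus clock. The link rate and the bus clock are illustrative assumptions and are not taken from this disclosure; the helper names are likewise hypothetical.

```python
ATM_CELL_BITS = 53 * 8  # an ATM cell is 53 bytes (header plus payload)


def cell_time_ns(line_rate_bps: float, cell_bits: int = ATM_CELL_BITS) -> float:
    """Time to transmit one cell at the given line rate, in nanoseconds."""
    return cell_bits / line_rate_bps * 1e9


def bus_cycle_ns(clock_hz: float) -> float:
    """Duration of one bus clock cycle, in nanoseconds."""
    return 1e9 / clock_hz


if __name__ == "__main__":
    cell = cell_time_ns(622.08e6)   # assumed OC-12 line rate
    cycle = bus_cycle_ns(100e6)     # assumed 100 MHz on-chip bus
    print(f"cell time: {cell:.1f} ns")
    print(f"bus cycle: {cycle:.1f} ns")
    print(f"decision-time ratio: {cell / cycle:.1f}x")
```

Under these assumptions a switch scheduler has on the order of tens of bus cycles' worth of time per decision, which is why it can tolerate more elaborate hardware than an on-chip arbiter that must decide within a single cycle.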