1. Field of the Invention
The present invention generally relates to a circuit design method and apparatus for increasing communication efficiency between processing elements connected to a network-on-chip in a system-on-chip and for reducing the power consumption of each element, and in particular, to an apparatus and a control method thereof for dynamically varying the operating frequencies of processing elements according to the communication pattern and congestion between processing elements connected to a network-on-chip in a system-on-chip.
2. Description of the Related Art
Due to the convergence and thin-and-light trend of Information Technology (IT) devices, System-On-Chip (SOC) technology is developing, in which various high-performance IT devices are integrated into one chip. Among the technologies for realizing SOC, the bus system for connecting several processing elements and enabling their mutual communication has increased in importance. However, with the increase in system integration and the rapid increase in the amount of information exchanged between processing elements, the conventional shared bus structure has decreased in utilization in the high-performance SOC due to the limit of bandwidth. To address the limit of bandwidth and facilitate design of a high-integration high-performance SOC, a Network-On-Chip (NOC) technology has been introduced.
The NOC technology is a next-generation on-chip bus technology that applies the packet-switched or circuit-switched network technology used between general computers or communication devices to the communication structure between the processing elements of an SOC.
FIG. 1 illustrates a structure of a network-on-chip. Referring to FIG. 1, a NOC 200 generally includes Network Interfaces (NIs) 210 for connecting a plurality of Processing Elements (PEs) 100 to the network-on-chip, switches (or routers) 220, and bidirectional links 230 for connecting the NIs 210 to the switches 220, or for connecting different switches to one another. The topology between the PEs and the switches, or among different switches, is dynamically designed according to the application.
FIG. 2 illustrates a shared bus structure. Referring to FIG. 2, in the conventional shared bus structure, the transmission data is initially delivered to all PEs 310 connected to a shared bus 330, and only the one PE that is the destination of the data selectively receives and stores it. This data transmission scheme is a point-to-multipoint (or broadcast) transmission scheme in which electric signals are delivered over the entire region of the bus and up to the input ends of all PEs. In this shared bus structure, because only one data burst is transmitted via the shared bus at a time, the bandwidth limit is clear, and power is unnecessarily consumed because of the high electric load of the shared bus, so there is a limit to how far the operating speed can be increased.
In contrast, the NOC with the structure of FIG. 1 has a point-to-point transmission scheme in which transmission data is delivered only via the link selected by the switch. That is, because simultaneous data transmissions are possible via non-overlapping links, the NOC is noticeably superior to the shared bus structure in transmission bandwidth. In addition, the transmission links have a low electric load since they are connected only to neighboring switches, thereby facilitating an increase in the operating speed of the network. The data transmission unit in the NOC structure is a packet, which is similar in form to that in a general network, and its size is determined appropriately for the applied environment.
The structure of the NOC will now be described in more detail.
FIG. 3 briefly illustrates an NI in terms of data. Referring to FIG. 3, an NI 210 includes a packet composer/decomposer 211, an output packet buffer 212, and an input packet buffer 214.
The packet composer 211 composes a header of a packet based on address and control signals received from a PE 100, configures a payload by binding address and data signals, and completes the packet by attaching an error code such as parity to the packet's tail. The packet header contains a variety of information defined in the protocol, of which the routing information is the most important. That is, the packet header includes the network address needed to deliver the packet correctly to its destination. The composed packet is then sequentially stored in the output packet buffer 212.
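The composition steps described above can be illustrated with a minimal sketch. The field widths (a 1-byte header carrying only the destination address, a 2-byte address, and 2 bytes of data), the single even-parity bit, and the names Packet, compose, decompose, and parity_bit are assumptions made for illustration only; the description does not fix a packet format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Packet:
    header: int     # destination network address (routing information)
    payload: bytes  # address and data signals bound together
    tail: int       # error code; assumed here to be one even-parity bit


def parity_bit(data: bytes) -> int:
    """Return 1 if the number of set bits in data is odd, else 0."""
    ones = sum(bin(b).count("1") for b in data)
    return ones & 1


def compose(dest: int, address: int, data: int) -> Packet:
    """Build a packet: header, then payload, then parity tail."""
    # Hypothetical layout: 2-byte address followed by 2-byte data.
    payload = address.to_bytes(2, "big") + data.to_bytes(2, "big")
    covered = bytes([dest]) + payload
    return Packet(header=dest, payload=payload, tail=parity_bit(covered))


def decompose(pkt: Packet) -> Tuple[int, int]:
    """Check the parity tail and return (address, data) for the PE."""
    covered = bytes([pkt.header]) + pkt.payload
    if parity_bit(covered) != pkt.tail:
        raise ValueError("parity error")
    address = int.from_bytes(pkt.payload[:2], "big")
    data = int.from_bytes(pkt.payload[2:], "big")
    return address, data
```

For example, decompose(compose(3, 0x1234, 0xBEEF)) returns (0x1234, 0xBEEF), whereas a packet whose tail bit has been flipped is rejected with a parity error.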
The output packet buffer 212 outputs the stored packets to the corresponding switch via the link connected to an output packet port 213. Conversely, an input packet received at the NI 210 from the switch via an input packet port 215 is sequentially stored in the input packet buffer 214. A First-In-First-Out (FIFO) buffer can be used as the output packet buffer 212 and as the input packet buffer 214.
The packet decomposer 211 sequentially reads the stored packet from the input packet buffer 214, decomposes and decrypts the read packet, and transfers the data to the PE 100.
FIG. 4 illustrates a structure of a packet switch or router. Referring to FIG. 4, a packet switch or router (hereinafter a ‘packet switch’) 220 generally includes input packet buffers 223, output packet buffers 225, a crossbar fabric 221 for connecting the input packet buffers 223 to the output packet buffers 225, and a crossbar scheduler 222 for controlling the crossbar fabric 221.
The main function of the packet switch 220 is to deliver the packet(s) received from input ports 224 to a particular intended output port 226 based on routing information of the packet header.
The crossbar fabric 221, unlike the conventional shared bus, provides a non-blocking switching function capable of simultaneously delivering several packets to different output ports. If the crossbar fabric 221 simultaneously receives transmission requests for the same output port from several different input packet buffers 223, an output conflict occurs. In this case, the crossbar scheduler 222 selects only one of the requests, so the crossbar fabric 221 transmits only the selected packet, and each non-selected packet waits in its input packet buffer 223 and is transmitted at a later opportunity. The crossbar scheduler 222 can be implemented with a variety of scheduling algorithms according to the application. Generally, either a round-robin algorithm, which gives top priority to fairness, or a fixed-priority algorithm, which follows a predefined priority, is used. Depending on the application, the output packet buffers can be omitted.
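The two scheduling policies mentioned above can be sketched for a single output port as follows. This is an illustrative model only: the names fixed_priority and RoundRobin, and the convention that input 0 has the highest fixed priority, are assumptions and are not taken from the description.

```python
from typing import List, Optional


def fixed_priority(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-numbered requesting input (index 0 = top priority)."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None  # no input is requesting this output port


class RoundRobin:
    """Grant requesting inputs in rotating order for fairness."""

    def __init__(self, n_inputs: int) -> None:
        self.n = n_inputs
        self.next = 0  # input searched first at the next arbitration

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(self.n):
            i = (self.next + offset) % self.n
            if requests[i]:
                # The winner becomes the lowest priority next time.
                self.next = (i + 1) % self.n
                return i
        return None
```

With inputs 0 and 1 both requesting continuously, fixed_priority always grants input 0, so input 1 waits until input 0 stops requesting, while a RoundRobin scheduler alternates between the two. This is the fairness trade-off between the two policies.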
In this packet communication, if transmission of a packet is delayed due to an output port conflict (or output conflict) at a switch node or a packet hot spot at a receiving PE, a packet buffer of each switch node or NI temporarily stores the packet. However, the packet buffer cannot store packets endlessly because of its finite capacity. Generally, in a computer network (or the Internet), packets may be lost when the buffer's capacity is exceeded (Buffer Overflow). However, in a system requiring fast and accurate operation, such as a semiconductor device, packet loss may result in serious system latency and failure. Therefore, to prevent the packet loss caused by buffer overflow, the NOC uses a so-called flow control mechanism.
Generally, a link-level flow control scheme, which is simpler to realize than an ‘End-to-End’-level flow control scheme, is used as the flow control scheme. According to this scheme, if the backlog of a certain packet buffer reaches a predefined threshold, a signal is immediately sent so that the source entity transmitting packets to this buffer sends no additional packets. This source entity can be either another packet buffer or a PE.
FIG. 5 illustrates the operation of flow control based on a Back-Pressure signal, in the exemplary case where overflow occurs in an input packet buffer of a switch. Referring to FIG. 5, in this scheme, a 1-bit Back-Pressure (BP) signal line (an arrow denoted by a dotted line) runs in parallel with each link, and the transfer direction of the BP signal is opposite to the packet transfer direction (an arrow denoted by a solid line) of the link. If the threshold of an input packet buffer #1 621 in a switch #2 620 is assumed to be 4, the flow control starts its operation because the current backlog of the input packet buffer #1 621 has reached 4. A HIGH value is then carried on the BP signal line connected to the packet buffer whose backlog has reached the threshold, and as a result, this signal is delivered to an output packet buffer #1 611 of a switch #1 610, which is the sole source transmitting packets to the overflowing input packet buffer #1 621. The output packet buffer #1 611 immediately stops packet transmission until the value of the corresponding BP signal line returns to LOW. In the meantime, packets received at the output packet buffer #1 611 continue to be stored in the output packet buffer #1 611 as long as its capacity permits. After a lapse of a predetermined time, when the backlog of the input packet buffer #1 621 in the switch #2 620 falls below 4, the corresponding BP signal immediately drops to LOW, and upon receipt of this signal, the output packet buffer #1 611 in the switch #1 610 resumes transmission of the subsequent packets. In this manner, the scheme prevents the packet loss caused by packet buffer overflow in the network. In addition, this flow control method prevents overload of the network, thereby increasing the overall efficiency of the network.
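The back-pressure behavior described with reference to FIG. 5 can be modeled in a few lines. The following is a cycle-level sketch under simplifying assumptions: one packet moves per cycle, the BP signal is seen by the upstream buffer without delay, and the names PacketBuffer and step are hypothetical.

```python
from collections import deque


class PacketBuffer:
    """FIFO packet buffer that raises BP when its backlog reaches a threshold."""

    def __init__(self, threshold: int) -> None:
        self.fifo = deque()
        self.threshold = threshold

    @property
    def back_pressure(self) -> bool:
        # BP is HIGH (True) while the backlog is at or above the threshold.
        return len(self.fifo) >= self.threshold


def step(upstream: PacketBuffer, downstream: PacketBuffer, drain: bool) -> None:
    """Advance one cycle of the link between two buffers."""
    # The downstream buffer may first forward one packet onward.
    if drain and downstream.fifo:
        downstream.fifo.popleft()
    # The upstream buffer transmits only while BP is LOW.
    if upstream.fifo and not downstream.back_pressure:
        downstream.fifo.append(upstream.fifo.popleft())
```

With a downstream threshold of 4, as in the example above, the upstream buffer forwards four packets and then stalls. Once the downstream buffer drains a packet, the BP signal drops to LOW and transmission resumes. No packet is lost in either phase.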
Meanwhile, when overflow occurs in an output packet buffer of the switch, the crossbar scheduler rejects packet transmission requests to the corresponding output port, thereby preventing packet transmission conflicts at the corresponding output packet buffer. A detailed description thereof will be omitted herein, because it is not closely related to the present invention. Even when overflow occurs in the input packet buffer of the switch connected to the NI, the flow control works according to the same method as that described with reference to FIG. 5.
FIG. 6 illustrates a flow control method between an NI and a PE. Referring to FIG. 6, if the BP signal to the output packet port 213 of the NI 210 rises to HIGH because congestion occurs in the network connected to that port, the corresponding output packet buffer 212 immediately stops transmitting output packets. Regardless of the network congestion, the output packet buffer 212 continues to store packets from the packet composer 211, so its backlog grows until it reaches the threshold. In this case, the output packet buffer 212 raises its BP signal to HIGH to notify the packet composer 211 of this fact, and the packet composer 211 immediately sends a HOLD signal to the corresponding PE. The transmission path of this signal is denoted by a dotted line. The HOLD signal is provided to stop any further packet transmission from the corresponding PE 100. The HOLD signal output from the NI 210 is generally connected to a WAIT input port 111 of the PE 100 when the PE 100 is a microprocessor. When the WAIT signal is asserted (or received), the PE 100 stops all transmission to the NI 210 until the WAIT signal is canceled. In the meantime, because the entire PE stops its operation, not only the Data-Out port but also the Data-In port 112 stops functioning. Therefore, the packet transmission from the input packet buffer of the corresponding NI to the PE is blocked, which leads to overflow of the input packet buffer. As a result, flow control is triggered for the input packet buffer, so even a previously smooth part of the network transitions to a congestion state. Through such a chain of network congestion, the network ends in a state where it can perform no action, which directly results in a fault of the system. This is called a deadlock phenomenon.
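The coupling between the outgoing and incoming paths described above can be reproduced with a toy model. It assumes that the PE emits and consumes one packet per cycle while running, that the network delivers one packet per cycle to the NI, and that the HOLD signal freezes the whole PE. The function simulate and its parameters are hypothetical and serve only to illustrate the effect.

```python
from collections import deque
from typing import List, Tuple


def simulate(cycles: int, out_threshold: int, in_threshold: int,
             network_congested: bool) -> List[Tuple[bool, bool]]:
    """Return per-cycle (HOLD to PE, BP from the NI input buffer) values."""
    out_buf, in_buf = deque(), deque()
    history = []
    for _ in range(cycles):
        # The NI asserts HOLD (PE WAIT) when the output backlog hits the threshold.
        hold = len(out_buf) >= out_threshold
        if not hold:
            out_buf.append("pkt")   # Data-Out: PE emits one packet
            if in_buf:
                in_buf.popleft()    # Data-In: PE consumes one packet
        # The output buffer drains only while the network accepts packets.
        if not network_congested and out_buf:
            out_buf.popleft()
        # The input buffer raises BP toward the network when it is full.
        in_bp = len(in_buf) >= in_threshold
        if not in_bp:
            in_buf.append("pkt")    # network delivers one packet
        history.append((hold, in_bp))
    return history
```

With network_congested=True, the model first asserts HOLD and, a few cycles later, also raises BP on the input side, even though nothing was wrong with the incoming path. With network_congested=False, neither signal is ever asserted.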
The conventional flow control scheme, after stopping the operation of the PE, resumes the operation of the PE when the congestion of the network is released. However, most processors suffer a latency of at least one cycle, and up to several cycles, in leaving a WAIT state (or a WAIT state of the clock) after entering it. In this case, due to the frequent intervention of the flow control (especially when the capacity of the packet buffer is insufficient), the PE frequently switches between the WAIT state and an ACTIVE state, and each switch incurs this latency unnecessarily. This reduces both the PE performance and the network efficiency.
One method for solving these problems is to sufficiently increase the capacity of the packet buffer. However, it is very difficult to pre-estimate the dynamic behavior of a complex system in the chip design phase, and the packet buffer itself consumes much more area and power than the other logic blocks, making it very difficult to provide sufficient buffer capacity. Therefore, providing an efficient flow control method with a limited buffer capacity is one of the important tasks of the NOC.
In the conventional flow control scheme of applying the WAIT signal to stop the operation of a PE, a clock is still input to the PE even while the PE is in the WAIT state. Therefore, the PE consumes power due to the clock even though it performs no operation. That is, when the congestion of the network persists for a long time, the PE continues to consume power unnecessarily. A clock-gating method exists for gating the clock of the PE to avoid this clock power consumption while the flow control is active, but with this method the foregoing problems may become more serious.
The flow control scheme in the conventional NOC performs no control until overflow occurs in the packet buffers. Therefore, a phenomenon may occur in which most packet buffers of the network are completely filled with the packets sent by only one PE, as shown in FIG. 7. In this case, flow control is triggered in buffers Q1 to Q4 due to overflow, so not only is the PE(A) affected, but the PE(B) and PE(C) also stop their operations. A way to solve this problem is to reduce the packet output frequency of the PE(A) before the packets of the PE(A) occupy all the packet buffers of the network. This is called ‘Traffic Shaping’ or ‘Rate Control’ in the general network. However, when this concept (or protocol) is applied as-is to the NOC, its realization and control method may be very complex.
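For reference, one common form of rate control in general networks is the token bucket. The sketch below illustrates that general technique only; it is not a mechanism described in this document, and the class TokenBucket and its parameters are assumptions chosen for illustration.

```python
class TokenBucket:
    """Cycle-based token bucket limiting how often a source may send."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens added per cycle (long-run packets/cycle)
        self.capacity = capacity  # maximum burst size in packets
        self.tokens = capacity    # start with a full bucket

    def tick(self) -> None:
        """Add one cycle's worth of tokens, saturating at capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_send(self) -> bool:
        """Consume one token and return True if a packet may be sent."""
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A source such as the PE(A) that calls tick() and then try_send() once per cycle is allowed a short initial burst and is afterwards limited to the configured rate, which leaves buffer space for other sources. Even this simple form requires a per-source counter and rate configuration, which suggests why applying such schemes to an NOC unchanged may be costly.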