Systems on silicon show a continuous increase in complexity due to the ever-increasing need for implementing new features and improvements of existing functions. This is enabled by the increasing density with which components can be integrated on an integrated circuit. At the same time the clock speed at which circuits are operated tends to increase too. The higher clock speed in combination with the increased density of components has reduced the area which can operate synchronously within the same clock domain. This has created the need for a modular approach. According to such an approach the processing system comprises a plurality of relatively independent, complex modules. In conventional processing systems the system modules usually communicate with each other via a bus. As the number of modules increases, however, this way of communication is no longer practical for the following reasons. On the one hand, the large number of modules places too high a load on the bus. On the other hand, the bus forms a communication bottleneck as it enables only one device at a time to send data over the bus.
A communication network forms an effective way to overcome these disadvantages. Networks on chip (NoC) have received considerable attention recently as a solution to the interconnect problem in highly-complex chips. The reason is twofold. First, NoCs help resolve the electrical problems in new deep-submicron technologies, as they structure and manage global wires. At the same time they share wires, lowering their number and increasing their utilization. NoCs can also be energy efficient and reliable and are scalable compared to buses. Second, NoCs also decouple computation from communication, which is essential in managing the design of billion-transistor chips. NoCs achieve this decoupling because they are traditionally designed using protocol stacks, which provide well-defined interfaces separating communication service usage from service implementation.
Using networks for on-chip communication when designing systems on chip (SoC), however, raises a number of new issues that must be taken into account. This is because, in contrast to existing on-chip interconnects (e.g., buses, switches, or point-to-point wires), where the communicating modules are directly connected, in a NoC the modules communicate remotely via network nodes. As a result, interconnect arbitration changes from centralized to distributed, and issues like out-of-order transactions, higher latencies, and end-to-end flow control must be handled either by the intellectual property blocks (IP) or by the network.
Most of these topics have already been the subject of research in the field of local and wide area networks (computer networks) and in the field of interconnection networks for parallel machines. Both are closely related to on-chip networks, and many of the results in those fields are also applicable on chip. However, the premises of NoCs are different from those of off-chip networks, and, therefore, most of the network design choices must be reevaluated. On-chip networks have different properties (e.g., tighter link synchronization) and constraints (e.g., higher memory cost) leading to different design choices, which ultimately affect the network services.
Introducing networks as on-chip interconnects radically changes the communication when compared to direct interconnects, such as buses or switches. This is because of the multi-hop nature of a network, where communication modules are not directly connected, but separated by one or more network nodes. This is in contrast with the prevalent existing interconnects (i.e., buses) where modules are directly connected.
Modern on-chip communication protocols (e.g., Device Transaction Level (DTL) and the AXI protocol) operate on a split and pipelined basis with transactions consisting of a request and a response, and the bus is released for use by others after a request issued by a master is accepted by a corresponding slave. Split pipelined communication protocols are used especially in multi-hop interconnects (e.g., networks on chip, or buses with bridges), allowing an efficient utilization of the interconnect. The efficiency of a split bus can be increased for cases where response generation at the slave is time consuming. In a pipelined protocol, a master is allowed to have multiple outstanding requests (i.e., requests for which the response is pending or expected).
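The notion of outstanding requests in a pipelined protocol can be illustrated with a minimal sketch. The class name `SplitMaster`, the `max_outstanding` limit and the assumption that responses return in issue order are hypothetical and chosen for illustration only; they are not taken from the DTL or AXI specifications.

```python
from collections import deque


class SplitMaster:
    """Hypothetical master that may issue new requests before earlier responses arrive."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding  # assumed limit on pending requests
        self.pending = deque()                  # request ids awaiting a response, in issue order

    def can_issue(self):
        # A further request may be issued while the limit is not reached.
        return len(self.pending) < self.max_outstanding

    def issue(self, request_id):
        if not self.can_issue():
            raise RuntimeError("outstanding-request limit reached")
        # After the request is accepted, the interconnect is free for others;
        # the master only remembers that a response is still expected.
        self.pending.append(request_id)

    def on_response(self, request_id):
        # Assumption: responses arrive in the order the requests were issued.
        if not self.pending or self.pending[0] != request_id:
            raise ValueError("unexpected response")
        self.pending.popleft()
```

With `max_outstanding` set to 2, a master can issue two requests back to back and must then wait for the first response before issuing a third.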
The above-mentioned protocols are designed to operate at a device level, as opposed to a system or interconnect level. In other words they are designed to be independent of the actual interconnect implementation (e.g., arbitration signals are not visible) allowing the reuse of intellectual property blocks IP and their earlier integration.
In particular, the above-mentioned on-chip communication protocols comprise four main groups of signals, namely commands (or address), write data, read data and write response. The command group consists of command, addresses and command flags like burst length and mask. The command and write data groups are driven by the initiator to the target. The read data and write response are driven by the target to the initiator following a command from an initiator. All four groups are independent of each other with some ordering constraints between them, e.g. a response cannot be issued before a command.
These on-chip communication protocols also implement the concept of buffering data which is well-known in the art of chip design. Typically, buffering is used to decouple different modules, wherein one module produces data and the other consumes the data. Without buffering, the producing module would be blocked by the consuming module until it is ready to accept its data. In order to avoid the blocking of the producing module, a buffer may be introduced, storing the data produced by the producing module and thus allowing the producer to continue its execution even when the consuming module is not ready. When the consuming module is ready to accept some or all buffered data, the data stored in the buffer is immediately supplied to the consuming module.
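The decoupling effect of a buffer can be made concrete with a small cycle-based sketch. The function name `count_producer_stalls`, the one-word-per-cycle model and the convention that the consumer drains the buffer before the producer writes in the same cycle are assumptions made purely for illustration.

```python
def count_producer_stalls(n_items, consumer_ready, capacity):
    """Count cycles in which the producer is blocked.

    n_items        -- number of words the producer wants to hand over
    consumer_ready -- one boolean per cycle: can the consumer accept a word?
    capacity       -- buffer depth in words; 0 models a direct connection
    """
    buffered = 0   # words currently held in the buffer
    produced = 0   # words the producer has handed over so far
    stalls = 0
    for ready in consumer_ready:
        if produced == n_items:
            break
        # The consumer takes one buffered word when it is ready.
        if ready and buffered > 0:
            buffered -= 1
        if capacity == 0:
            # Without a buffer the producer depends directly on the consumer.
            if ready:
                produced += 1
            else:
                stalls += 1
        elif buffered < capacity:
            # With a buffer the producer continues while space is left.
            buffered += 1
            produced += 1
        else:
            stalls += 1
    return stalls
```

For a consumer that is not ready during the first two cycles, the unbuffered producer stalls in both cycles, whereas a buffer of two words absorbs the delay.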
On the other hand, modern on-chip communication protocols also use the buffering of write commands or data in order to improve the interconnect utilization. Accordingly, small write bursts are stored or accumulated in a buffer before they are sent over an interconnect. Instead of being transferred in short bursts, the accumulated data is transported in a long burst over the interconnect, which usually leads to an improved interconnect utilization. This may be implemented by buffering first write data W1 (i.e. the data is not yet transferred over the interconnect) until, for example, second write data W2 arrives in the buffer, such that both are transferred as one burst with an optimal length with regard to the interconnect utilization.
Therefore, data from a number of writes can be buffered and aggregated in one burst. In addition, parts of the data in write commands may be sent in separate bursts.
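The aggregation of several small writes into longer bursts may be sketched as follows. The class `WriteAggregator` and its fixed `burst_length` parameter are hypothetical names introduced for illustration; the actual protocols do not prescribe this structure.

```python
class WriteAggregator:
    """Hypothetical master-side buffer that merges small writes into longer bursts."""

    def __init__(self, burst_length):
        self.burst_length = burst_length  # assumed target number of words per burst
        self.words = []                   # words accumulated but not yet transferred

    def write(self, data):
        """Accept the words of one write; return the bursts that are now complete."""
        self.words.extend(data)
        bursts = []
        # Emit every full burst; any remainder stays buffered.
        while len(self.words) >= self.burst_length:
            bursts.append(self.words[:self.burst_length])
            self.words = self.words[self.burst_length:]
        return bursts
```

With a burst length of four words, a first write of two words (W1) produces no transfer. A second write of three words (W2) completes one burst of four words, and the fifth word remains buffered, so the data of W2 ends up in separate bursts.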
The reason for the implementation of this buffering technique in the above-mentioned on-chip communication protocols is that the intellectual property blocks IP in a system on-chip connected by an interconnect should be able to communicate “naturally”, i.e. the word width and the burst sizes are configured such that they suit the device rather than the interconnect. For example, if intellectual property blocks IP process pixels, then these intellectual property blocks consume and produce pixels, while in the case that they process video frames, they consume and produce video frames. By buffering the data, the data to be transmitted over the interconnect is forced to wait until a sufficient amount of data is gathered such that these data can be transferred at once in a burst.
The above-mentioned on-chip protocols have been designed mainly for buses with a small latency. In addition, these protocols have been designed based on the assumption that read operations are always urgent and should therefore be completed as soon as possible without unnecessary buffering. However, as systems grow larger and multi-hop interconnects like networks or buses with bridges are introduced, the latency grows as well. In these cases the communication granularity becomes coarser and the latency requirements become less strict.
In addition, these protocols comprise means to force some of the currently buffered data to be transferred although the optimal burst length has not been reached, in order to prevent deadlock caused by buffering data indefinitely. The DTL communication protocol provides a flush signal forcing all data up to the current word to be transferred over the interconnect. The AXI protocol provides an unbuffered flag for write commands to force buffered data to be transferred.
It is therefore an object of the invention to provide an integrated circuit, a method for buffering as well as a data processing system with an improved interconnect utilization.
Therefore, an integrated circuit comprising a plurality of processing modules coupled by an interconnect means is provided. A first processing module communicates with a second processing module based on transactions. A first wrapper means associated to said second processing module buffers data from said second processing module to be transferred over said interconnect means until a first amount of data is buffered and then transfers said first amount of buffered data to said first processing module.
Accordingly, data is buffered on the slave side until a sufficiently large amount of data to be transferred over the interconnect in a single packet is reached. Reducing the number of packets sent over the interconnect reduces the overhead of the communication as fewer packet headers are required. The data to be sent is buffered until a sufficient amount of data is gathered.
According to an aspect of the invention, a second wrapper means is associated to the first processing module for buffering data from said first processing module to be transferred over the interconnect means to said second processing module until a second amount of data is buffered and said second wrapper means then transfers said buffered data to said second processing module. Therefore, data is buffered on the master as well as on the slave side until a sufficiently large amount of data to be transferred over the interconnect in a single packet is reached.
According to a further aspect of the invention said first and second wrapper means are adapted to transfer the buffered data in response to a first and second unbuffer signal, or a particular combination of a group of signals, respectively (even if less than the first and second amount of data is buffered in said first and second wrapper means). By issuing the unbuffer signals an occurrence of a deadlock due to a processing module waiting for the buffered data can be avoided.
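The relation between packet size and header overhead can be checked with a short calculation. The function name `packet_overhead` and the header size of one word per packet are assumptions for illustration; no header format is specified here.

```python
import math


def packet_overhead(total_words, payload_words_per_packet, header_words=1):
    """Return (packets, header words, fraction of transferred words that are headers).

    header_words=1 is an assumed header size, not a value from any protocol.
    """
    packets = math.ceil(total_words / payload_words_per_packet)
    overhead = packets * header_words
    return packets, overhead, overhead / (total_words + overhead)
```

Under these assumptions, transferring 64 data words in packets of 2 words needs 32 packets and 32 header words, whereas packets of 16 words need only 4 packets and 4 header words.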
According to a further aspect of the invention said first and second wrapper means are adapted to transfer the buffered data according to a first and second unbuffer flag, respectively (even if less than the first and second amount of data is buffered in said first and second wrapper means). Therefore, an alternative approach to flush buffered data is provided. As opposed to the signal, which is given for each transaction, the flag may be set for a longer time. In this way, the buffering can be switched on or off. The flag can be set/unset in any way, e.g., with a signal from the IP as part of a transaction, or via separate configuration transactions (either special flush transactions or memory-mapped reads and writes). These transactions can be issued either from the same IP, or from a separate configuration module.
According to a preferred aspect of the invention at least one of said first and second wrapper means comprises a determination unit BLDU for determining the optimal first or second amount of data to be buffered in said first or second wrapper means before said data is transferred according to the communication properties of said communication between said first and second processing module. Accordingly, the packet size of the data transferred over the interconnect can be adapted according to the properties of the actual communication and thereby the utilization of the interconnect can be improved.
The invention also relates to a method for buffering data in an integrated circuit having a plurality of processing modules being connected with an interconnect means, wherein a first processing module communicates with a second processing module based on transactions, comprising the step of buffering data from said second processing module to be transferred over the interconnect means until a first amount of data is buffered, wherein the buffered data are transferred when said first amount of data has been buffered.
The invention further relates to a data processing system comprising an integrated circuit comprising a plurality of processing modules coupled by an interconnect means. A first processing module communicates with a second processing module based on transactions. A second wrapper means associated to said second processing module buffers data from said second processing module to be transferred over said interconnect means until a first amount of data is buffered and then transfers said first amount of buffered data to said first processing module.
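The difference between a per-transaction unbuffer signal and a persistent unbuffer flag may be sketched as follows. The class `BufferingWrapper`, its `threshold` parameter and the `put` method are hypothetical names for illustration; they do not represent the signal interface of any actual wrapper.

```python
class BufferingWrapper:
    """Hypothetical wrapper that holds data until a threshold, a signal or a flag releases it."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed amount of data that triggers a transfer
        self.unbuffer_flag = False   # persistent: stays in effect until cleared again
        self.words = []              # data currently held back
        self.sent = []               # packets handed to the interconnect

    def put(self, word, unbuffer_signal=False):
        """Buffer one word; unbuffer_signal applies to this transaction only."""
        self.words.append(word)
        if (len(self.words) >= self.threshold
                or unbuffer_signal
                or self.unbuffer_flag):
            # Transfer everything buffered so far, even if below the threshold.
            self.sent.append(self.words)
            self.words = []
```

In this sketch the signal releases the buffered data once, after which buffering resumes, while setting `unbuffer_flag` effectively switches buffering off until the flag is cleared.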
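How a determination unit might derive the amount of data to buffer is not specified at this point. The following sketch is therefore only one conceivable policy, stated under loud assumptions: the communication properties are reduced to a data rate and a latency budget, and the function name `choose_buffer_amount` and all of its parameters are hypothetical.

```python
def choose_buffer_amount(words_per_cycle, latency_budget_cycles, max_payload_words):
    """Pick the amount of data to buffer before a transfer (illustrative policy only).

    Assumed policy: buffer as much as can be gathered within the latency
    budget, capped by the largest payload a packet can carry, and never
    less than one word.
    """
    gathered = int(words_per_cycle * latency_budget_cycles)
    return max(1, min(max_payload_words, gathered))
```

Under this policy a connection producing half a word per cycle with a budget of 16 cycles would buffer 8 words, while a fast connection would be limited by the maximum payload.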
Accordingly, the buffering of data as described above can also be applied in a system comprising a plurality of integrated circuits.
The invention is based on the idea to buffer data until the buffered data is sufficiently large to be transferred optimally over an interconnect means in a packet. The larger a packet is, the fewer packet headers are required for a given amount of data; the overhead is therefore reduced and the interconnect is utilized more efficiently. The data is only transferred when sufficient data for an optimal packet size has been buffered, even when data could be sent earlier. The data is only transferred from the buffer when the conditions for an optimal transfer are satisfied.
Further aspects of the invention are described in the dependent claims.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiment(s) described hereinafter.