A network on a chip (NOC) is a novel integrated circuit architecture that applies a network-based architecture to a single chip to create a unique processing unit. A typical NOC includes a plurality of integrated processor (IP) blocks coupled to one another via the network. NOC processing units typically distribute (i.e., allocate) various parts of a job among different hardware threads of one or more IP blocks for execution, and this distribution typically includes transmitting data packets, each including one or more data words, between hardware threads of the NOC. As the number of IP blocks in standard computer systems is expected to rise, handling workload distribution efficiently has become increasingly demanding.
In many conventional NOC architecture systems, an inbox/outbox model is used in which transmitting data packets is often referred to as “message passing”: a message (i.e., a data packet) is transmitted over the network of the NOC from an output buffer (i.e., an “outbox”) of a first hardware thread to an input buffer (i.e., an “inbox”) of a second hardware thread. Such conventional implementations are typically referred to as “direct inter-thread communication” messaging (hereinafter “DITC”). Accordingly, each hardware thread of a DITC implementation includes an inbox and an outbox, and each message passed over the network of the NOC includes an address corresponding to the destination hardware thread to which the message is to be passed.
Inboxes and outboxes used in DITC implementations are typically of fixed size and thus can buffer only a limited number of messages at a time. As a result, if a destination hardware thread cannot process messages arriving at its inbox as fast as other, source hardware threads send them, those source hardware threads may have to wait for the destination hardware thread to catch up, and thus operate below maximum efficiency. Because source hardware threads generally address their messages to specific destination hardware threads, a workload distributed among a plurality of hardware threads in a conventional system may become uneven, and in some cases a source hardware thread must wait for a destination hardware thread to clear enough space in its inbox for the messages.
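The stall behavior described above can be illustrated with a small, hypothetical software model. The sketch below assumes a single source hardware thread that attempts to send one message per tick into a fixed-capacity inbox, and a destination hardware thread that removes one message every `consume_every` ticks; the names `Inbox`, `try_put`, and `simulate` are illustrative only and do not describe any particular NOC hardware interface.

```python
from collections import deque


class Inbox:
    """Illustrative fixed-capacity input buffer of a destination hardware thread."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots = deque()

    def try_put(self, message) -> bool:
        # A full inbox rejects the message; the source must retry on a later tick.
        if len(self.slots) >= self.capacity:
            return False
        self.slots.append(message)
        return True

    def take(self):
        # Destination thread removes the oldest buffered message, if any.
        return self.slots.popleft() if self.slots else None


def simulate(num_messages: int, capacity: int, consume_every: int) -> int:
    """Return how many ticks the source spends stalled on a full inbox.

    Hypothetical timing model: the source tries to send one message per tick,
    and the destination consumes one message every `consume_every` ticks.
    """
    inbox = Inbox(capacity)
    sent = 0
    stalled_ticks = 0
    tick = 0
    while sent < num_messages:
        tick += 1
        if inbox.try_put(sent):
            sent += 1
        else:
            stalled_ticks += 1  # source thread idles waiting for inbox space
        if tick % consume_every == 0:
            inbox.take()
    return stalled_ticks


print(simulate(10, capacity=4, consume_every=1))  # destination keeps pace
print(simulate(10, capacity=4, consume_every=2))  # destination is half as fast
```

When the destination keeps pace, the source never stalls; when the destination consumes at half the send rate, the inbox fills after a few ticks and the source begins losing ticks waiting for space, which is the inefficiency the fixed-size DITC buffers give rise to.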
One particular application of a NOC architecture is software pipelining, in which hardware threads disposed in one or more IP blocks are arranged into different stages of a pipeline, and data is streamed between the stages to perform a sequence of steps on the streamed data. A software pipeline operates most efficiently when all of its hardware threads are operating at peak efficiency; if any stage cannot process the data streamed to it as fast as an earlier stage outputs that data, the earlier stage backs up and operates below peak efficiency. Moreover, because workloads may change dynamically, the relative workloads of the different stages of a pipeline are often difficult to predict from moment to moment, which makes it difficult to keep all of the stages of a software pipeline operating efficiently.
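The effect of a slow stage on the rest of a pipeline can be quantified with a simplified model. Assuming (hypothetically) that each stage takes a fixed, deterministic time per data item and that buffers between stages are bounded, the steady-state throughput of the whole pipeline is set by its slowest stage, and every other stage is busy only for the fraction `t_i / t_max` of the time. The function name `pipeline_utilization` is illustrative only.

```python
from fractions import Fraction


def pipeline_utilization(stage_times):
    """Steady-state throughput and per-stage utilization of a software pipeline.

    Simplified model (assumption): each stage needs a fixed time per item and
    blocks when its bounded output buffer is full, so the slowest stage
    (the bottleneck) dictates the rate at which items leave the pipeline.
    """
    times = [Fraction(t) for t in stage_times]
    bottleneck = max(times)
    throughput = 1 / bottleneck  # items completed per time unit
    utilization = [t / bottleneck for t in times]  # busy fraction of each stage
    return throughput, utilization


throughput, utilization = pipeline_utilization([2, 5, 3])
print(throughput)                       # one item every 5 time units
print([str(u) for u in utilization])    # stages 1 and 3 sit partly idle
```

Under this model, the first and third stages are idle for part of every cycle because the second stage cannot keep up. If the per-item times shift at run time, the bottleneck moves to a different stage, which is why a static assignment of hardware threads to stages is hard to keep efficient.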
A continuing need exists in the art for a manner of increasing the efficiency of workload distribution and message passing in computing systems including a plurality of interconnected integrated processor blocks.