Technical Field
The present disclosure relates to a communication interface for interfacing a transmission circuit with an interconnection network. Embodiments have been developed with particular attention paid to possible use in communication interfaces that are typically used for transmissions of the DMA (Direct Memory Access) type.
Description of the Related Art
Systems within an integrated circuit (Systems-on-Chip—SoCs) and systems in a single package (Systems-in-Package—SiPs) typically comprise a plurality of circuits that communicate with one another via a shared communication channel. For instance, the aforesaid communication channel may be a bus or a communication network, such as for example a Network-On-Chip (NoC) or Network-in-Package (NiP), and is frequently referred to as “interconnection network” (ICN).
For instance, the above SoCs are frequently used for processors designed for mobile or multimedia applications, such as for example smartphones, set-top boxes, or routers for domestic uses.
FIG. 1 shows an example of a typical SoC 1.
In the example considered, the system comprises a processor 10 and one or more memories 20. For instance, illustrated in the example considered are a small internal memory 20a, such as for example a RAM (Random-Access Memory), a non-volatile memory 20b, such as for example a flash memory, and a communication interface 20c for an external memory, such as for example a DDR memory.
In the example considered, the system also comprises interface circuits 30, such as for example input and output (I/O) ports, a UART (Universal Asynchronous Receiver-Transmitter) interface, an SPI (Serial Peripheral Interface) interface, a USB (Universal Serial Bus) interface, and/or other digital and/or analog communication interfaces.
In the example considered, the system also comprises further peripherals 40, such as for example comparators, timers, analog-to-digital or digital-to-analog converters, etc.
In the example considered, the aforesaid modules, i.e., blocks 10, 20, 30 and 40, are connected together through a communication channel 70, i.e., an interconnection network, such as for example a bus or preferably a Network-On-Chip (NoC).
The general architecture described previously is frequently used for conventional micro-controllers, which renders any detailed description here superfluous. Basically, this architecture enables interfacing of the processor 10 with the various blocks 20, 30 and 40 via software commands that are executed by means of the processor 10.
In multimedia or mobile processors, other blocks 50, which will be referred to hereinafter as Intellectual Property (IP) circuits, are added to the above generic architecture. For instance, the aforesaid IP blocks 50 may comprise an image or video encoder or decoder 50a, an encoder or decoder of audio signals 50b, a WiFi communication interface 50c, or in general blocks, the hardware structure of which is optimized for implementation of functions that depend upon the particular application of the system. The aforesaid blocks may even be autonomous and interface directly with the other blocks of the system, for example the memories 20 and the other peripherals 30 and 40.
Typically, associated with each IP block 50 is a respective communication interface 80 configured for exchanging data between the IP block 50 and the communication channel 70.
For instance, FIG. 2 shows a block diagram of a typical communication interface 80 for an IP block 50.
In the example considered, the communication interface 80 comprises:
- a transmission memory 802a for temporarily saving output data, i.e., the data coming from the respective IP block 50;
- a reception memory 802b for temporarily saving input data, i.e., the data coming from the communication channel 70;
- an interface 804 for exchanging data between the memories 802a, 802b and the communication channel 70, for example for sending the data saved in the transmission memory 802a to the communication channel 70 and saving the data received from the communication channel 70 in the reception memory 802b; and
- a control circuit 806, which, for example, controls the flow of data between the IP block 50 and the communication channel 70, monitors the state of the memories 802a and 802b, and generates the control signals for the IP block 50.
Typically, the reception memory 802b is a FIFO (First-In/First-Out) memory. However, in the case where the data received may be out of order, the reception memory 802b or the interface 804 may also re-order the data before they are written in the reception memory 802b. 
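The re-ordering behavior described above can be sketched in software as a minimal model; it assumes each incoming item carries a sequence number, and the class and method names are illustrative rather than taken from any actual interface:

```python
import heapq

class ReorderingRxFifo:
    """Toy model of a reception memory that re-orders out-of-order data.

    Items arrive tagged with a sequence number and are released to the
    IP block strictly in sequence order, preserving FIFO semantics."""

    def __init__(self):
        self._pending = []   # min-heap of (seq, data) received out of order
        self._next_seq = 0   # next sequence number expected in order
        self._fifo = []      # in-order data ready for the IP block

    def push(self, seq, data):
        heapq.heappush(self._pending, (seq, data))
        # Drain every item that is now contiguous with the expected sequence.
        while self._pending and self._pending[0][0] == self._next_seq:
            _, d = heapq.heappop(self._pending)
            self._fifo.append(d)
            self._next_seq += 1

    def pop(self):
        """Return the next in-order datum, or None if none is ready."""
        return self._fifo.pop(0) if self._fifo else None
```

With this model, pushing items with sequence numbers 1, 0, 2 (in that arrival order) still yields the data in the order 0, 1, 2 on the read side.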
In the example considered, no interface is illustrated for exchanging data between the IP block 50 and the memories 802a and 802b, because typically the IP block 50 is able to exchange the data directly with the memories 802a and 802b, for example by exploiting the control signals generated by the control circuit 806.
For instance, FIGS. 3a and 3b show a scenario of a typical data flow. In particular, FIG. 3a is a block diagram that shows the data flow of a typical transmission of data, and FIG. 3b is a flowchart that shows the respective transmission steps.
After an initial step 1000, the processor 10 sends, in a step 1002, an instruction to the block 50a indicating that the memory 20a contains data for the block 50a. For instance, for this purpose, the processor 10 may send to the block 50a an instruction indicating a start address and an end address within the memory 20a (or else a start address and the length of the transfer). Alternatively, the processor 10 could configure the aforesaid area by writing the start address and the end address directly in a configuration register of the block 50a.
Next, in a step 1004, the block 50a reads the data from the memory 20a by means of the respective communication interface 80a. In particular, the communication interface 80a typically sends a read request to the memory 20a for this purpose, and the memory sends the requested data back to the communication interface 80a. For instance, both the read request and the response are typically sent through the interconnection network 70 via data packets.
Finally, once all the data have been read, the block 50a or the communication interface 80a generates, in a step 1006, an interrupt that signals to the processor 10 that the transfer has been completed.
Next, the processor 10 can allocate, in a step 1008, the respective area of the memory 20a to another process, and the procedure terminates in a step 1010.
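The sequence of steps 1002-1008 above can be sketched as a minimal software model; `raise_interrupt` is a hypothetical stand-in for the hardware interrupt line, and the function name is illustrative:

```python
def dma_read_sequence(memory, start, end, raise_interrupt):
    """Toy model of the DMA read flow of FIGS. 3a/3b.

    The processor configures a [start, end) region (step 1002); the IP
    block reads the data via its interface (step 1004); an interrupt then
    signals completion (step 1006), after which the processor may
    reallocate the region (step 1008)."""
    data = [memory[a] for a in range(start, end)]  # step 1004: DMA reads
    raise_interrupt()                              # step 1006: completion IRQ
    return data
```

In this sketch the processor never touches the data itself, mirroring the direct-memory-access character of the transfer.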
Consequently, typically the blocks 50 access the memory 20 by means of a Direct Memory Access (DMA), i.e., the blocks 50 access the memory directly without any intervention on the part of the processor 10.
Typically, the aforesaid DMAs may be of two types: a data-write request or a data-read request. The read and write DMA transfers are substantially identical, except for the direction of the data:
- in the case of a write request, the data are sent by the IP block 50 that has requested the DMA; and
- in the case of a read request, the data are sent by the destination block that receives the read request.
Both types of request are typically characterized either by a start address and an end address at which the data are to be read/written, or by a start address and a length of the transfer.
For instance, the above address can comprise the address of a node of a NoC, the memory address within the destination (for example, in the case of a memory), or a combination of both. Consequently, both the write requests and the read requests are typically accompanied by a start address that identifies the addressee of the request, and the aforesaid address may belong to the memory map of the system. In this case, the interconnection network 70 decodes the address received, identifies the addressee that is to receive or supply the data, and appropriately conveys the replies that it receives from the addressee back to the source of the communication.
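The two equivalent descriptor forms (start/end versus start/length) and the decoding of a system-map address into a NoC node plus a local offset can be sketched as follows; the field widths and function names are purely illustrative assumptions, not taken from any specific system:

```python
def transfer_length(start_addr, end_addr, word_size=1):
    """Length (in words) of a transfer described by start/end addresses.
    A (start, length) descriptor is the equivalent dual representation."""
    return (end_addr - start_addr) // word_size

def split_address(addr, node_bits=8, addr_bits=32):
    """Illustrative decoding of a memory-map address into
    (NoC node, local offset); real systems define their own field widths."""
    offset_bits = addr_bits - node_bits
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)
```

For example, with the hypothetical 8-bit node field, address 0x01000020 decodes to node 1, local offset 0x20.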
Furthermore, the various blocks of the system 1 may also simultaneously access the interconnection network 70.
For instance, the blocks 10 and 50 are typically the communication sources (initiators), which request DMA transfers (both write and read transfers) in competition with one another, and each of which may even present a plurality of channels. The blocks 20, 30 and 40, instead, are typically addressees, which receive or send data in accordance with the requests.
For this reason, there may exist simultaneously a number of DMA communication channels, which, once converted into the protocol of the interconnection network 70, are to be transmitted through the network 70 itself.
FIG. 4 shows an example of a typical solution that can be used for transmission of a plurality of DMA communications coming from respective circuits designated as a whole by 90. For instance, the circuits 90 may be the processor 10 and/or an IP block that sends a data-read request or a data-write request.
Typically, each transmission circuit 90 has associated with it an interface circuit 92 that converts the DMA transmission coming from the respective circuit 90 into a communication that uses the protocol of the interconnection network 70; i.e., the interface 92 performs a conversion between the transport layer and the link layer. For instance, the blocks 10-40 are typically optimized for a given architecture, and the interface 92 is directly integrated in the respective block. The IP blocks 50, instead, are typically not optimized for a specific communication protocol, and consequently an additional interface is frequently required (see, for example, the blocks 80 in FIG. 1 or FIG. 3a). For instance, as mentioned previously, the interface 92 could segment the DMA communication and add respective headers to form data packets that can be forwarded to the destination through the interconnection network 70.
Frequently, different circuits 92 have to transmit data simultaneously. For this reason, the interconnection network 70 typically has an associated circuit 94, commonly referred to as an arbiter, planner, or scheduler, that regulates access to the interconnection network 70. Moreover, the interface circuit 92 typically comprises a memory (see FIG. 2) for temporarily saving the data coming from the respective circuit 90, so as to render the operation of the circuit 90 independent of possible delays in the transmission of the data over the interconnection network 70.
Typically, the arbiter 94 is directly integrated in the interconnection network 70 and could be, for example, a router node of a NoC. In fact, in the solution illustrated in FIG. 4, the arbiter 94 also uses the protocol of the interconnection network 70 and can, for example, analyze the headers of the various packets to determine the priority of the transmissions in such a way as to guarantee a certain quality of service (QoS).
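A header-based priority arbitration of the kind described can be sketched as follows; the packet layout (a header dict with a "prio" field) and the tie-breaking rule (earliest arrival first) are assumptions made for illustration only:

```python
def arbitrate(pending_packets):
    """Select and remove the next packet to forward.

    The packet with the highest header priority wins; among packets of
    equal priority, the earliest-arrived (lowest index) is chosen."""
    if not pending_packets:
        return None
    best = max(range(len(pending_packets)),
               key=lambda i: (pending_packets[i]["header"]["prio"], -i))
    return pending_packets.pop(best)
```

A real arbiter would also have to avoid starving low-priority sources, e.g., via round-robin or aging schemes, which this sketch omits.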
Consequently, in general, different transmission circuits 90 may send to one and the same memory 20 read requests and/or write requests that are interleaved.
However, the aforesaid type of access may cause problems when the memory 20 comprises a plurality of memory pages, for example when the memory 20 is a DDR memory.
In this context, FIG. 5 shows a typical memory 20.
In the example considered, the aforesaid memory 20 comprises a physical memory 202 that may also comprise a plurality of memory blocks and a memory-control circuit 204 that handles the accesses to the physical memory 202.
For instance, in the case where the memory 202 is a DDR memory, the memory is structured in different memory rows, or memory pages. In fact, as envisaged by the operation of DDR memories, access to a new memory row requires the so-called "row precharge" operation. This operation typically requires several clock cycles, the so-called "row precharge time", which depends upon the particular DDR memory used.
Consequently, the physical memory 202 may comprise physical memory pages P1 . . . Pn, where the access to a new memory page requires additional access time.
In general, organization of the memory in memory pages may even be just virtual. For instance, to speed up accesses to the physical memory 202, the memory-control circuit 204 may have associated with it a cache memory 206 that has a smaller capacity than the memory 202. In this case, the entire memory area of the memory 202 is virtually divided into blocks P1 . . . Pn, which have the same size as the cache memory 206. In this case, when access to a given memory location within a page is requested, not just the requested datum is loaded, but the entire page to which it belongs. Consequently, when reading of a datum is requested, this datum is first sought in the cache memory 206. In the case where the datum is present, the so-called "page hit", the copy present in the cache memory 206 is used. Instead, when the datum is not present, the so-called "page miss", the entire memory page associated with the aforesaid datum is retrieved and stored in the cache memory 206. In some cases, the cache memory also supports write requests. For instance, in this case, before a new memory page is loaded, the previous page present in the cache memory 206 is written back to the memory 202.
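The page-hit/page-miss behavior of the cache memory 206 can be modeled with a minimal sketch, assuming a single cached page of fixed size; the class name, page size, and hit/miss counters are illustrative:

```python
class PageCache:
    """Toy model of a single-page cache in front of a paged memory.

    The address space is split into pages of `page_size`; an access
    inside the currently cached page is a "page hit", while any other
    access is a "page miss" that loads the whole new page."""

    def __init__(self, page_size=1024):
        self.page_size = page_size
        self.current_page = None   # index of the page held in the cache
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        page = addr // self.page_size
        if page == self.current_page:
            self.hits += 1         # "page hit": served from the cache
            return "hit"
        self.misses += 1           # "page miss": whole page is (re)loaded
        self.current_page = page
        return "miss"
```

Running an interleaved address stream through such a model makes the cost of alternating between distant addresses immediately visible: every change of page is a miss.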
Consequently, regardless of whether the organization of the memory 20 in memory pages P1 . . . Pn is due to the physical organization of the memory (for example, the rows of a DDR memory) and/or to the virtual organization of the memory (for example, the use of a cache memory), read requests and/or write requests that access memory addresses that do not correspond to the current memory page are much slower.
However, as mentioned previously, the read requests and write requests may also come from different circuits and consequently be addressed to completely different memory addresses, which may cause continuous changes of memory page.
For this reason, memory controllers 204 are known that are able to receive a plurality of read and/or write requests and that first re-order the requests so as to minimize the changes of memory page.
Furthermore, the transmission circuits 90 may also optimize accesses to the memory 20. For instance, a circuit, such as a circuit 50 that sends the data of a write request, or a memory 20 that sends the data in response to a read request, can group together the data that correspond to consecutive addresses in the memory 20 into a single transaction, a so-called "chunk" or "burst", and the interconnection network 70 can treat the aforesaid chunk as a single message that must not be interrupted by arbitration. This type of communication is typically referred to as "Store and Forward" (S&F). For this purpose, the communication interfaces 92 typically store the entire chunk and send it to the interconnection network 70 only when all the data have been received. The various nodes of the network 70, instead, do not necessarily have to implement the S&F mechanism themselves, but must perform a chunk-based arbitration, i.e., an arbitration that guarantees the atomicity of each chunk.
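The grouping of consecutive addresses into chunks, as a store-and-forward interface might perform it before injecting a burst into the network, can be sketched as follows; the (address, data) input format and the chunk layout are assumptions made for illustration:

```python
def build_chunks(addressed_words):
    """Group (address, data) pairs with consecutive addresses into chunks.

    Input order is preserved; any gap in the address sequence closes the
    current chunk and starts a new one, so each chunk describes one
    contiguous burst that can be arbitrated atomically."""
    chunks = []
    for addr, data in addressed_words:
        if chunks and addr == chunks[-1]["addr"] + len(chunks[-1]["data"]):
            chunks[-1]["data"].append(data)              # extend current burst
        else:
            chunks.append({"addr": addr, "data": [data]})  # start a new burst
    return chunks
```

In the store-and-forward scheme described above, each resulting chunk would be buffered in full by the interface before being sent as one uninterruptible message.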
The inventors have noted that the aforesaid type of communication adds further latencies, because the interfaces 92 can send the messages only when all the data have been received. Furthermore, additional memory space is required for storing the data of a chunk.