The technical field of this invention is microprocessors and digital signal processors performing data exchange between memory ports of a multiple port device.
The present invention deals with the data transfer connecting various memory port nodes as applied to the transfer controller with hub and ports architecture. The transfer controller with hub and ports is the subject of U.K. Patent Application No. 00303373.5 filed Apr. 16, 1999, entitled TRANSFER CONTROLLER WITH HUB AND PORTS ARCHITECTURE. The transfer controller with hub and ports is a significant basic improvement in data transfer techniques in complex digital systems and provides many useful features, one of which is the internal memory port which allows connection of a virtually unlimited number of processor/memory nodes to a centralized transfer controller. The centralized transfer controller must be able to transfer data from node to node with performance relatively independent of how near or remote a node might be from the transfer controller itself. To clarify the problem solved by the present invention, it is helpful to review the characteristics, architecture, and functional building blocks of the transfer controller with hub and ports.
While direct memory access (DMA) techniques are a powerful tool in a digital signal processor system, they have their limitations. The fundamental limitation of a conventional direct memory access engine is that adding channel capacity requires additional hardware (in general, a replication of a complete channel). Some optimizations can be made in this area, such as sharing registers between multiple channels, but in general, the following rule holds: N channels cost N times as much as a single channel.
Conventional direct memory access techniques read from a source, and subsequently pass the data on to a destination. The source transfers will initially proceed at full rate. However, if the source has higher data transfer bandwidth than the destination, this data will backlog within the direct memory access engine. This will eventually slow the rate at which source transfers are issued. Thus the source data transfer bandwidth is effectively restricted to that of the destination. If another channel has a different source port and a different destination port, there are no conflicts using the conventional read driven approach. However, if the source port of the other channel is the same, the other channel could not be processed. This makes for inefficiency. In a device that supports only one transfer at a time, this is acceptable. However, the transfer controller with hub and ports device supports multiple concurrent transfers and other provisions must be made. A normal transfer process in the known art starts by reading data from the source and then writing it to the destination. The source read drives the process in that it occurs first, and everything follows as a consequence.
With a conventional read driven approach, the source will start reading data, which will be passed to the destination. However, if the destination is slow, a backlog of data waiting to be written will eventually cause the source read process to stall because it will not have anywhere to put the data read. With only one channel this is acceptable, but if there are multiple channels, conflicts occur. The source for this channel is stalled and cannot respond to more read requests. However, it is desirable to be able to service a different channel instead.
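The stall described above can be illustrated with a small cycle-by-cycle model. This is only a sketch under stated assumptions and is not taken from the referenced applications: the function name `simulate_read_driven`, the single shared buffer, the one-word-per-cycle source, and the `dest_period` parameter are all hypothetical choices made for illustration.

```python
def simulate_read_driven(words, buffer_capacity, dest_period, max_cycles=10_000):
    """Toy model of one read driven channel (all parameters are assumptions).

    The source can supply one word per cycle. The destination accepts one
    word every `dest_period` cycles. Words read but not yet written wait in
    a buffer of `buffer_capacity` entries.
    Returns (total_cycles, source_stall_cycles).
    """
    buffered = 0   # words read from the source but not yet written
    read = 0       # words read so far
    written = 0    # words written so far
    stalls = 0     # cycles in which the source had data but no buffer space
    cycle = 0
    while written < words and cycle < max_cycles:
        # The destination drains the buffer on its own, possibly slower, schedule.
        if buffered > 0 and cycle % dest_period == 0:
            buffered -= 1
            written += 1
        # The source reads whenever buffer space exists; otherwise it stalls.
        if read < words:
            if buffered < buffer_capacity:
                buffered += 1
                read += 1
            else:
                stalls += 1
        cycle += 1
    return cycle, stalls


fast = simulate_read_driven(words=8, buffer_capacity=2, dest_period=1)
slow = simulate_read_driven(words=8, buffer_capacity=2, dest_period=4)
print("fast destination:", fast)
print("slow destination:", slow)
```

In this model a destination that keeps pace with the source produces no source stalls, while a slower destination fills the buffer and holds the source idle. During those idle cycles a read driven engine cannot use the same source port for another channel, which is the inefficiency noted above.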
These basic limitations of conventional data transfer techniques led to the initial development of the transfer controller with hub and ports. The transfer controller with hub and ports is a unique mechanism which consolidates the functions of a direct memory access engine and other data movement engines in a digital signal processor system (for example, cache controllers) into a single module.
Consolidation of such functions has both advantages and disadvantages. The most important advantage of consolidation is that it will, in general, save hardware since multiple instantiations of the same type of address generation hardware will not have to be implemented.
On a higher level, it is also advantageous to consolidate address generation since it inherently makes the design simpler to modify from a memory-map point of view. For example, if a peripheral is added to or removed from the system, a consolidated module will be the only portion of the design requiring change. In a distributed address system (multi-channel direct memory access, for example), all instances of the direct memory access channels would change, as would the digital signal processor memory controllers.
Fundamental disadvantages of the consolidated model, however, are its inherent bottlenecking, resulting from conflicting multiple requests, and the challenge it poses to higher clock rates. Additionally, there is in general an added complexity associated with moving to a consolidated address model, just because the single module is larger than any of the individual parts it replaces.
The transfer controller with hub and ports, to which this invention relates, is a highly parallel and highly pipelined memory transaction processor. This transfer controller with hub and ports serves as a backplane to which many peripheral and/or memory ports may be attached.
This invention allows the multiple memory port nodes of multi-processor devices to be connected in a manner which preserves read latency irrespective of how near or remote a node may be from a centralized data transfer controller such as the transfer controller with hub and ports architecture upon which it is based. Using this manner of connection, referred to as synchronous fixed latency loop, the issue of a read command and retrieval of data at the memory port requesting the data transfer requires a fixed number of clock cycles for any memory port on the data transfer bus. This allows for more straightforward implementation of the read-then-write operations which make up the data transfer process. Such a device is described in U.K. Patent Application No. 9916705, filed Jul. 9, 1999, entitled DATA BUS USING SYNCHRONOUS FIXED LATENCY LOOP.
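The fixed latency property can be sketched with a simple model of a loop. The model below is an assumption made for illustration, not the circuit of the referenced application: it supposes that the word group is registered once per node, so it advances one node per clock, visits every node in order, and takes one further clock to return to the controller. The function name `read_latency` and the placeholder data string are hypothetical.

```python
def read_latency(num_nodes, target):
    """Cycles from issuing a read to receiving its data at the controller.

    Assumed model: the word group visits nodes 0..num_nodes-1 in order,
    spending one clock at each, then takes one clock to return to the
    controller. Only the addressed node fills in the read data word.
    """
    group = {"addr": target, "read_data": None}
    cycles = 0
    for node in range(num_nodes):
        cycles += 1                            # one clock per node on the loop
        if node == group["addr"]:
            group["read_data"] = f"mem[{node}]"  # addressed node supplies data
    cycles += 1                                # final hop back to the controller
    return cycles, group["read_data"]


for target in range(4):
    print(target, read_latency(num_nodes=4, target=target))
```

Because the word group always completes the full loop in this model, the cycle count depends only on the number of nodes and not on which node is addressed. A controller can therefore schedule the write half of a read-then-write transfer at a known offset from the read.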
The present invention further refines the synchronous fixed latency loop, allowing higher transaction throughput by allowing both a read and a write operation to be passed through the successive nodes of the synchronous fixed latency loop on each clock cycle. This refinement is subject to the stipulation that the read node address must be different from the write node address, as a given node will not be configured to carry out both a read and a write simultaneously on a given clock cycle.
The refinement further requires that an additional data transfer word be added to the word group which circulates through the data transfer loop. In particular, the synchronous fixed latency loop originally transferred a command/address word, a read data word and a write data word. The single address word allowed for only a read or a write. This invention adds a second address word. Thus the synchronous fixed latency loop of this invention transfers a read command/address word, a read data word, a write command/address word and a write data word. The additional address enables simultaneous transmission of both read and write commands. This doubles the peak data transfer bandwidth.
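The four-word group and the stipulation that the read node differ from the write node can be sketched as a small data structure. This is an illustrative sketch only: the names `WordGroup` and `node_step`, the use of a node index as the whole address, and the one-word memory per node are assumptions and are not drawn from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordGroup:
    """The four words that circulate together (field names are assumptions)."""
    read_addr: Optional[int] = None    # read command/address word
    read_data: Optional[int] = None    # read data word
    write_addr: Optional[int] = None   # write command/address word
    write_data: Optional[int] = None   # write data word

    def __post_init__(self):
        # A node does not perform a read and a write on the same clock,
        # so one group may not address the same node for both.
        if self.read_addr is not None and self.read_addr == self.write_addr:
            raise ValueError("read node must differ from write node")


def node_step(node_id: int, memory: List[int], group: WordGroup) -> WordGroup:
    """What one node does with the group during one clock (assumed behavior)."""
    if group.read_addr == node_id:
        group.read_data = memory[node_id]     # addressed for read: supply data
    if group.write_addr == node_id:
        memory[node_id] = group.write_data    # addressed for write: store data
    return group


memory = [10, 20, 30, 40]                     # one word per node, for brevity
group = WordGroup(read_addr=1, write_addr=3, write_data=99)
for node_id in range(len(memory)):            # one pass around the loop
    group = node_step(node_id, memory, group)
print(group.read_data, memory)
```

In this sketch a single pass of one word group completes both a read at one node and a write at another, whereas a group with a single address word could carry only one of the two. That is the sense in which the second address word doubles the peak transfer bandwidth.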