1. Field of the Invention
The present invention relates to computer systems and, more specifically, to mirroring data across multiple memory resources, such as memory banks.
2. Background Information
High-performance computer systems often utilize multiple processors or central processing units (CPUs). Each processor may have access to shared as well as private information, such as program instructions (e.g., algorithms) and data, stored in a memory coupled to the processors. One of the more common multiprocessor architectures is known as a systolic array, in which each processor is coupled to its nearest neighbors in a mesh-like topology and the processors perform a sequence of operations on the data that flows between them. Typically, the processors of a systolic array operate in “lock-step,” with each processor alternating between a compute phase and a communicate phase.
Systolic arrays are often used when the problem being solved can be partitioned into discrete units of work. In the case of a one-dimensional (1-D) systolic array comprising a single “row” of processors, each processor is responsible for executing a distinct set of instructions on input data so as to generate output data, which is then passed (possibly with additional input data) to the next processor of the array. To maximize throughput, the problem is divided such that each processor requires approximately the same amount of time to complete its portion of the work. In this way, new input data can be “pipelined” into the array at a rate equivalent to the processing time of each processor, with as many units of input data being processed in parallel as there are processors in the array. Performance can be improved by adding more processors to the array as long as the problem can continue to be divided into smaller units of work. Once this dividing limit has been reached, processing capacity may be further increased by configuring multiple rows in parallel, with new input data allocated to the first processor of the next row of the array in sequence.
One place where multiprocessor architectures, such as systolic arrays, can be advantageously employed is in the area of data communications. In particular, systolic arrays have been used in the forwarding engines of intermediate network stations or nodes, such as routers. An intermediate node interconnects communication links and sub-networks of a computer network through a series of ports to enable the exchange of data between two or more end nodes of the computer network. The end nodes typically communicate by exchanging discrete packets or frames according to predefined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP) or the Internetwork Packet eXchange (IPX) protocol. The forwarding engine is often used by the intermediate node to process packets received on the various ports. This processing may include determining the destination of a packet, such as an output port, and placing the packet on an output queue associated with the destination.
The multiple processors of a forwarding engine typically have shared access to one or more memory resources in which information needed by all of the processors, such as forwarding tables, is stored. Each memory resource, moreover, may consist of a plurality of memory banks. To ensure a consistent “view” of the data by the multiple processors, locks are often placed on the different memory resources and/or banks while they are being accessed. For example, a processor seeking to read information from a given bank locks the bank so that other processors cannot modify its contents while the read is executing. Similarly, a processor seeking to write information also locks the respective bank so that other processors cannot read from the bank until the write operation is complete. Although such locking mechanisms ensure data consistency, they often result in delays when multiple processors try to access information from the same bank.
To reduce or eliminate such delays, it is known to copy the contents of one memory bank into another memory bank. For example, if the memory resource has four banks, then the contents of bank 0 (B0) may be mirrored to bank 2 (B2), and the contents of bank 1 (B1) may be mirrored to bank 3 (B3). That is, the contents of banks B2–B3 are a mirror of the contents of banks B0–B1. To take advantage of this arrangement, each processor is assigned to one of two groups, and each group is allowed to access only one of the mirrored sets of banks. In other words, the first group of processors utilizes banks B0–B1, while the second group of processors utilizes banks B2–B3. If a processor from the first group needs to read information stored at bank B1, it locks only bank B1, leaving bank B3, which contains an identical copy of the information at bank B1, unlocked. Accordingly, a processor from the second group may still access bank B3 even though bank B1 is locked. In this way, a processor from each group can read the same information simultaneously.
To implement such a mirroring arrangement, the processors typically execute reader code and writer code. The reader code for the processors is predefined to target the memory banks with which the processor is associated, e.g., banks B0–B1 or banks B2–B3. The reader code executes a loop, e.g., a spinlock, until the processor obtains a lock on the particular memory bank, e.g., bank B1, that is being read. Once the lock is obtained, the processor issues the read operation, and the results are returned to the requesting processor. The processor then releases the lock, thereby allowing another processor to access memory bank B1. Because all information is mirrored at two banks, a write must be performed at both locations to maintain data consistency. In this case, the writer code executes a spinlock until a lock is obtained on both banks, e.g., bank B1 and bank B3. Once both banks are locked, the processor issues the two write operations. Following the completion of the two writes, the locks are released, thereby allowing the two banks to be accessed by other processors.
In addition, a shared memory resource often imposes some type of arbitration on the processors trying to access the resource. Through arbitration, the memory resource tries to prevent one processor from repeatedly gaining access to the memory resource, and thus repeatedly blocking the other processors from gaining access. A typical arbitration scheme forces a processor that has just accessed the memory resource to wait until all of the other processors have been given a chance to access the memory resource before allowing the original processor to access the memory resource a second time. As mentioned above, a processor writing to a mirrored memory bank issues two write operations, one to each of the memory banks. Assuming the memory resource implements such an arbitration scheme, the processor, after issuing the first write, will then have to wait until all of the other processors have been given a chance to access the memory resource before it is permitted to issue the second write. Meanwhile, because the processor locked both banks before issuing the writes, both banks remain locked this entire time. As a result, other processors are blocked from accessing either bank until both writes complete. This can reduce the efficiency of the system.
Accordingly, a need exists for a more efficient memory resource mechanism.