1. Field of the Invention
The present invention relates to the field of data communications and data processing. More particularly, the present invention relates to an indexed buffering method for accessing memory or register elements wider than the bus bandwidth.
2. Description of Related Art and General Background
The unprecedented growth of data networks (e.g., corporate-wide Intranets, the Internet, etc.) as well as the development of network applications (e.g., multimedia, interactive applications, proprietary corporate applications, etc.) have created a demand for higher network bandwidth capabilities and better network performance. Moreover, such demands are exacerbated by the advent of policy-based networking, which requires more data packet processing, thereby increasing the amount of work per packet and occupying processing resources. One approach to increasing network bandwidth and improving network performance is to provide for higher forwarding and/or routing performance within the network.
Some improvements in routing performance are directed to enhancing processor throughput. Processor designers have been able to obtain throughput improvements by greater integration, by reducing the size of the circuits, and by the use of single-chip reduced instruction set computing (RISC) processors, which are characterized by a small simplified set of frequently used instructions for rapid execution. It is commonly understood, however, that physical size reductions cannot continue indefinitely and there are limits to continually increasing processor clock speeds.
Further enhancements in processor throughput include modifications to the processor hardware to increase the average number of operations executed per clock cycle. Such modifications may include, for example, instruction pipelining, the use of cache memories, and multi-thread processing. Pipelined instruction execution allows subsequent instructions to begin executing before previously issued instructions have finished. Cache memories store frequently used and other data nearer the processor and allow instruction execution to continue, in most cases, without waiting the full access time of a main memory. Multi-thread processing divides a processing task into independently executable sequences of instructions called threads. The processor, recognizing when an instruction in one thread (i.e., a first thread) has caused it to be idle, for example because of memory latency, switches to an instruction in another thread (i.e., a second thread) that is independent of the former instruction. At some point, the threads that had caused the processor to be idle will be ready and the processor will return to those threads. By switching from one thread to the next, the processor can minimize the amount of time that it is idle.
In addition to enhancing processor throughput, improvements in routing performance may be achieved by partitioning the routing process into two processing classes: fast path processing and slow path processing. Partitioning the routing process into these two classes allows network routing decisions to be based on the characteristics of each process. Routing protocols, such as Open Shortest Path First (OSPF) and Border Gateway Protocol (BGP), have different requirements than the fast-forwarding Internet Protocol (FFIP). For example, routing protocols such as OSPF and BGP typically operate in the background and do not operate on individual data packets, while FFIP requires IP destination address resolution, checksum verification and modification, etc. on an individual packet basis.
The IP fast forwarding problem is becoming harder as the amount of time allotted for processing on a per packet basis steadily decreases in response to increasing media transmission speeds. In an effort to alleviate this problem, many router and Layer-3 switch mechanisms distribute the fast path processing to every port in their chassis, so that fast path processing power grows at a single port rate and not at the aggregate rate of all ports in the box. This provides only temporary relief, as network wire speeds have recently increased exponentially (e.g., Ethernet's increase from 10 to 100 to 1,000 Mbps) while processing speeds have traditionally improved, on average, by a factor of two every 18 months. It is clear that most current solutions will run out of steam as faster media become mainstream.
Methods and apparatuses consistent with the principles of the present invention, as embodied and broadly described herein, provide an indexed buffering scheme to access memory and register elements wider than a bus bandwidth. In order to achieve this end, the present invention includes a full 32-bit multiplexed address/data bus having a multiple-bit word alignment and an interface having a four-word-deep multiple write buffer capable of burst access to memory elements up to, and greater than, 128 bits wide.
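By way of illustration only, the following is a minimal sketch of how a four-word-deep indexed write buffer might stage 32-bit bus writes before they are committed to a 128-bit element. The class name `IndexedWriteBuffer`, its methods, and the word ordering (index 0 as the least significant word) are hypothetical assumptions made for this example and are not taken from the described embodiment.

```python
BUS_WIDTH_BITS = 32                      # width of one bus word
BUFFER_DEPTH = 4                         # four words deep -> 128 bits total
WORD_MASK = (1 << BUS_WIDTH_BITS) - 1


class IndexedWriteBuffer:
    """Hypothetical model: stages bus-width words by index, then commits one wide value."""

    def __init__(self, depth=BUFFER_DEPTH):
        self.words = [0] * depth

    def write_word(self, index, value):
        # One bus cycle: store a single 32-bit word at the given buffer index.
        if not 0 <= index < len(self.words):
            raise IndexError("buffer index out of range")
        if not 0 <= value <= WORD_MASK:
            raise ValueError("value does not fit in one bus word")
        self.words[index] = value

    def commit(self):
        # Assemble the staged words into one wide value.
        # Assumption: index 0 holds the least significant word.
        wide = 0
        for index, word in enumerate(self.words):
            wide |= word << (index * BUS_WIDTH_BITS)
        return wide


# Four bus writes followed by a single commit to the wide element.
buf = IndexedWriteBuffer()
for i, w in enumerate([0x11111111, 0x22222222, 0x33333333, 0x44444444]):
    buf.write_word(i, w)
wide_value = buf.commit()
```

In this sketch the wide element is touched only once, at commit time, so it never holds a partially written 128-bit value.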
The present invention, therefore, may be directed to a system, or one or more parts thereof, for producing an extended double word access for transferring data, in the form of packets, at a rate of about 10 gigabits per second. This is accomplished through a scheme of reading and writing only the amount of data needed during processing or packet transfer. For example, when reading a 32-bit word from a 128-bit bus, rather than wasting 96 bits of bandwidth, the inventive method provides an additional data burst in order to maximize data transfer. Likewise, when writing a 128-bit word to a 32-bit bus, the inventive method provides a four-cycle burst, keeping the 128-bit word intact across the 32-bit bus. The inventive method thus provides a solution with a variable data length that is some multiple of the bus width. Such a system may involve hardware support for larger width accesses, as well as support for special operation access.
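The four-cycle burst and the variable data length described above can be sketched as follows. This is an illustrative sketch only: the function names `burst_cycles`, `split_into_burst`, and `reassemble` are hypothetical, and sending the least significant word first is an assumption, since the ordering is not specified here.

```python
BUS_WIDTH_BITS = 32
WORD_MASK = (1 << BUS_WIDTH_BITS) - 1


def burst_cycles(data_bits):
    # The data length must be a positive multiple of the bus width.
    if data_bits <= 0 or data_bits % BUS_WIDTH_BITS != 0:
        raise ValueError("data length must be a multiple of the bus width")
    return data_bits // BUS_WIDTH_BITS


def split_into_burst(value, data_bits):
    # Break a wide value into bus-width words, one word per bus cycle.
    # Assumption: the least significant word is transferred first.
    cycles = burst_cycles(data_bits)
    if value < 0 or value >> data_bits:
        raise ValueError("value does not fit in the stated data length")
    return [(value >> (i * BUS_WIDTH_BITS)) & WORD_MASK for i in range(cycles)]


def reassemble(words):
    # Rebuild the wide value on the far side of the bus.
    value = 0
    for i, word in enumerate(words):
        value |= word << (i * BUS_WIDTH_BITS)
    return value


original = 0x0123456789ABCDEFFEDCBA9876543210   # a 128-bit word
burst = split_into_burst(original, 128)         # four 32-bit bus cycles
restored = reassemble(burst)                    # equals original
```

Under these assumptions a 32-bit access takes one cycle and a 128-bit access takes four, so the number of bus cycles follows the amount of data actually needed.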