1. Technical Field
The invention relates to computer networks. More particularly, the invention relates to an information processing system.
2. Description of the Prior Art
Communication between computers over the Internet can be compared to the delivery of mail and packages by the United States Postal Service. Users access the Internet through a variety of options, e.g. phone modems, DSL modems, cable modems, T-1 lines, local area networks, wireless networks, and wide area networks.
In the world of the U.S. Postal Service, access to the mail system could be through a mail-slot in the door of your home, a mailbox at the street in front of your home, a mailbox on a street corner, a post office counter, or a post office box. By analogy, each user of the Internet is assigned an address, and the Internet infrastructure learns how to deliver messages intended for them.
In the world of the Postal Service, the zip code, city, street, and street number are used progressively to determine how to route and deliver the mail. Users of the Internet rely on various networking protocols to transfer messages between computers.
In the world of the Postal Service, the protocols for delivering the mail include First Class Delivery, Next Day Air, Parcel Post, and Bulk. In the world of the Internet, messages are sent in packets, as opposed to the letters that are sent in the world of the Post Office. These packets contain information necessary for delivery, and this information is found in the packet header. The packet header includes the recipient's address and the sender's address, as well as the delivery method and style of message. The packet header is comparable to all of the information that is visible on the outside of a letter or package, i.e. recipient's address, return address, mail type, and specific handling instructions, such as FRAGILE. The remainder of an Internet packet contains user data. This user data is comparable to what is found inside an envelope or package. The Internet infrastructure has no more need to see the user data to route and deliver the message to the intended computer accurately than the post office has to open the mail it handles to figure out where to send it. Table A below shows a typical Internet packet.
TABLE A
Typical Internet Packet
Packet Header | Data Payload
As computers are tied together over the World Wide Web, the physical connections between the computers look like a giant spider web. The thick strands of this web transfer huge numbers of packets between big cities to move them along their way. This is comparable to the air or truck traffic carrying millions of letters between postal hubs. At each connection point on the World Wide Web, a sorting function must be performed to determine which direction a message should be sent. This sorting of packets is similar to the process where high-speed postal sorters scan letters to determine their addresses and figure out which direction to send them. Sorting of data packets is often referred to as packet classification.
An optical router is a device that has many input/output (I/O) ports or connections. Each I/O port connects through an optical fiber to another optical router, optical switch, or optical adapter that can be located a long distance geographically from the first device. In simplistic terms, the purpose of an optical router is to receive data packets on each I/O port, to interpret the headers within the packet, and to route the packet out the appropriate I/O port towards the destination computer. If an optical router is unable to sort packets quickly enough, packets back up and are potentially lost by the router. In such a case, the Internet slows down and computer users may lose their connections. As more and more people use the Internet, the situation internal to the optical routers that make up part of the Internet infrastructure can start to look as chaotic as the Post Office at Christmas time.
The goal of an optical router is to interpret the packet header for each received packet as fast as possible so that the packets can be sent out the correct I/O port. This avoids delays, backups, and potentially lost packets. One problem is that thousands of different users can be sending messages through a router at the same time, and the packets all need to be sorted and routed differently. Table B below shows how the number of possible headers that can be received increases dramatically as the number of bits in the packet header increases.
TABLE B
Possible Headers versus Header Bit Length
Packet Header Bit Length | Possible Headers
8 | 256
16 | 65536
32 | 4.29 E9
64 | 1.84 E19
128 | 3.40 E38
256 | 1.16 E77
512 | 1.34 E154
1024 | 1.80 E308
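The entries in Table B follow from the fact that an n-bit header can take 2^n distinct values. The following is a minimal Python sketch that reproduces the table; the function names `possible_headers` and `format_count` are illustrative only and do not come from the original text.

```python
from decimal import Decimal

def possible_headers(bit_length: int) -> int:
    """Number of distinct bit patterns in a header of the given length."""
    if bit_length < 0:
        raise ValueError("bit_length must be non-negative")
    return 2 ** bit_length

def format_count(count: int) -> str:
    """Show small counts exactly and large ones in the table's E-notation."""
    if count < 10 ** 6:
        return str(count)
    # Decimal is used because 2**1024 is too large to convert to a float.
    mantissa, exponent = f"{Decimal(count):.2e}".split("e")
    return f"{mantissa} E{int(exponent)}"

for bits in (8, 16, 32, 64, 128, 256, 512, 1024):
    print(f"{bits:>4} | {format_count(possible_headers(bits))}")
```

Every additional header bit doubles the number of patterns, which is why a lookup table indexed directly by the header stops being practical well before 64 bits.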
The problem of receiving a packet and identifying critical header information to decide where to route the packet is much like finding a needle in a haystack. Initially, routers used microprocessors and large lookup tables in memory to search for addresses and header information. Later, as data rates increased, system designers moved to content addressable memories (CAMs) to allow the received packet header to be compared to all previously analyzed packet headers simultaneously. The architecture of a CAM permits the user to apply the received header information to the memory and to determine to which location(s) it matches.
Because the performance of CAMs could not keep up with ultra-high speed router implementations, some manufacturers switched to custom ASICs (Application Specific Integrated Circuits) to evaluate packet headers in a rapid fashion.
Optical networking is a significant business opportunity because of the tremendous increases in data bandwidth requirements resulting from the increasing use of the Internet. The capability of optical fibers to transmit and receive data exceeds the capability of electronic and electro-optical interface products to keep up with increasing data rates. Presently, OC-192 standard networks that operate at 10 Gbit/sec are beginning to be used. Presently available optical routers address the needs attendant to processing and routing packets from OC-192 systems.
Existing Optical Network Packet Classification Schemes
High performance optical routers have been generally implemented using either CAMs or custom ASICs to perform packet classification. The custom ASIC approach must rely on filtering and interpreting some subset of possible packet data patterns to determine how to route packets. The approach is inflexible and may be difficult to scale with new standards and new protocols. The CAM approach is more flexible and is popular in high end routers. CAMs are designed to be cascaded so that greater numbers of data bits can be analyzed. CAMs are also designed to permit various levels of “don't care” functionality, which has increased their flexibility and usefulness.
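The matching behavior of a CAM with “don't care” bits can be modeled in software as follows. This is only a sketch under stated assumptions: each entry is represented as a hypothetical (value, mask) pair, the 8-bit table contents are invented for illustration, and the sequential loop stands in for a comparison that a hardware CAM performs on all entries simultaneously.

```python
from typing import List, NamedTuple

class CamEntry(NamedTuple):
    value: int  # bit pattern stored in the entry
    mask: int   # 1 bits are compared; 0 bits are "don't care"

def cam_match(entries: List[CamEntry], key: int) -> List[int]:
    """Return the indices of every entry that matches the key.

    A hardware CAM compares all entries at once; this loop models
    only the result, not the timing.
    """
    return [
        index
        for index, entry in enumerate(entries)
        if (key & entry.mask) == (entry.value & entry.mask)
    ]

# Hypothetical 8-bit table: entry 0 is an exact match, entry 1 ignores
# the low four bits, and entry 2 ignores every bit (a default entry).
table = [
    CamEntry(value=0b1010_0101, mask=0b1111_1111),
    CamEntry(value=0b1010_0000, mask=0b1111_0000),
    CamEntry(value=0b0000_0000, mask=0b0000_0000),
]

print(cam_match(table, 0b1010_0101))
print(cam_match(table, 0b1010_1111))
```

With this table, the key 0b10100101 matches all three entries, while 0b10101111 matches only entries 1 and 2. Returning every matching location mirrors the description of a CAM reporting the location(s) to which the applied header matches.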
CAM Based Classification Systems
A typical router is shown in FIGS. 1a and 1b and is used to describe some of the problems associated with increasing data rates to 10 Gbit/sec, 40 Gbit/sec, and beyond. In FIG. 1a, the optical interface 11 translates the light stream into electrical signals and vice-versa. In the receive mode, the data framer 12 is responsible for extracting a serial receive clock and corresponding serial receive data stream. The serial data stream must then be converted into a parallel sequence of words that correspond to a packet. The parallel sequence of words can be operated on by a network processor 13, and eventually routed into the switch fabric 14 where they are sent to the appropriate destination.
Custom ASIC Solution (Juniper Networks ASIC2)
Juniper Networks (Sunnyvale, Calif.) provides high performance routers that use a custom ASIC solution that is marketed as the Juniper Networks ASIC2. The Juniper Networks ASIC2, in conjunction with the Juniper “Junos” software, allows up to 40 million packets/sec to be forwarded in the Juniper system. Juniper's data sheets state the following:
“Juniper's routers leave the packet in the shared memory and move only a packet pointer through the queues. When packets arrive they are immediately placed in distributed shared memory where they remain until being read out of memory for transmission. This shared memory is completely nonblocking, which in turn, prevents head-of-line blocking.”
FIG. 1b shows how a Juniper router is believed to be implemented, and how it relies on a very high speed shared SRAM 17 where packets are stored and operated on. This architecture avoids the movement of packets around in memory which can take up a considerable amount of time.
CAM and Custom ASIC Shortcomings for Packet Classification of OC-192 and Beyond
A variety of problems are beginning to plague CAM and custom ASIC based systems as data rates are moving to OC-192 (10 Gbits/sec) and OC-768 (40 Gbits/sec). Some of the biggest problems have to do with raw forwarding throughput, which is related to how many packets per second can be processed; latency, which is related to the absolute delay through a router; system power consumption; and board area. A key component of packet latency through a router is the time necessary to perform packet classification. As latency increases, the chances of experiencing upper level networking protocol timeouts for a packet increase.
Typical CAM structures have a width that is associated with how many bits the user desires to analyze, and a depth that is based on the number of possible patterns that the user wishes to differentiate between. CAMs are cascadable to meet both the width and depth that is required. The downside of cascading is that it costs money, increases board area, and increases power consumption. On the other hand, the ASIC2 solution from Juniper Networks does not appear to be cascadable. It appears to operate on data in the SRAM, and permits qualified searches on only certain fields and bits. This limits the ASIC2 approach when new search criteria are needed.
Packet Classification Forwarding Rate and Latency Issues
The issues of forwarding rate and latency are intertwined and need to be addressed together. There are two significant architectural issues that affect forwarding rate and latency, i.e. the design of a packet's data flow through the system, and the underlying performance of the packet classification hardware.
In a CAM based system, such as that in FIG. 1a, parallel data from the data framer and any associated memory must be moved by the network processor or custom hardware into the CAM 15 for analysis. This is done after a packet has been received. This data must be moved quickly or additional latency is introduced. Table C below shows how the spacing between words in a received data pattern decreases as the serial data rate is increased. Each word that must be transferred to the CAM requires a read from the data framer's memory and a write to the CAM. In the case of very short data packets, which are the hardest for a router to handle, most of each packet must be transferred into the CAM. Even if reading from the data framer and writing to the CAM could each be done in a single cycle, this would require a dedicated 1/(3.2 nsec/2)=625 MHz processor and memory system to keep up at OC-192 rates with a 32-bit data framer. The problem becomes four times worse at OC-768 speeds and would require a processor and memory system running at 2.5 GHz.
TABLE C
Data Framer Output Word Separation vs. Data Rate
 | OC-48 (2.5 Gbit/sec) | OC-192 (10 Gbit/sec) | OC-768 (40 Gbit/sec)
Output Separation (for a data framer with a 32-bit output word) | 12.8 nsec | 3.2 nsec | 0.8 nsec
Output Separation (for a data framer with a 64-bit output word) | 25.6 nsec | 6.4 nsec | 1.6 nsec
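The word separations in Table C and the 625 MHz and 2.5 GHz clock figures quoted above can be reproduced with a few lines of arithmetic. The Python sketch below assumes the nominal line rates used in the text (2.5, 10, and 40 Gbit/sec) and two cycles per word (one read from the data framer and one write to the CAM); the function names are illustrative.

```python
def word_separation_ns(line_rate_gbps: float, word_bits: int) -> float:
    """Time between successive framer output words, in nanoseconds."""
    return word_bits / line_rate_gbps  # bits / (Gbit/sec) = nsec

def required_clock_mhz(line_rate_gbps: float, word_bits: int,
                       cycles_per_word: int = 2) -> float:
    """Clock rate needed when each word costs one read plus one write."""
    separation = word_separation_ns(line_rate_gbps, word_bits)
    return 1000.0 * cycles_per_word / separation

# Nominal line rates as used in the text, in Gbit/sec.
RATES = {"OC-48": 2.5, "OC-192": 10.0, "OC-768": 40.0}

for name, rate in RATES.items():
    for width in (32, 64):
        print(f"{name}, {width}-bit word: "
              f"{word_separation_ns(rate, width):.1f} nsec between words, "
              f"{required_clock_mhz(rate, width):.1f} MHz required")
```

For a 32-bit framer this gives 625 MHz at OC-192 and 2500 MHz at OC-768, matching the figures in the text. Doubling the framer word width to 64 bits halves the required clock but does not change the fourfold growth between OC-192 and OC-768.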
In addition to the delays and uncertainty associated with transferring the data from the data framer into the CAM memory, there is the delay of the CAM memory in processing the data once the final word has been presented. Typical CAM memories have delays of approximately 100 nsec from application of data to input match. This is expected to improve as CAM technologies improve, but is not likely to experience anything close to a fourfold improvement as users move from OC-192 to OC-768. Due to this inherent access delay of CAM memories, the delay in receiving routing information becomes worse relative to data rate as speeds increase. This results in the need to increase queue and storage depths to account for buffering data prior to knowing to where it should be routed.
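One way to see why a fixed CAM delay grows worse relative to data rate is to count how many bits arrive on the line while a single lookup is in progress. The sketch below uses the approximate 100 nsec delay quoted above and nominal line rates; the function name and the framing of the result as a buffering requirement are assumptions made for illustration.

```python
def bits_arriving_during(latency_ns: float, line_rate_gbps: float) -> float:
    """Bits received on the line while one lookup is in progress."""
    return latency_ns * line_rate_gbps  # nsec * Gbit/sec = bits

CAM_LATENCY_NS = 100.0  # approximate CAM delay quoted in the text

for name, rate in (("OC-192", 10.0), ("OC-768", 40.0)):
    bits = bits_arriving_during(CAM_LATENCY_NS, rate)
    print(f"{name}: about {bits:.0f} bits ({bits / 8:.0f} bytes) "
          f"arrive per {CAM_LATENCY_NS:.0f} nsec lookup")
```

With an unchanged 100 nsec delay, roughly 1000 bits arrive per lookup at OC-192 and roughly 4000 bits at OC-768, so the data that must be buffered before a routing decision is available grows in proportion to the line rate.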
Present CAM classification systems are claimed to operate at full line data rates. The problem is that they require packets to be received, staged, and then sent into the classification engine to determine an appropriate route or other required information. This delay increases the latency through the router for a packet to be sent. Eventually, this latency through a router can start to impact connections going through the router and can result in higher layer timeouts. As new CAM technologies are implemented, the focus is on increasing size and maintaining access time. Therefore, the access time is not scaling anywhere near as quickly as data rate.
In the case of the Juniper Networks ASIC2 solution (FIG. 1b), it is difficult to glean detailed technical information from their website. It appears as though the ASIC2 approach operates on a packet that is in shared SRAM. The appropriate bits of this packet appear to be transferred into the ASIC2 18 so that it can perform packet classification. This transfer has measurable delays associated with it, depending upon the hardware architecture and the memory speed. If the shared SRAM has a 10 nsec access time, and it is 64 bits wide, it takes 40 nsec to transfer 256 bits into the ASIC2 chip before a classification begins. The ASIC2 specification identifies a performance metric that provides a raw maximum 40 Million Packets/sec of classification performance, which implies a classification every 25 nsec. This could be for packets requiring only a single data write into the ASIC2 part because it is a top end specification. Even in the Juniper ASIC2 solution, the parallel movement of data into the ASIC2 part must limit the performance of the overall packet classification system. The ASIC2 solution has a much lower inherent latency than present CAM solutions, but its packet classification time varies based on packet movement and memory access prioritization. Even with its higher performance, the ASIC2 solution does not begin packet classification until well after a data packet has been received. As data rates continue to increase, this becomes an architectural limitation for the ASIC2 custom approach.
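The two timing figures in the preceding paragraph can be checked with the following Python sketch. The SRAM parameters (10 nsec access time, 64-bit width, 256 bits transferred) are the hypothetical values posed in the text, not published ASIC2 specifications, and the function names are illustrative.

```python
import math

def sram_transfer_ns(bits_needed: int, bus_width_bits: int,
                     access_ns: float) -> float:
    """Time to read the needed bits, one bus-wide access at a time."""
    accesses = math.ceil(bits_needed / bus_width_bits)
    return accesses * access_ns

def classification_interval_ns(packets_per_sec: float) -> float:
    """Average time available per packet at a given classification rate."""
    return 1e9 / packets_per_sec

# Hypothetical SRAM from the text: 64 bits wide, 10 nsec access time.
transfer = sram_transfer_ns(bits_needed=256, bus_width_bits=64, access_ns=10.0)
interval = classification_interval_ns(40e6)  # 40 million packets/sec

print(f"Transfer of 256 bits: {transfer:.0f} nsec")
print(f"Interval at 40 Mpackets/sec: {interval:.0f} nsec")
```

Under these assumptions the 40 nsec transfer alone exceeds the 25 nsec interval implied by the peak rate, which is consistent with the observation that the peak figure likely applies to packets needing only a single write into the part.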
Power Consumption and Board Area Issues
The overall power consumption of a router system increases because the network processor speed must be increased to process higher data rates. In addition, CAM memories have a static current draw that must be accounted for and scaled up. As an example, a currently available network database search engine using CAM technology draws 6 Amps @1.5 Volts running at 1000 MHz. This is an extremely high 9 Watts on a single chip. This impacts the usability of this solution in applications where space is tight and power is limited. It is noted that the increased power consumption also raises issues of heat dissipation that must be addressed.
As packet classification searches farther into a packet, such as to 512 or 1024 bits deep, CAM based solutions require multiple parts to be operated in parallel. This significantly increases power consumption and board area. In the case of the ASIC2 solution, increasing the depth of the classification requires an entirely new part to be developed. In addition, the ASIC2 solution could require greater memory bandwidths with higher speeds, which would entail more parts and larger ASICs.
It would be advantageous to provide an improved system for ultra-high speed packet classification of optical data that has been framed into a serial data stream.