The present invention relates to network processor designs and network processing architecture and, in particular, it concerns the design and utilization of programmable, task-customized processors and super-scalar architecture to provide improved processing performance.
The demand for intelligent, increasingly complex network processing at wire speed has led to the creation of network processors (also called communications processors). Programmable network processors provide system flexibility while delivering the high-performance hardware functions required to process packets at wire speed. Consequently, network processors are expected to become the silicon core of the next generation of networking equipment.
The architecture of most known network processors is based on the integration of multiple RISC (Reduced Instruction Set Computer) processors into a silicon chip.
There are several known network processor designs based on the integration of multiple “off-the-shelf” RISC processors into a single chip. By definition, a RISC computer architecture reduces chip complexity by using simpler instructions than those of CISC (Complex Instruction Set Computer) computers.
In a RISC, the microcode layer and its associated overhead are eliminated. A RISC maintains a constant instruction size, dispenses with the indirect addressing mode, and retains only those instructions that can be overlapped and made to execute in one machine cycle. Because there is no microcode conversion layer, a RISC machine executes instructions quickly. However, a RISC compiler has to generate routines using simple instructions, so performing a complex task requires many instructions, each of which normally takes a clock cycle.
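To make the cycle-count argument concrete, the following sketch models a simple single-issue RISC in which every primitive instruction retires in one clock cycle. The task (a 16-bit ones'-complement checksum over header words, as used for IPv4 headers) and the instruction mnemonics in the comments are illustrative assumptions, not taken from this document; the point is only that one conceptually simple operation expands into many single-cycle instructions.

```python
from typing import List, Tuple


def risc_checksum(words: List[int]) -> Tuple[int, int]:
    """Compute a 16-bit ones'-complement checksum one primitive operation at a time.

    Each step stands for one simple, hypothetical RISC instruction,
    and each instruction is charged exactly one clock cycle.
    """
    cycles = 0
    acc = 0
    cycles += 1                   # CLR acc
    for w in words:
        r1 = w & 0xFFFF
        cycles += 1               # LOAD r1, [hdr + i]
        acc = acc + r1
        cycles += 1               # ADD acc, acc, r1
        r2 = acc >> 16
        cycles += 1               # SHR r2, acc, #16   (carry out of bit 15)
        acc = acc & 0xFFFF
        cycles += 1               # AND acc, acc, #0xFFFF
        acc = acc + r2
        cycles += 1               # ADD acc, acc, r2   (end-around carry)
    result = ~acc & 0xFFFF
    cycles += 1                   # NOT acc
    return result, cycles


# A 20-byte IPv4 header as ten 16-bit words, with the checksum field (word 5) zeroed.
header = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
          0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
checksum, cycles = risc_checksum(header)
print(hex(checksum), cycles)  # 0xb861 52
```

Under this model, five instructions per header word plus one setup and one finalization instruction cost 52 cycles for a 20-byte header.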
There are several major drawbacks to RISC-based network processors, including the large number of instructions required, the time needed to perform complex tasks, and the inability to modify the data path. Although RISC-based network processors can attain improved performance despite these deficiencies, the deficiencies prevent most of them from delivering processing performance on more than a handful of Gigabit ports.
Although RISC processors are frequently deployed in parallel to achieve high speeds, the architecture is still constrained by the throughput of each RISC. Perhaps more importantly, only a limited number of RISCs can be incorporated on a chip before it exceeds a practical silicon die size.
Because wire speeds on the Internet have increased by orders of magnitude over the last few years, network processing systems are under increasing pressure to match wire speed and thereby avoid bottlenecks and the problems associated with them. However, the performance (speed) goal of these network processing systems is only 1-2 gigabits per second, well behind the wire speeds of 10-40 gigabits per second that are already on the horizon.
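The gap between 1-2 and 10-40 gigabits per second can be expressed as a per-packet time budget. The sketch below assumes minimum-size Ethernet frames (64-byte frame plus 8-byte preamble/SFD and 12-byte inter-frame gap, 672 bits on the wire) and a hypothetical 200 MHz processor clock; neither figure comes from this document.

```python
from typing import Tuple

# Assumption: minimum-size Ethernet frame as it occupies the wire.
# 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap = 84 bytes.
MIN_WIRE_BITS = (64 + 8 + 12) * 8  # 672 bits


def per_packet_budget(rate_bps: float, clock_hz: float) -> Tuple[float, float, float]:
    """Return (packets per second, ns per packet, clock cycles per packet) at line rate."""
    pps = rate_bps / MIN_WIRE_BITS
    ns_per_packet = 1e9 / pps
    cycles_per_packet = clock_hz / pps
    return pps, ns_per_packet, cycles_per_packet


for gbps in (1, 2, 10, 40):
    # 200 MHz is a hypothetical processor clock chosen for illustration.
    pps, ns, cyc = per_packet_budget(gbps * 1e9, clock_hz=200e6)
    print(f"{gbps:>2} Gb/s: {pps / 1e6:6.2f} Mpps, {ns:6.1f} ns, {cyc:6.1f} cycles per packet")
```

At 10 Gb/s this leaves about 67 ns, or roughly 13 cycles of a 200 MHz processor, for each minimum-size packet.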
One method of increasing processing speed is to enlarge the chip. Much progress has been achieved in this direction over the past 30 years. It appears, however, that a further increase in chip size will be rather expensive, as the probability of a defect increases appreciably for chips whose characteristic length dimension exceeds 13 millimeters. Moreover, a modest increase in the characteristic length dimension, e.g., from 13 millimeters to 16 millimeters, increases the area by only about 50%, a far cry from the order-of-magnitude increase in performance that is required.
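The 50% figure follows from area scaling with the square of the characteristic length. The sketch assumes a square die whose characteristic length is its side.

```python
def area_increase(old_side_mm: float, new_side_mm: float) -> float:
    """Fractional increase in die area when the side of a square die grows."""
    return (new_side_mm / old_side_mm) ** 2 - 1.0


print(f"{area_increase(13, 16):.1%}")  # 51.5%
```

A side roughly 23% longer yields roughly 51% more area, which is still far short of a tenfold gain.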
Another method of increasing processing speed is to increase the transistor/process density of the chip, i.e., the number of gates per unit area. Much progress has been achieved in this direction in the past, and it appears likely that progress will continue to be made in the foreseeable future. Historically, however, the number of gates per unit area has increased at a rate of about 50% per year, such that the requisite order-of-magnitude increase in performance appears to be many years away.
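How far away an order-of-magnitude gain is at 50% per year can be estimated with compound growth. The sketch assumes that performance scales directly with gate density, which is a simplification.

```python
import math


def years_to_reach(factor: float, annual_growth: float) -> float:
    """Years of compound growth at `annual_growth` (0.5 = 50%/yr) needed to multiply by `factor`."""
    return math.log(factor) / math.log(1.0 + annual_growth)


print(f"{years_to_reach(10, 0.50):.1f} years")  # 5.7 years
```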
There is therefore a recognized need for, and it would be highly advantageous to have, a network processing system that provides significantly faster performance than existing network processing systems, and more particularly, one that does so for a given chip size and transistor/process density.
The present invention is a system that utilizes task-customized processors to provide improved processing performance. These task-customized processors can be integrated into a super-scalar architecture to further enhance processing speed.
According to the teachings of the present invention there is provided a high-speed system for processing a packet and routing the packet to the requisite packet destination port, the system comprising: (a) a memory block for storing tabulated entries, and (b) at least one microcode machine, interacting with the memory block and accessing the tabulated entries, for processing and routing the packet, wherein at least one of the at least one microcode machine is a customized microcode machine.
According to another aspect of the present invention there is provided a high-speed system for processing a packet and routing the packet to a packet destination port, the system comprising: (a) a memory block for storing tabulated entries, (b) a parsing subsystem containing at least one microcode machine for parsing the packet, thereby obtaining at least one search key, (c) a searching subsystem containing at least one microcode machine for searching for a match between the at least one search key and the tabulated entries, (d) a resolution subsystem containing at least one microcode machine for resolving the packet destination port, and (e) a modification subsystem containing at least one microcode machine for making requisite modifications to the packet; wherein at least one of the microcode machines is a customized microcode machine.
According to yet another aspect of the present invention there is provided a system for processing a packet and routing the packet to a packet destination, the system comprising: (a) a memory block for storing tabulated entries, (b) a parsing subsystem containing at least one microcode machine configured for parsing the packet, thereby obtaining at least one search key, (c) a searching subsystem containing at least one microcode machine configured for searching for a match between the at least one search key and the tabulated entries, (d) a resolution subsystem containing at least one microcode machine configured for resolving the packet destination, and (e) a modification subsystem containing at least one microcode machine configured for making requisite modifications to the packet.
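The four-subsystem flow (parse, search, resolve, modify) can be sketched in software as four functions, one per subsystem. The concrete operations are assumptions made for illustration: the search key is the IPv4 destination address, the tabulated entries are prefix-to-port routes searched by longest-prefix match, and the modification is a TTL decrement. In the described system each stage would run on its own microcode machine rather than as a Python function.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical "memory block": tabulated entries mapping IPv4 prefixes to output ports.
RouteTable = Dict[ipaddress.IPv4Network, int]


@dataclass
class Packet:
    dst: str                        # destination IPv4 address, dotted quad
    ttl: int                        # IPv4 time-to-live
    out_port: Optional[int] = None  # filled in once the destination port is resolved


def parse(pkt: Packet) -> ipaddress.IPv4Address:
    """Parsing stage: extract the search key (here, the destination address)."""
    return ipaddress.IPv4Address(pkt.dst)


def search(key: ipaddress.IPv4Address, table: RouteTable) -> Optional[ipaddress.IPv4Network]:
    """Searching stage: longest-prefix match of the key against the tabulated entries."""
    matches = [net for net in table if key in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def resolve(match: Optional[ipaddress.IPv4Network], table: RouteTable) -> Optional[int]:
    """Resolution stage: map the matching entry to a destination port (None means drop)."""
    return table[match] if match is not None else None


def modify(pkt: Packet, port: int) -> Optional[Packet]:
    """Modification stage: decrement TTL and stamp the output port; drop expired packets."""
    if pkt.ttl <= 1:
        return None
    return Packet(dst=pkt.dst, ttl=pkt.ttl - 1, out_port=port)


def process(pkt: Packet, table: RouteTable) -> Optional[Packet]:
    """Run one packet through parse, search, resolve, and modify in order."""
    port = resolve(search(parse(pkt), table), table)
    return None if port is None else modify(pkt, port)


table: RouteTable = {
    ipaddress.IPv4Network("10.0.0.0/8"): 1,
    ipaddress.IPv4Network("10.1.0.0/16"): 2,
}
print(process(Packet("10.1.2.3", ttl=64), table))  # Packet(dst='10.1.2.3', ttl=63, out_port=2)
```

Keeping each stage behind a narrow interface (key in, match out, port out) is what would allow each one to be mapped onto a separately customized machine.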
According to further features in the described preferred embodiments, the system has super-scalar architecture.
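One way to read the super-scalar arrangement is that each subsystem contains several microcode machines working on different packets at the same time, so a slower stage can be given more machines. The throughput model below assumes every stage is fully pipelined and ignores memory contention; the cycle counts and machine counts are invented for illustration.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Hypothetical per-stage figures: (cycles one machine spends per packet, machines in the stage).
Stages = Dict[str, Tuple[int, int]]


def pipeline_throughput(stages: Stages) -> Fraction:
    """Packets per clock cycle sustained by a pipeline of replicated stages.

    A stage with m machines that each need c cycles per packet accepts
    m / c packets per cycle; the slowest stage bounds the whole pipeline.
    """
    return min(Fraction(machines, cycles) for cycles, machines in stages.values())


scalar = {"parse": (4, 1), "search": (6, 1), "resolve": (2, 1), "modify": (3, 1)}
superscalar = {"parse": (4, 2), "search": (6, 3), "resolve": (2, 1), "modify": (3, 2)}
print(pipeline_throughput(scalar), pipeline_throughput(superscalar))  # 1/6 1/2
```

In this example, replicating the slower stages triples sustained throughput without changing any individual machine.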
According to still further features in the described preferred embodiments, at least one of the customized microcode machines has a customized instruction set.
According to still further features in the described preferred embodiments, at least one of the customized microcode machines has a customized data path.
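The two customizations act on different factors of the cycle count: a customized instruction set reduces the number of instructions per unit of work, and a customized data path increases the amount of data each instruction handles. The model below assumes a 128-bit search key, single-cycle instructions, and a three-instruction load/compare/branch sequence per word on a generic 32-bit machine; all of these figures are hypothetical.

```python
import math


def key_compare_cycles(key_bits: int, datapath_bits: int, instrs_per_chunk: int) -> int:
    """Cycles to compare a search key with one table entry.

    The key is handled one data-path-width chunk at a time, and each chunk costs
    `instrs_per_chunk` single-cycle instructions.
    """
    chunks = math.ceil(key_bits / datapath_bits)
    return chunks * instrs_per_chunk


KEY_BITS = 128  # assumed key width, e.g. an IPv6 destination address
print("generic 32-bit RISC     :", key_compare_cycles(KEY_BITS, 32, 3))   # LOAD, CMP, BRANCH per word
print("custom instruction only :", key_compare_cycles(KEY_BITS, 32, 1))   # hypothetical fused compare-and-branch
print("custom data path only   :", key_compare_cycles(KEY_BITS, 128, 3))  # one 128-bit chunk
print("both customizations     :", key_compare_cycles(KEY_BITS, 128, 1))
```

Because the two effects multiply, combining them reduces the modeled cost from 12 cycles to 1.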
According to still further features in the described preferred embodiments, some or all of the memory block is embedded in a single chip along with the microcode machines.
According to still further features in the described preferred embodiments, the packets are processed at a rate of at least 8 gigabits per second.
According to still further features in the described preferred embodiments, the packets are processed at a rate of at least 16 gigabits per second.
According to still further features in the described preferred embodiments, the customized microcode machine is a parsing microcode machine.
According to still further features in the described preferred embodiments, the customized microcode machine is a searching microcode machine.
According to still further features in the described preferred embodiments, the customized microcode machine is a resolution microcode machine.
According to still further features in the described preferred embodiments, the customized microcode machine is a modification microcode machine.
According to still further features in the described preferred embodiments, the parsing subsystem, the searching subsystem, the resolution subsystem, and the modification subsystem are embedded in a single chip.