1. Field of the Invention
The present invention relates to network processor devices and storage area networks, and in particular to a system and method for spanning multiple network protocols by providing an architecture for protocol conversion implemented within a single IC chip or as a sub-processor core component in a conventional SoC, DSP, FPGA, or similar integrated circuit sub-system.
2. Description of the Prior Art
As the market shifts toward storage area networks (SAN) and network attached storage (NAS) systems, and with the massive expansion of the Internet, new demands are placed on server and storage designs. Storage attached via parallel SCSI connections is being replaced by Fibre Channel (FC) Storage Area Networks (SANs) and other emerging networking architectures, such as iSCSI and Fibre Channel over IP (FC-IP). iSCSI involves transfers of block data over TCP/IP networks, typically built around Gigabit Ethernet, while FC-IP is an Internet Protocol (IP) based storage networking technology which enables the transmission of FC information by tunneling data between SAN facilities over IP networks.
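The tunneling idea described above can be sketched as follows. This is a minimal illustration only: the 8-byte header layout, the `TUNNEL_MAGIC` value, and the function names `encapsulate` and `decapsulate` are assumptions made for this example and do not reproduce the actual FC-IP encapsulation format. The sketch shows only that an FC frame is carried opaquely as payload across the IP network and recovered unchanged at the far end.

```python
import struct

# Hypothetical tunnel header: 4-byte magic value + 4-byte payload length,
# both in network byte order. NOT the real FC-IP header layout.
TUNNEL_MAGIC = 0x46434950  # ASCII "FCIP", chosen for illustration only
HEADER = struct.Struct("!II")


def encapsulate(fc_frame: bytes) -> bytes:
    """Wrap an opaque FC frame so it can be carried over a TCP/IP byte stream."""
    return HEADER.pack(TUNNEL_MAGIC, len(fc_frame)) + fc_frame


def decapsulate(segment: bytes) -> bytes:
    """Recover the original FC frame from a tunneled segment."""
    if len(segment) < HEADER.size:
        raise ValueError("segment shorter than tunnel header")
    magic, length = HEADER.unpack_from(segment)
    if magic != TUNNEL_MAGIC:
        raise ValueError("not a tunneled frame")
    payload = segment[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload
```

The FC frame is never interpreted by the tunnel; only the outer header is added and removed, which is what distinguishes tunneling from protocol translation.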
General purpose CPUs either cannot meet the computational requirements of the network protocol conversion, or are too expensive in terms of unit cost, space and power. This has led to the offloading of many of the networking and protocol processing functions from host processors into host-bus-adapters (HBAs) or Network Interface Controllers (NICs). Initially, most HBAs and NICs were implemented in ASICs using hardwired logic. But as the need to implement complex network protocols arose, such as TCP/IP or iSCSI, programmable solutions have become attractive because of a number of advantages they offer: they can accommodate different and evolving protocols; they are easily upgradeable via program changes; they offer a faster time to market.
The existing SANs are often physically remote, sometimes at great distances, and often use multiple network architectures. To consolidate existing SANs and to utilize existing WAN and LAN infrastructure, there is a need for network protocol conversion, both in the data communications and telecommunications fields. The protocol conversion would allow seamless integration and operation of all the different parts of the system.
A system level protocol converter product was announced by Brocade Communications Systems for multiprotocol fabric routing services [http://biz.yahoo.com/prnews/031028/sftu100_1.html], which is planned to provide Fibre Channel-to-Fibre Channel (FC-to-FC) routing, iSCSI-to-FC bridging and Fibre Channel to FC-IP translation.
Existing protocol converters integrate multiple chips on a card to obtain the desired logic functionality, or more commonly take the form of a host bus adapter (HBA) card plugged into an existing host system, or of a daughter card on a main host I/O card, resulting in a bulkier and more costly product in terms of unit cost, space and power. In addition, existing protocol converters are either not programmable or offer only very limited programmability, and are not easily upgraded to accommodate different or new protocols. Furthermore, a variety of physical layer access modules or chips are present, their implementations and circuit technology often being optimized for one particular physical layer protocol, requiring the replacement of an entire HBA card or of several components when a newer physical layer protocol is required on a port. Conversion within the same physical I/O card is not typically done, and is not done within a single chip solution or as an embedded core within an SoC semiconductor device.
A System-on-Chip design 20 according to the prior art is illustrated in FIG. 1. It comprises a processing element such as a PPC440 (PowerPC) 25, a local processor bus (PLB) 21, an on-chip peripheral bus (OPB) 24, and a number of components, such as SRAM 15, DDR controller 18, PCI-X bridge 22, DMA 26 and DMA controller 28, an Ethernet Medium Access Control (MAC) protocol device 50 employed to provide the data link layer for an Ethernet LAN system, processor core timers 33 and interrupt controller 35, and an OPB bridge 29 interfacing with the OPB bus 24 and PLB 21. In the prior art implementation depicted in FIG. 1, IBM's embedded PowerPC 440 processor core and the CoreConnect local bus are utilized, but similar configurations may be found that use other embedded processor cores, such as ARM (see, for instance, http://www.arm.com/products/?OpenDocument) or MIPS (see MIPS: “MIPS32 4KP - Embedded MIPS Processor Core” at http://www.ce.chalmers.se/~thomasl/inlE/mips32_4Kp_brief.pdf) processing cores. As shown in FIG. 1, other devices provided for interfacing with the on-chip peripheral bus 24 include one or more of the following: a RAM/ROM peripheral controller 45a, an external bus master 45b, a UART device 45c, an Inter-IC bus (I2C) interface 45d, a general purpose I/O interface 45e and a gateway interface 45f.
Relevant references describing aspects of SoC processor and component design include:
U.S. Pat. No. 6,331,977 describes a System on a chip (SOC) that contains a crossbar switch between several functional I/Os internal to the chip and a number of external connection pins, where the number of pins is less than the number of internal I/Os.
U.S. Pat. No. 6,262,594 describes an apparatus and method implementing a crossbar switch for configurable use of a group of pads of a system on chip.
U.S. Pat. No. 6,038,630 describes an apparatus and method implementing a crossbar switch that provides a shared access control device for an integrated system with multiple functional units accessing external structures over multiple data buses.
U.S. patent application Ser. No. US2002/0184419 describes an ASIC which enables use of different components for a system on a chip using a common bus system and describes wrappers for functional units with different speed and data width to achieve compatibility with a common bus.
U.S. patent application Ser. No. US2002/0176402 describes an octagonal interconnection network for linking functional units on a SoC. The functional units on the interconnection network are organized as a ring and use several crossing data links coupling halfway components.
U.S. patent application Ser. No. US2001/0042147 describes a system resource router for SOC interconnection, comprising two channel sockets which connect each data cache (D-cache) and instruction cache (I-cache). Also included are external data transfer initiators, two internal M-channel buses, and an M-channel controller to provide the interconnection.
U.S. patent application Ser. No. US2002/0172197 describes a communication system connecting multiple transmitting and receiving devices via a crossbar switch embedded on a chip in a point-to-point fashion.
U.S. patent application Ser. No. US2001/0047465 describes several variations of an invention providing a scalable architecture for a communication system (typically a SOC or ASIC) for minimizing total gates by dividing transmissions into individual transmission tasks and determining a computational complexity for each transmission task, the computational complexity being based on the number of MIPS per circuit.
In the reference entitled “On-Chip Interconnects for Next Generation System-on-Chips” by A. Brinkmann, J. C. Niemann, I. Hehemann, D. Langen, M. Porrmann, and U. Ruckert, Conf. Proceedings of ASIC2003, Sep. 26-27, 2003, Rochester, N.Y., there is described an SoC architecture utilizing active switch boxes to connect processor cells for enabling packet network communications. This paper makes no mention or description of a processor core with multi-threading capability.
In the reference entitled “A Comparison of Five Different Multiprocessor SoC Bus Architectures” by Kyeong Keol Ryu, Eung Shin, and Vincent J. Mooney, Conf. Proceedings of Euromicro Symposium on Digital System Design (DSD'01), Sep. 4-6, 2001, Warsaw, Poland, there are described multiprocessor SoC bus architectures including Global Bus I Architecture (GBIA), Global Bus II Architecture (GBIIA), Bi-FIFO Bus Architecture (BFBA), Crossbar Switch Bus Architecture (CSBA), and CoreConnect Bus Architecture (CCBA).
The approaches based on a single embedded processor provide a cost-effective, integrated solution for some applications but may lack the computational power required by more demanding applications, as well as the flexibility needed for protocol conversion or future protocol speed increases, for example from 2.5 Gbps Fibre Channel to 10 Gbps Fibre Channel.
Within the last few years, the computational capabilities of the SoC of FIG. 1 have been enhanced, in a number of networking applications, through the addition of special-purpose processor cores (accelerators) 39 attached to the common bus (PLB) and operating in parallel with the processor core 25, as shown in FIG. 2. These additional special-purpose processor cores 39a, 39b, etc. are usually small in silicon area, as many of the features found in typical general-purpose processors (e.g., a memory management unit to support virtual addressing, etc.) are excluded. Examples of this approach are IBM's PowerNP (see, for example, the reference entitled “IBM Power Network processor architecture,” Proceedings of Hot Chips 12, Palo Alto, Calif., USA, August 2000, IEEE Computer Society, by M. Heddes) and NEC's TCP/IP offload engine (see, for example, the reference entitled “NEC's New TCP/IP Offload Engine Powered by 10 Tensilica Xtensa Processor Cores,” at http://www.tensilica.com/html/pr_2003_05_12.html). Although these systems are programmable and, consequently, more flexible compared to hardwired accelerators, they suffer from several drawbacks: a) they induce additional traffic on the SoC bus (e.g., PLB 21), as the bus must now support both instruction and data streams to the processor accelerators, possibly causing bandwidth contention and limiting system performance; b) the SoC bus is often not optimized for multiprocessor performance but for compatibility with standardized components and connection protocols in a SoC system; and c) the processor accelerators 39 often implement only a very limited instruction set and are programmed in assembler language, thus making the development and maintenance of applications running on the processor accelerators very difficult and costly.
A third type of SoC design 75 is an embedded processor core connected via a crossbar switch, such as Motorola's MPC5554 Microcontroller (Design News, Nov. 3, 2003, page 38), a block diagram of which is depicted in FIG. 3. As illustrated in FIG. 3, Motorola's SoC design comprises many of the same elements as the SoC designs of FIGS. 1 and 2, including a PowerPC processor core, memory and bus interfaces, but, more notably, implements a 3×5 crossbar switch 72 as a replacement for one of the local buses. By incorporating a crossbar switch 72 into the SoC design, the processor core communications may occur faster, with three (3) lines working simultaneously, thereby addressing the bandwidth contention problems to some degree. However, the SoC is still not optimized for multiprocessor support, for more advanced functions like protocol conversion within a single SoC chip, or for high speed interfaces. The I/O communication within the chip is limited by the crossbar switch and still requires communication with the external bus interface and host system bus, limiting the performance and flexibility of the microcontroller (SoC chip) for any future upgrades. Any protocol conversion would have to be performed off-chip, in several stages or chips. In addition, a data packet cannot be decoupled from instructions placed onto the host system bus. In the example of FIG. 3, one protocol, for example the FlexCAN (CAN protocol: “Controller Area Network”) data stream typically used in automotive applications, is implemented in the Motorola MPC5554 chip via an external I/O bridge 78, as are other protocols such as DSPI (“Serial Peripheral Interface”) or eSCI (“Enhanced Serial Communication Interface”). Each protocol or I/O-specific stream passes through an I/O bridge, the crossbar switch, and typically either an internal chip bus or an external bus interface to the system bus.
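The way a 3×5 crossbar allows up to three transfers to proceed at once, and where contention still arises, can be sketched as follows. This is an illustrative model only: the function name `arbitrate`, the master and slave indexing, and the fixed-priority policy (lowest master index wins) are assumptions made for this example and are not taken from the MPC5554 design.

```python
def arbitrate(requests: dict[int, int],
              num_masters: int = 3,
              num_slaves: int = 5) -> dict[int, int]:
    """Grant crossbar paths for one cycle.

    `requests` maps a master port index to the slave port it wants.
    Masters targeting different slaves are all granted in the same cycle;
    when two masters target the same slave, only the lower-indexed master
    is granted (an assumed fixed-priority policy) and the other must wait.
    """
    granted: dict[int, int] = {}
    busy_slaves: set[int] = set()
    for master in sorted(requests):
        slave = requests[master]
        if not (0 <= master < num_masters and 0 <= slave < num_slaves):
            raise ValueError("port index out of range")
        if slave not in busy_slaves:
            busy_slaves.add(slave)
            granted[master] = slave
    return granted
```

With three masters addressing three different slaves, all three requests are granted together, which a single shared bus cannot do. When two masters address the same slave, one of them stalls, so the crossbar relieves contention only to some degree.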
Currently, no protocol conversion takes place within a single chip, and there is no means for protocol conversion with an embedded core, attached to the internal chip bus, from one independent protocol or protocol version level to an entirely new protocol or version level.
Current protocol conversion takes place only at the system or card level, involving multiple chips as mentioned earlier. One example is the Brocade Silkworm Fabric Application Server for SAN networks mentioned earlier (see, for example, http://www.brocade.com/san/extending_valueof_SANsjsp), shown in FIG. 4.
In the prior art Brocade system 100 depicted conceptually in FIG. 4, for example, Fibre Channel-to-Fibre Channel (FC-to-FC) routing 102, iSCSI-to-FC bridging 104 and Fibre Channel to FC-IP translation 110 capability is provided. Brocade's design is an improvement over the existing art, in that one fiber I/O port card can support multiple protocols and even migrate from one protocol to another on the same I/O card without disturbing traffic on the other ports within a system. This is accomplished by splitting the data and control frames in the packet processing function, and by using several in-line RISC processor chips with local memory and frame buffers, software pre-processors, and translation engines within the processor card. This is an improvement over standard single HBA cards: it allows two network protocols within a single HBA card, reduces cost and space, provides the flexibility to change protocols without disturbing traffic on the main system bus, and reduces data transfer overhead and contention on the main system processor memory. The multiprocessors in Brocade's approach are fully pipelined and attached to local memory.
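The splitting of data and control frames described above can be sketched as follows. This is a conceptual illustration only: the `Frame` type, its `is_control` flag, and the function `split_frames` are hypothetical names introduced for this example and do not describe Brocade's actual implementation. The sketch shows only that control frames and data frames are separated into independent queues so that each can be handled by a different processing stage.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical received frame; the fields are assumptions for illustration."""
    port: int
    is_control: bool
    payload: bytes


def split_frames(frames):
    """Separate control frames from data frames, preserving arrival order."""
    control_queue: deque = deque()
    data_queue: deque = deque()
    for frame in frames:
        if frame.is_control:
            control_queue.append(frame)
        else:
            data_queue.append(frame)
    return control_queue, data_queue
```

Keeping the two queues separate means that bulk data frames need not wait behind control traffic, and a change of protocol handling on one path need not disturb the other.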
It would be highly desirable to incorporate this functionality within a single chip, as opposed to a single HBA card or bridge card, enabling true protocol conversion within a single chip that processes the data and control frames within the protocol converter and delivers a complete packet to a local SoC bus or system bus. This would enable a further potential reduction in the number of I/O cards, savings in hardware (number of chips), less bandwidth contention and memory contention, higher protocol speeds, more processors within a SoC chip (or attached to a local system bus), and higher throughput.