Computer systems can communicate with each other over a variety of networks, for example, an Internet Protocol (IP) network or a Synchronous Optical Network (SONET). SONET is the United States standard for synchronous data transmission on optical media. The international equivalent of SONET is the synchronous digital hierarchy (SDH). Together, these standards ensure that digital networks can interconnect internationally and that existing conventional transmission systems can take advantage of optical media.
FIG. 1 illustrates a conventional architecture of a line card, used in a network communication device, that includes a link layer device and a framer. The link layer device typically includes components such as a network processor, a network co-processor, memory, a datapath switching element (DSE), a network search engine (NSE), and a clock management block. The network processor and/or the framer usually performs packet processing functions. Packet processing functions may involve tasks such as packet pre-classification or classification, protocol conversion, quality of service assurance, service policing, provisioning, and subscriber management functions. The framer is used to transport data such as ATM (asynchronous transfer mode) cells, IP packets, and data of newer protocols, such as GFP (generic framing procedure), over SONET (synchronous optical network)/SDH (synchronous digital hierarchy) links. On the line side, the framer may support optical-networking protocols for both SONET/SDH and direct data-over-fiber networks. The framer is coupled to a physical layer port, such as a SONET device, which is coupled to a network medium such as optical fiber. On the system side, the framer usually interfaces to the link layer device through standard buses, for example, the Universal Test and Operations PHY Interface for ATM (UTOPIA) or Packet Over SONET-physical layer (POS-PHY) buses. In high-speed designs, it is common practice to widen the internal bus as the system operating speed increases. For example, a 10G (10 gigabits per second) system may have a 128-bit wide bus.
Data may be transported across a network as discrete elements called packets. The System Packet Interface Level 5 (SPI-5) is a commonly used packet interface for high bandwidth applications. This interface is governed by the SPI-5 (OC-768 System Packet Interface) standard issued by the Optical Internetworking Forum (OIF).
In a conventional packet interface, packets may be transported across multiple communication channels of data and may require store and forward operations. A conventional packet interface for a multi-channel, high bandwidth application is shown in FIG. 2. This packet interface includes a protocol parser block, a communication channel data extraction block, and a protocol encapsulation and framer block. The protocol parser block takes its input from the SPI-5 interface as a 16-bit wide bus. The input data stream is typically ‘jumbled’, with packet fragments of data from different communication channels mixed together in the time domain. A conventional interface includes a serial-in-parallel-out (SIPO) block in the parser, between the SPI-5 bus and the data assembler. This block converts the 16-bit input to a wider bus (e.g., 32 bytes) regardless of the communication channel boundaries. The purpose of the SIPO operation is to reduce the frequency of operation to an appropriate level. Because the conversion ignores communication channel boundaries, a single wide word may contain fragments from several communication channels, which is one reason the data stream may be ‘jumbled’. The communication channel data extraction block extracts the data belonging to each communication channel and assembles it per communication channel without losing any data. The output data has a fixed width of data per communication channel.
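By way of illustration only, the extraction step described above may be sketched in Python as follows. The sketch assumes that the parser has already tagged each fragment with its communication channel; the tuple format and the function name extract_channels are hypothetical and are not taken from the SPI-5 standard or from the figures.

```python
from collections import defaultdict

def extract_channels(fragments):
    """Group interleaved fragments into one byte stream per channel.

    `fragments` is a hypothetical, already-parsed view of the input stream:
    an iterable of (channel_id, payload_bytes) tuples in arrival order.
    """
    streams = defaultdict(bytearray)
    for channel_id, payload in fragments:
        # Append in arrival order so that no data is lost or reordered
        # within a channel.
        streams[channel_id].extend(payload)
    return {channel: bytes(data) for channel, data in streams.items()}

# Fragments of channels 0 and 1 mixed together in the time domain.
jumbled = [(0, b"AB"), (1, b"xy"), (0, b"CD"), (1, b"z")]
print(extract_channels(jumbled))  # {0: b'ABCD', 1: b'xyz'}
```

The sketch models only the regrouping of data. It does not model the fixed output width per communication channel or any timing behavior.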
When back-to-back packet fragments for different communication channels arriving through an SPI-5 interface are stored, the memory corresponding to each communication channel needs to be updated as its fragments are received. When implementing this operation for very high speed links (for example 40G, or 40 gigabits per second), the input data bus to the memory system may be quite wide (for example 32 bytes, or 256 bits), so that a single bus word may carry data for many communication channels in one cycle. Thus the storing process may require multiple writes per cycle.
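The relationship between bus width and operating frequency can be illustrated with a short calculation. The following Python sketch is an idealized estimate that assumes a nominal line rate and ignores protocol overhead; the function name required_clock_mhz is hypothetical.

```python
def required_clock_mhz(line_rate_gbps, bus_width_bits):
    """Minimum clock in MHz for a bus of the given width to carry the line rate.

    Idealized: assumes one full bus word per clock cycle and no overhead.
    """
    bits_per_second = line_rate_gbps * 1e9
    return bits_per_second / bus_width_bits / 1e6

print(required_clock_mhz(40, 16))   # 2500.0  (16-bit bus at 40G)
print(required_clock_mhz(40, 256))  # 156.25  (256-bit bus at 40G)
```

Under these assumptions, widening the bus from 16 bits to 256 bits lowers the required clock by a factor of 16, at the cost of each bus word potentially spanning several communication channels.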
For each communication channel, there exists a separate memory or first-in-first-out (FIFO) memory. This allows a multi-port write, with the number of possible write ports equal to the number of communication channels. A write operation is performed into each FIFO when a word has accumulated in the corresponding data assembler. A word is shown as 256 bits in FIG. 2, which is the same width as the data path. In one configuration, the width of the data assembler may be about twice the data path width, assuming that only the data assembler writes to the FIFO. In a second configuration, where both the input bus and the data assembler may write into the FIFO, the width of the data assembler may equal the data path width. However, the second configuration is likely to have timing problems.
FIG. 3 illustrates a conventional data assembler including a first-in-first-out (FIFO) memory. FIFO memories are commonly used to implement packet processing functions. Referring to the conventional FIFO shown in FIG. 3, the write pointer or read pointer always increments by one whenever a write or read occurs. Even if only a packet fragment is involved in the data transfer, the rest of the bytes in the memory location referred to by that pointer remain unused. This is disadvantageous because it leads to an inefficient use of memory and resources. In addition, there is a bandwidth loss when a read operation is performed, because the read side is expected to output at the full data rate, that is, one data path width of data every clock cycle, and a partially filled location delivers less than that. Alternatively, the bandwidth loss can be compensated for by an increased frequency of operation.
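The memory inefficiency described above can be quantified with a simple model. The following Python sketch assumes that every write consumes one full memory location regardless of how many bytes it carries; the function name fifo_utilization and the example fragment sizes are hypothetical, and the 32-byte word matches the 256-bit word of FIG. 2.

```python
def fifo_utilization(fragment_sizes, word_bytes=32):
    """Fraction of the consumed FIFO bytes that hold real data.

    Models the conventional behavior: the write pointer advances by one
    location per write, even when the fragment is shorter than the word.
    """
    if any(not 0 < size <= word_bytes for size in fragment_sizes):
        raise ValueError("each fragment must be between 1 and word_bytes bytes")
    locations_used = len(fragment_sizes)  # one location per write
    useful_bytes = sum(fragment_sizes)
    return useful_bytes / (locations_used * word_bytes)

# Two full words and two short fragments occupy four locations.
print(fifo_utilization([32, 5, 32, 1]))  # 0.546875
```

In this example, 70 bytes of data occupy 128 bytes of memory, and a reader draining one location per cycle would see the same fraction of the full data rate.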
FIG. 4 shows a conventional implementation of the data extraction block of FIG. 2. In this implementation, it might be possible to reduce the number of write ports to the maximum number of communication channels contained in each input word. The maximum number of communication channels contained in each input word is likely to be less than the total number of communication channels. FIG. 4 shows the case where the number of write ports is equal to the number of communication channels, and where each communication channel has a FIFO of its own. In another implementation, the number of write ports may be the same as the maximum number of communication channels contained in the input word, and the data assembler size can be W.
The conventional interface of FIG. 4 has a per-channel data extraction block, a data assembler block for each communication channel, a FIFO for each communication channel, and a read control logic block. The width of ‘2W’ is important in this implementation, assuming that only the data assembler writes to the FIFO. Assume that the data assembler is W−1 bytes full and that there is no end-of-packet (EOP) contained in these W−1 bytes. The next packet fragment can then be as large as W bytes, and the data assembler has to accommodate these W bytes before it can write into the memory. To achieve this, the data assembler should be able to store (W−1) + W = 2W−1 bytes; thus the width requirement is a minimum of 2W−1 bytes.
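The 2W−1 figure can be checked with a small simulation. The following Python sketch uses an assumed model that is not taken from the figures: fragments of at most W bytes arrive one at a time, the data assembler absorbs each fragment before writing, and it then writes one full W-byte word to the FIFO if one is available. EOP handling is omitted, and the function name peak_assembler_occupancy is hypothetical.

```python
def peak_assembler_occupancy(fragment_sizes, w):
    """Return the peak number of bytes held in the data assembler.

    Assumed model: the incoming fragment is absorbed first, and only then
    is one full w-byte word drained to the FIFO, if available.
    """
    held = 0
    peak = 0
    for size in fragment_sizes:
        if not 0 < size <= w:
            raise ValueError("fragment size must be between 1 and w bytes")
        held += size            # absorb the fragment first
        peak = max(peak, held)
        if held >= w:           # then drain one full word to the FIFO
            held -= w
    return peak

W = 32
# Worst case from the text: W-1 bytes held, then a W-byte fragment arrives.
print(peak_assembler_occupancy([W - 1, W], W))  # 63, i.e. 2W-1
```

Because at most W−1 bytes remain after each drain and a fragment adds at most W bytes, the occupancy in this model never exceeds 2W−1 bytes.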
If the input bus were permitted to write directly into the FIFO, separately from the data assembler, then the maximum width requirement would be W−1 bytes. However, this implementation would involve multiplexing (muxing) bytes from the data assembler as well as from the input, and such an implementation is likely to have timing problems. In the alternate implementation, where only the data assembler writes to the FIFO, the minimum width requirement is 2W−1 bytes. However, the width is described as 2W bytes so that the read from the FIFO is simplified: the read pointers inside the data assembler then need to move only at W-byte boundaries. If the width were the minimum of 2W−1 bytes, the read pointers would have to move at byte boundaries.
While the conventional technology is relatively simple to implement, it requires significant memory resources, since each memory has to be able to accommodate the worst case burst and the highest communication channel bandwidth. As a result, the conventional methods require significant die area and resources to implement. Additional resources may be required to support the overhead that results from using many smaller memories and more routing.