Peripheral Component Interconnect Express (PCI Express) is a low-cost, scalable, switched, point-to-point, serial input/output (IO) interconnection scheme that maintains backward compatibility with PCI. PCI Express provides a number of benefits over existing bus standards, including increased bandwidth availability and support for real-time data transfer services. The PCI Express architecture is specified using an Open Systems Interconnection (OSI) layer model and uses a load-store addressing architecture with a flat address space to allow interoperability with existing PCI applications. Software layers generate read and write requests that are transported by a transaction layer to IO devices using a packet-based, split-transaction protocol. A link layer adds a sequence number and a cyclic redundancy check (CRC) to each of these packets to create a highly reliable data transfer mechanism. A basic physical layer includes a dual-simplex channel that is implemented as a transmit pair and a receive pair.
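By way of illustration only, the link-layer framing described above can be sketched as follows. This is a minimal sketch, not the PCI Express wire format: the function names `link_layer_wrap` and `link_layer_check` are hypothetical, the 12-bit sequence number width and 2-byte header layout are assumptions of the sketch, and `zlib.crc32` is used as a stand-in for the link CRC, whose exact bit ordering is not reproduced here.

```python
import struct
import zlib

SEQ_BITS = 12  # assumed sequence-number width for this sketch


def link_layer_wrap(seq: int, tlp: bytes) -> bytes:
    """Prefix a transaction layer packet with a sequence number and append a CRC."""
    seq &= (1 << SEQ_BITS) - 1           # sequence numbers wrap around
    header = struct.pack(">H", seq)      # 2-byte big-endian header (assumed layout)
    crc = zlib.crc32(header + tlp) & 0xFFFFFFFF
    return header + tlp + struct.pack(">I", crc)


def link_layer_check(frame: bytes):
    """Return (seq, tlp) if the CRC matches, or None if the frame is corrupted."""
    header, tlp, crc_bytes = frame[:2], frame[2:-4], frame[-4:]
    (crc,) = struct.unpack(">I", crc_bytes)
    if zlib.crc32(header + tlp) & 0xFFFFFFFF != crc:
        return None                      # a real receiver would request a replay
    (seq,) = struct.unpack(">H", header)
    return seq, tlp
```

In this sketch the CRC covers both the sequence number and the packet, so corruption of either is detected by the receiver.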
Some integrated circuits (ICs), such as programmable logic devices (PLDs), may be configured to include a circuit (a “core”) that provides a PCI Express bus interface (a “PCI Express core”). In a PCI Express core, transaction layer packets to be transmitted over a PCI Express bus are stored in a buffer memory. The packets may be read from the buffer memory in a different order than they were written, and the packets may differ in length. It is desirable to transmit the variable-length packets as a stream without gaps. Currently, to allow switching from one packet to the next without incurring a gap in the stream, a flag can be added to the next-to-last word of data in a packet to indicate that the next word is the last word. This gives the read process time in which to make an end-of-packet determination and jump to the address of the next packet in the buffer memory. Such a technique has two limitations. First, the end-of-packet detection and new address determination must be made in a short period of time (e.g., if one data word is read per clock cycle, the read process must detect the end-of-packet and determine the new address in a single clock cycle). The buffer memory, however, can have a high latency, which makes meeting timing requirements difficult. Second, such a design cannot tolerate any pipeline stages following the buffer memory output, which prevents the use of an external buffer memory (e.g., a memory external to the core).
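The conventional next-to-last-word flag technique described above can be modeled in software as follows. This is a simplified sketch under stated assumptions: the names `store_packet` and `stream` are hypothetical, the buffer memory is modeled as a dictionary from address to a (word, flag) pair, each loop iteration stands for one clock cycle, every packet is assumed to hold at least two words, and the flag is assumed to be visible in the same cycle in which its word is read (i.e., zero memory latency and no pipeline stages).

```python
def store_packet(memory: dict, start: int, words: list) -> None:
    """Write a packet at `start`, setting the flag on its next-to-last word."""
    assert len(words) >= 2, "this sketch assumes at least two words per packet"
    for i, word in enumerate(words):
        flag = (i == len(words) - 2)     # flag means "the next word is the last"
        memory[start + i] = (word, flag)


def stream(memory: dict, starts: list) -> list:
    """Read packets in the order given by `starts`, one word per cycle, no gaps."""
    pending = list(starts)
    if not pending:
        return []
    out = []
    addr = pending.pop(0)
    next_is_last = False                 # set when the flagged word was just read
    while True:
        word, flag = memory[addr]        # one read per modeled clock cycle
        out.append(word)
        if next_is_last:
            # The word just read was the last one: jump to the next packet.
            if not pending:
                break
            addr = pending.pop(0)
            next_is_last = False
        else:
            addr += 1
            next_is_last = flag
    return out
```

For example, with a three-word packet written at address 0 and a two-word packet written at address 10, `stream(memory, [10, 0])` returns the words of the second packet followed immediately by the words of the first. The model works only because the flag is acted upon in the cycle after it is read; it does not capture memory latency or downstream pipeline stages, which are the conditions under which the technique is described as breaking down.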
Accordingly, there exists a need in the art for a method and apparatus for processing variable-length packets stored in a buffer memory for transmission that overcome the aforementioned disadvantages.