The present invention relates generally to processors, and, more particularly, to a multi-core processor for managing data packets in a communication network.
Communication networks, including computer networks, telephone networks, and cellular networks, are implemented using various technologies such as circuit switching, packet switching, and message switching. Packet-switched networks are digital networks in which multiple digital systems such as gateways, switches, access points, and base stations communicate using data packets. A digital system may include a single-core processor or a multi-core processor for processing the data packets. A multi-core processor includes two or more cores that distribute the received data packets among themselves for processing using a technique referred to as receive-side scaling. Receive-side scaling promotes efficient utilization of the cores and allows the digital system to support a large load of data packets. The digital system further includes a memory and a hardware accelerator. The memory includes a data buffer that stores the data packets. The hardware accelerator assists the cores in processing the data packets by performing additional functions such as encryption, cryptography, pattern matching, and decoding of the data packets. Examples of the hardware accelerator include off-load accelerators, such as cryptographic co-processors, compression accelerators, pattern-matching accelerators, and encryption hardware accelerators, and input/output (I/O) accelerators, such as security encryption controllers, Ethernet controllers, and network-attached storage accelerators. The data packets may either be received over the digital network or generated by the digital system. The hardware accelerator and the cores communicate using buffer descriptor (BD) rings.
The BD rings are stored in the memory. Each BD ring stores a plurality of BDs in the form of an array. Each BD holds a pointer to a data packet stored in the data buffer and describes the status, size, and location of the data packet. The BD rings are of two types: a transmit BD ring and a receive BD ring. The transmit BD ring includes BDs corresponding to the data packets that are processed by the cores. The hardware accelerator polls the transmit BD ring to check for availability of such data packets, processes the data packets, and transmits the processed data packets either over the digital network or back to the cores for further processing. The receive BD ring includes BDs corresponding to the data packets that are received by the hardware accelerator over the digital network. These data packets are processed by the hardware accelerator and are transmitted to the cores for further processing. The cores poll the receive BD ring to check for availability of such data packets, typically under a deferred processing context such as a tasklet.
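By way of illustration only, the BD ring arrangement described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an actual driver or hardware implementation: the names BufferDescriptor, BDRing, post, and poll, the use of a list index in place of a memory pointer, and the single ready flag standing in for the BD status are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BufferDescriptor:
    # Hypothetical fields: an index stands in for the pointer into the data buffer.
    buffer_index: int = 0
    length: int = 0
    ready: bool = False  # status: True once a data packet is associated with this BD


class BDRing:
    """Fixed-size array of BDs used as a circular queue (illustrative only)."""

    def __init__(self, size: int) -> None:
        self.bds: List[BufferDescriptor] = [BufferDescriptor() for _ in range(size)]
        self.head = 0  # next BD the producer fills
        self.tail = 0  # next BD the consumer polls

    def post(self, buffer_index: int, length: int) -> bool:
        """Producer side: associate a data packet with the next free BD."""
        bd = self.bds[self.head]
        if bd.ready:
            return False  # ring is full; the consumer has not caught up
        bd.buffer_index, bd.length, bd.ready = buffer_index, length, True
        self.head = (self.head + 1) % len(self.bds)
        return True

    def poll(self) -> Optional[Tuple[int, int]]:
        """Consumer side: return (buffer_index, length) of the next ready BD, if any."""
        bd = self.bds[self.tail]
        if not bd.ready:
            return None
        entry = (bd.buffer_index, bd.length)
        bd.ready = False  # hand the BD back to the producer
        self.tail = (self.tail + 1) % len(self.bds)
        return entry


# Receive path: the accelerator posts BDs, then a core polls the ring.
data_buffer = [b"packet-A", b"packet-B"]
rx_ring = BDRing(size=2)
for index, packet in enumerate(data_buffer):
    rx_ring.post(index, len(packet))

received = []
while (entry := rx_ring.poll()) is not None:
    received.append(data_buffer[entry[0]])
```

In this sketch the same class serves as either a transmit BD ring or a receive BD ring; only the roles of producer and consumer are exchanged between the cores and the hardware accelerator.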
The hardware accelerator stores the data packets in the data buffer and associates each data packet with a corresponding BD. When the BDs are ready, the hardware accelerator provides a hardware signal to an interrupt controller of the digital system. The interrupt controller includes a status table that stores the status of each of the multiple cores. The status of a core is determined based on whether the core has been assigned any data packets for processing. If the core is not assigned any data packets for processing, the status is marked as ‘idle’, and if the core is assigned data packets for processing, the status is marked as ‘busy’. The interrupt controller selects a core that has a corresponding status marked as ‘idle’. The interrupt controller also generates an interrupt signal upon receiving the hardware signal and transmits the interrupt signal to the selected core, to notify the selected core of the ready BDs. The selected core receives the interrupt signal and invokes an interrupt service routine (ISR) to service the interrupt.
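By way of illustration only, the selection of an idle core from the status table can be sketched as follows. This is a minimal sketch, not an actual interrupt controller: the class and method names are hypothetical, and marking a core as ‘busy’ at the moment it is selected is an assumption made to keep the example short.

```python
from typing import Dict, Iterable, Optional


class InterruptController:
    """Illustrative status table mapping each core to 'idle' or 'busy'."""

    def __init__(self, core_ids: Iterable[int]) -> None:
        self.status: Dict[int, str] = {core: "idle" for core in core_ids}

    def on_hardware_signal(self) -> Optional[int]:
        """Select an idle core and return its id as the interrupt target.

        Returns None when every core is busy. Assumption: the selected core
        is marked 'busy' immediately, since it is about to be assigned packets.
        """
        for core, state in self.status.items():
            if state == "idle":
                self.status[core] = "busy"
                return core
        return None

    def on_core_done(self, core: int) -> None:
        """Mark a core idle again once it has finished its data packets."""
        self.status[core] = "idle"


controller = InterruptController([0, 1, 2])
first = controller.on_hardware_signal()   # lowest-numbered idle core
second = controller.on_hardware_signal()  # next idle core
controller.on_core_done(first)
third = controller.on_hardware_signal()   # the freed core is selected again
```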
The selected core selects a BD ring that includes the ready BDs and begins processing corresponding data packets. The selected core does not transfer the interrupt signal to another core until it has finished processing the data packets. Thus, the idle cores remain idle even though the data buffer may hold unprocessed data packets. When the selected core finishes processing the data packets, the interrupt controller generates and transfers a subsequent interrupt signal to the next idle core, which then commences processing the unprocessed data packets of the BD ring. The cores access the BD ring serially and there is no mechanism by which multiple idle cores can simultaneously process the data packets associated with the BD ring. Thus, the processing speed of the system is limited to the processing speed of a single core.
To overcome the aforementioned problem, a software-based packet-steering approach has been used in which the interrupt signal received from the hardware accelerator is transmitted to a single core. The core invokes an ISR to service the interrupt signal, receives the data packets, and steers them to backlog queues, i.e., ingress buffers, of the other cores based on a classification of the received data packets. Thus, multiple cores can simultaneously process the data packets from the backlog queues. However, software-based packet steering adopts a centralized work distribution mechanism, i.e., the data packets are distributed to the multiple cores by a single core, which limits the steering rate to the processing speed of that single core. Moreover, the classification and steering of the received data packets require additional instruction cycles of the core, which leads to an increase in system cycle time.
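By way of illustration only, the software-based packet steering described above can be sketched as follows. This is a minimal sketch under stated assumptions: the flow_id field and the modulo classification rule are hypothetical stand-ins for whatever classification the steering core applies, and the names classify and steer are not taken from any particular implementation.

```python
from collections import deque
from typing import Deque, Dict, List


def classify(packet: Dict[str, int], num_cores: int) -> int:
    """Hypothetical classification: map a packet's flow to a target core."""
    return packet["flow_id"] % num_cores


def steer(packets: List[Dict[str, int]], num_cores: int) -> List[Deque[Dict[str, int]]]:
    """Run on the single steering core: place each packet on a backlog queue.

    Every packet passes through this one loop, which is why the steering
    rate is bounded by the speed of the core that executes it.
    """
    backlog_queues: List[Deque[Dict[str, int]]] = [deque() for _ in range(num_cores)]
    for packet in packets:
        backlog_queues[classify(packet, num_cores)].append(packet)
    return backlog_queues


incoming = [{"flow_id": flow} for flow in range(6)]
queues = steer(incoming, num_cores=2)
flows_per_core = [[p["flow_id"] for p in q] for q in queues]
```

Once the packets are on the backlog queues, each core can drain its own queue independently, but the distribution step itself remains serialized on the steering core.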
In another packet-steering approach, the hardware accelerator applies a hash function to a set of bits (e.g., the first four bits of the IP header) of a received data packet and calculates a corresponding hash result. The hardware accelerator includes an indirection table that is used to direct the received data packet to a particular core based on the hash result. Each entry in the indirection table includes a hash result and a corresponding backlog queue of a core. The hardware accelerator identifies a core to process the received data packet based on the hash result and transmits the data packet to the backlog queue of the identified core. Thus, the received data packets are classified and steered directly to the backlog queues of the cores by the hardware accelerator. As the steering is not performed by a single core, the steering rate is not limited by the processing speed of a single core. However, the classification and steering mechanism requires additional hardware.
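By way of illustration only, the hash-based steering described above can be sketched as follows. This is a minimal software model of what the text attributes to the hardware accelerator, under stated assumptions: zlib.crc32 is a stand-in because the text does not name the hash function, the table size, the number of cores, and the choice of the first four bytes as the hashed bits are arbitrary, and the name steer_by_hash is hypothetical.

```python
import zlib
from collections import deque
from typing import Deque, List

NUM_CORES = 4    # assumed number of cores
TABLE_SIZE = 8   # assumed number of indirection table entries

# One backlog queue per core.
backlog_queues: List[Deque[bytes]] = [deque() for _ in range(NUM_CORES)]

# Indirection table: entry i names the core whose backlog queue receives
# packets whose reduced hash result equals i.
indirection_table: List[int] = [entry % NUM_CORES for entry in range(TABLE_SIZE)]


def steer_by_hash(packet: bytes, hashed_len: int = 4) -> int:
    """Hash the leading bytes of the packet and enqueue it on the mapped core."""
    hash_result = zlib.crc32(packet[:hashed_len])  # stand-in hash, an assumption
    core = indirection_table[hash_result % TABLE_SIZE]
    backlog_queues[core].append(packet)
    return core


# Packets that share the hashed bits are always steered to the same core.
core_a = steer_by_hash(b"\x45\x00\x00\x54payload-1")
core_b = steer_by_hash(b"\x45\x00\x00\x54payload-2")
```

Because the mapping depends only on the hashed bits and the table contents, no core is involved in the steering decision, which is consistent with the observation above that the steering rate is not tied to the speed of any single core.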
Therefore, it would be advantageous to have a multi-core digital system that has an efficient distributed work sharing mechanism.