The present invention generally relates to data processing systems, and, more particularly, to a multi-core data processing system for managing data packets in a communication network.
Communication networks, including computer networks, telephone networks, and cellular networks, are implemented using various technologies such as circuit-switching, packet-switching, and message-switching. Packet-switched networks are digital networks in which multiple data processing systems such as gateways, switches, access points, and base stations communicate by way of data packets. A data processing system may include a single- or multi-core processor. In a multi-core processor, the received data packets are distributed among two or more cores for processing. The multiple cores execute a large number of user space and kernel space threads (collectively referred to as “threads”) for processing the received data packets. Examples of user space threads include application software and file system driver threads, and examples of kernel space threads include operating system (OS) threads and physical device driver threads.
Each OS includes a scheduler that manages the utilization of the data processing system resources such as processor time, communication bandwidth, and buffer descriptor (BD) rings for processing the threads. The scheduler may be a pre-emptive scheduler or a co-operative scheduler. A pre-emptive scheduler interrupts a thread being executed by a core (also referred to as a “running thread”) and schedules-in an alternate thread for processing by the core, whereas a co-operative scheduler does not schedule-in an alternate thread for processing by the core until execution of the running thread is completed.
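The difference between the two scheduler types can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `run_cooperative`, `run_preemptive`, and `struct trace` are hypothetical, each thread is reduced to a count of remaining work units, and the pre-emptive case assumes a simple round-robin policy with a time slice of one work unit, which is one possible policy and not one described above.

```c
#include <stddef.h>

#define TRACE_MAX 32u   /* hypothetical cap on recorded scheduling decisions */

/* Records the order in which thread indices are given the core. */
struct trace {
    int    ids[TRACE_MAX];
    size_t n;
};

static void trace_add(struct trace *t, int id)
{
    if (t->n < TRACE_MAX)
        t->ids[t->n++] = id;
}

/* Co-operative: the running thread keeps the core until its work is done. */
static void run_cooperative(unsigned *work, size_t n, struct trace *t)
{
    for (size_t i = 0; i < n; i++) {
        while (work[i] > 0) {
            trace_add(t, (int)i);
            work[i]--;
        }
    }
}

/* Pre-emptive (assumed round-robin): after one work unit the running
 * thread is interrupted and the next thread with work is scheduled in. */
static void run_preemptive(unsigned *work, size_t n, struct trace *t)
{
    int work_left = 1;
    while (work_left) {
        work_left = 0;
        for (size_t i = 0; i < n; i++) {
            if (work[i] > 0) {
                trace_add(t, (int)i);
                work[i]--;
                work_left = 1;
            }
        }
    }
}
```

With two threads of two work units each, the co-operative run gives the core to thread 0 twice and then to thread 1 twice, whereas the pre-emptive run alternates between the two threads.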
The data processing system further includes a memory and a co-processor. The memory includes a data buffer that stores the data packets. The co-processor assists the cores in processing the data packets by performing additional functions such as encryption, decryption, pattern matching, and decoding of the data packets, thereby accelerating packet processing. Examples of the co-processors include cryptographic co-processors, compression accelerators, pattern-matching accelerators, encryption hardware accelerators and input/output (I/O) accelerators such as security encryption controllers, Ethernet controllers, and network-attached storage (NAS) accelerators. The data packets may either be received over the communication network or generated by the data processing system.
The co-processor and the cores communicate by way of the BD rings. The BD rings are stored in the memory. Each BD ring includes a plurality of BDs in the form of an array. A BD holds a pointer to a data packet stored in the data buffer and describes the status, size, and location of the data packet in the memory. The BD rings are of two types: transmit BD rings and receive BD rings. A transmit BD ring includes BDs corresponding to the data packets that are processed by the cores. The co-processor polls the transmit BD ring to check for availability of such data packets, processes the data packets, and transmits the processed data packets either over the communication network or back to the cores for further processing. A receive BD ring includes BDs corresponding to the data packets received by the co-processor over the communication network. These data packets are processed by the co-processor and transmitted to the cores for further processing. The receive BD ring is polled by the cores to check for availability of data packets (which is typically done under a deferred processing thread such as a tasklet).
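The BD ring described above can be sketched as a fixed array of descriptors used circularly. The sketch below is illustrative only: the field names, the ring depth `BD_RING_SIZE`, the flag `BD_STATUS_READY`, and the functions `bd_ring_post` and `bd_ring_poll` are assumptions, and the code is single-threaded, omitting the memory barriers and cache maintenance that a real core-to-co-processor ring would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BD_RING_SIZE    8u     /* hypothetical ring depth */
#define BD_STATUS_READY 0x1u   /* hypothetical "descriptor is filled" flag */

/* One buffer descriptor: points at a packet and describes it. */
struct bd {
    void    *buf;      /* location of the packet in the data buffer */
    uint16_t len;      /* packet size in bytes */
    uint16_t status;   /* BD_STATUS_READY while waiting for the consumer */
};

/* A BD ring: an array of BDs with producer and consumer indices. */
struct bd_ring {
    struct bd bds[BD_RING_SIZE];
    unsigned  head;    /* next slot the producer fills */
    unsigned  tail;    /* next slot the consumer polls */
};

/* Producer side: publish a packet. Returns false when the ring is full. */
static bool bd_ring_post(struct bd_ring *r, void *buf, uint16_t len)
{
    struct bd *d = &r->bds[r->head];
    if (d->status & BD_STATUS_READY)
        return false;                  /* slot has not been consumed yet */
    d->buf = buf;
    d->len = len;
    d->status = BD_STATUS_READY;
    r->head = (r->head + 1) % BD_RING_SIZE;
    return true;
}

/* Consumer side: poll for one ready BD. Returns false when none is ready. */
static bool bd_ring_poll(struct bd_ring *r, void **buf, uint16_t *len)
{
    struct bd *d = &r->bds[r->tail];
    if (!(d->status & BD_STATUS_READY))
        return false;
    *buf = d->buf;
    *len = d->len;
    d->status = 0;                     /* hand the slot back to the producer */
    r->tail = (r->tail + 1) % BD_RING_SIZE;
    return true;
}
```

In this sketch a transmit ring would have a core as producer and the co-processor as consumer, and a receive ring would reverse the roles.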
Each thread being executed by the processor requires at least one BD ring to access the co-processor. Generally, the number of BD rings of the co-processor is limited, which in turn limits the number of threads that can access the co-processor. Thus, it is desirable to scale the number of BD rings. However, because the polling logic in the co-processor must poll all the BD rings to retrieve the corresponding data packets therefrom, the polling logic grows in size and complexity as BD rings are added. Further, polling a large number of BD rings consumes additional machine cycles of the processor and the co-processor. Thus, scaling the number of BD rings while relying on the polling logic is not an efficient solution.
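The cost of polling can be made concrete with a sketch of a single polling pass. The names `struct ring_state` and `poll_pass` are hypothetical, and each ring is reduced to a count of pending BDs. The point illustrated is only that every ring is inspected on every pass, whether or not it holds a data packet, so the work per pass grows with the number of rings.

```c
#include <stddef.h>

/* Hypothetical minimal ring state: how many BDs are waiting. */
struct ring_state {
    unsigned pending;
};

/* One polling pass: visit every ring and consume at most one BD from each.
 * Returns the number of BDs consumed; *visited reports rings inspected. */
static unsigned poll_pass(struct ring_state *rings, size_t n_rings,
                          size_t *visited)
{
    unsigned consumed = 0;
    *visited = 0;
    for (size_t i = 0; i < n_rings; i++) {
        (*visited)++;               /* this cost is paid even for an empty ring */
        if (rings[i].pending > 0) {
            rings[i].pending--;
            consumed++;
        }
    }
    return consumed;
}
```

With 64 rings of which only one holds a packet, a pass inspects all 64 rings in order to retrieve a single BD.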
Another known technique to overcome the problem of a limited number of BD rings is sharing of a single BD ring by multiple threads. Thread synchronization (also referred to as “process synchronization” or “serialization”), i.e., a mechanism to ensure that the multiple threads being executed by multiple cores are coherent, is achieved using locks. A lock is a thread synchronization mechanism that allows a set of threads to execute concurrently while sharing a BD ring: a thread must acquire the lock before it accesses the shared BD ring, which ensures that the multiple threads do not access the shared BD ring simultaneously. For example, if first and second BD rings are available for communication between the processor and the co-processor, then first and second sets of threads are associated with the first and second BD rings, respectively. Thus, the threads of the first set of threads share the first BD ring and the threads of the second set of threads share the second BD ring. When a thread of the first set accesses the first BD ring, the thread acquires a lock to the first BD ring. As the thread locks the first BD ring, no other thread of the first set is allowed access to the first BD ring until the running thread (the thread that owns the lock) is executed to completion. When the running thread is executed completely, it releases the lock and allows another thread of the first set to access the first BD ring.
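Lock-based sharing of a BD ring can be sketched with a POSIX mutex. This is a minimal illustration under stated assumptions: `struct shared_ring`, `RING_SLOTS`, and the three functions are hypothetical names, and the ring is reduced to a counter of posted BDs so that only the locking pattern is shown. The functions are intended to be called from several threads of the same set.

```c
#include <pthread.h>
#include <stdbool.h>

#define RING_SLOTS 4u   /* hypothetical ring depth */

/* A BD ring shared by a set of threads and guarded by one lock. */
struct shared_ring {
    pthread_mutex_t lock;
    unsigned        used;   /* BDs currently posted */
};

static void shared_ring_init(struct shared_ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->used = 0;
}

/* Post one BD. The caller blocks until it owns the lock, so only one
 * thread of the set touches the ring at a time. */
static bool shared_ring_post(struct shared_ring *r)
{
    bool ok = false;
    pthread_mutex_lock(&r->lock);     /* other threads of the set wait here */
    if (r->used < RING_SLOTS) {
        r->used++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);   /* let another thread access the ring */
    return ok;
}

static void shared_ring_destroy(struct shared_ring *r)
{
    pthread_mutex_destroy(&r->lock);
}
```

Note that the sketch holds the lock only for the duration of one post, which is a narrower critical section than the run-to-completion ownership described above.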
However, in this thread synchronization mechanism, the threads other than the running thread have to wait to access the first BD ring until the lock is released. Further, if the running thread enters an infinite loop, the other threads will have to wait indefinitely, stalling not only the core executing the thread but also the co-processor. The locks also consume additional resources: memory space for the locks, processing bandwidth for initialization and destruction of the locks, and the time required for acquiring and releasing the locks all add overhead (also referred to as “lock overhead”). Moreover, there is a possibility of a thread attempting to acquire a lock that is being held by another thread. Such a condition is referred to as “lock contention”. A deadlock may also be reached when two threads each wait to acquire a lock that is held by the other, so that neither thread can proceed.
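Lock contention can be observed with a non-blocking acquisition attempt. The sketch below uses the POSIX call `pthread_mutex_trylock`, which returns `EBUSY` when the mutex is already locked. The helper name `ring_try_lock` and the contention counter are hypothetical and serve only to illustrate the condition.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Try to take the ring lock without blocking.
 * Returns true when the lock was acquired. On contention (the lock is
 * already held) it increments *contended and returns false. */
static bool ring_try_lock(pthread_mutex_t *lock, unsigned *contended)
{
    int rc = pthread_mutex_trylock(lock);
    if (rc == EBUSY) {
        (*contended)++;      /* another owner holds the lock */
        return false;
    }
    return rc == 0;
}
```

A thread that receives `false` must either retry or block, and in both cases the time spent is part of the lock overhead described above.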
It would be advantageous to have a data processing system that processes multiple threads without the limitation of the number of BD rings, that is free of lock contention, lock overhead, and deadlock, and that overcomes the above-mentioned limitations of conventional data processing systems.