Deep packet inspection (DPI) inspects and filters received data packets to provide security, load balancing, and application optimization. DPI examines the header and data portion of a packet (usually Layer 4 through Layer 7), searching for specific or illegal statements or data to determine whether the data packet should be forwarded and/or which policies should be applied (e.g., allow/deny, load balance, encrypt, etc.). DPI may also be applied to identify flows rather than analyzing traffic packet by packet.
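By way of illustration only, the inspection step described above can be sketched as a first-match rule lookup over a packet's header fields and payload. The names `Packet`, `Rule`, and `inspect`, the rule fields, and the sample patterns are hypothetical and chosen for this sketch; they are not taken from any particular DPI product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_port: int      # Layer 4 header field
    payload: bytes     # data portion (application content)


@dataclass(frozen=True)
class Rule:
    action: str                      # e.g. "deny", "load_balance", "encrypt"
    dst_port: Optional[int] = None   # None means "any port"
    pattern: Optional[bytes] = None  # None means "any payload"

    def matches(self, pkt: Packet) -> bool:
        # A rule matches only if every field it specifies agrees with the packet.
        if self.dst_port is not None and pkt.dst_port != self.dst_port:
            return False
        if self.pattern is not None and self.pattern not in pkt.payload:
            return False
        return True


def inspect(pkt: Packet, rules: List[Rule], default: str = "allow") -> str:
    """Return the action of the first matching rule, or the default policy."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


# Hypothetical rule set: block a suspicious payload, load balance web traffic.
RULES = [
    Rule(action="deny", pattern=b"DROP TABLE"),
    Rule(action="load_balance", dst_port=80),
]
```

With this assumed rule set, a packet to port 80 carrying an ordinary request is load balanced, a packet whose payload contains the blocked pattern is denied regardless of port, and all other traffic falls through to the default policy.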
Prior art DPI switches included multiple data processing cores with attached local memory in a distributed environment with a shared backplane and used a load-balancing algorithm to distribute incoming traffic flows. Load balancing was performed in software by the data processing core(s). This consumed significant and valuable processing power, added latency, and increased connection bandwidth usage because of an added hop between processing cores. In addition, this architecture was not readily scalable.
One possible solution to provide scalability is to have a global flow manager which assigns every flow to one data plane CPU, based on some criteria such as the current load on the data plane CPUs. When a data plane CPU receives a packet which does not have an associated session, the packet is directed to the global flow manager. The global flow manager becomes the central clearinghouse for managing flows and performs load balancing and offloading of sessions on demand. The problem with this architecture is that the global flow manager may become a bottleneck, and multiple packet exchanges between the global flow manager and the data plane CPUs increase backplane traffic.
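The global flow manager arrangement described above can be illustrated with a minimal sketch. It assumes a least-loaded assignment criterion and counts each lookup at the manager as one backplane exchange. The class names `GlobalFlowManager` and `DataPlaneCpu`, the 5-tuple flow key, and the request counter are hypothetical and serve only to make the bottleneck visible.

```python
from typing import Dict, Set, Tuple

# Hypothetical flow key: (src ip, src port, dst ip, dst port, protocol).
FlowKey = Tuple[str, int, str, int, str]


class GlobalFlowManager:
    """Central clearinghouse that assigns each flow to one data plane CPU."""

    def __init__(self, num_cpus: int) -> None:
        self.load = [0] * num_cpus          # flows currently assigned per CPU
        self.owner: Dict[FlowKey, int] = {}
        self.requests = 0                   # exchanges that crossed the backplane

    def assign(self, key: FlowKey) -> int:
        self.requests += 1
        if key not in self.owner:
            # Assumed criterion: pick the least-loaded CPU (lowest id on ties).
            cpu = min(range(len(self.load)), key=self.load.__getitem__)
            self.owner[key] = cpu
            self.load[cpu] += 1
        return self.owner[key]


class DataPlaneCpu:
    def __init__(self, cpu_id: int, manager: GlobalFlowManager) -> None:
        self.cpu_id = cpu_id
        self.manager = manager
        self.sessions: Set[FlowKey] = set()

    def receive(self, key: FlowKey) -> int:
        """Return the id of the CPU that should process this packet."""
        if key in self.sessions:
            return self.cpu_id              # session known: handled locally
        owner = self.manager.assign(key)    # no session: ask the manager
        if owner == self.cpu_id:
            self.sessions.add(key)
        return owner
```

In this sketch every packet without a local session costs one exchange with the manager, including later packets of a flow that is owned by a different CPU, so `requests` grows with the traffic volume rather than only with the number of flows.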
Another possible solution is to process the packets at the ingress module without load balancing. Though this reduces extra backplane hops, it causes uneven loading that depends on traffic patterns, wasting CPU and memory resources in some modules while other modules are heavily burdened. This approach is also impractical when route changes occur or multiple routes exist to the same destination, because the forward and reverse flows may then be processed in different data plane CPUs, which is incorrect or undesirable for many applications.
Accordingly, there is needed a method and architecture for a multi-application switch that provides scalability, load balancing, a reduction in CPU processing, and optimization of connection bandwidth overhead in the processing of data packets.