Situations often arise in which functions must be performed on a continuous stream of data. If the functions are implemented in software on a processor, then each datagram (packet of data) which arrives in sequence from the stream must be stored, processed and then forwarded. This process takes a finite time to execute. As the rate of packet arrival increases, there comes a point at which a single processor can no longer keep up. The function must then be distributed across multiple processors arranged either in a pipeline or in parallel, with each parallel processor receiving a packet from the stream in turn in a round robin sequence. Because packets may complete out of order, packets output from parallel processors are typically reordered before forwarding.
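By way of illustration only, the round robin distribution and subsequent reordering described above may be sketched in software as follows. The sketch assumes that each packet is tagged with a sequence number on arrival and that results are held back until all earlier packets have been released. The names process, round_robin_dispatch, reorder and run are hypothetical and do not appear elsewhere in this specification, and out-of-order completion is merely simulated by draining the worker queues in reverse.

```python
from collections import deque


def process(packet):
    """Stand-in for the per-packet function (hypothetical)."""
    return packet.upper()


def round_robin_dispatch(packets, n_workers):
    """Tag each packet with a sequence number and deal it to a worker queue in turn."""
    queues = [deque() for _ in range(n_workers)]
    for seq, packet in enumerate(packets):
        queues[seq % n_workers].append((seq, packet))
    return queues


def reorder(completions):
    """Release results in sequence order, holding back any that arrive early."""
    pending = {}
    next_seq = 0
    for seq, result in completions:
        pending[seq] = result
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1


def run(packets, n_workers):
    queues = round_robin_dispatch(packets, n_workers)
    # Simulate out-of-order completion: drain the workers in reverse order.
    completions = []
    for queue in reversed(queues):
        completions.extend((seq, process(p)) for seq, p in queue)
    return list(reorder(completions))
```

For example, run(["a", "b", "c", "d", "e"], 3) distributes five packets over three workers and returns the processed packets in their original arrival order, even though the simulated workers complete them out of order.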
This is a well proven approach to high performance packet processing, but its scalability is limited as the number of processors increases. Access to shared memories, whether for code or data, eventually becomes a bottleneck. Simultaneous read/write access to shared state further adds to the complexity of the system control signalling needed to resolve contention.
This leaves the issue of high speed access to multiple items of shared state information by multiple parallel processors. As the number of processors and the complexity of their algorithms increase, the address and data bandwidth required over the system bus to the shared data also increases, and this can become a bottleneck. The State Element technology described later in this specification supports parallel processing systems by localising and managing serialisation of access to shared state.
A good case in point is the challenge of Traffic Management in network routers. A significant, recognised issue in per-flow Traffic Handling is that a number of items of state need to be maintained for each of a large number of queues. The implications of this are that: (a) a considerable volume of shared memory needs to be implemented; (b) a lot of memory address bandwidth is required if each queue requires separate accesses to be made to different (shared) state variables; and (c) the memory access latency is likely to be long, so that the blocking of state during modification impacts on performance.
Contention for shared state variables can be resolved by implementing state elements as described later. However, the state element concept is not in itself a complete solution for high performance systems. For maximum throughput and flexibility, a number of state elements are combined in a state engine. This allows multiple concurrent accesses to the shared state. The present invention aims to overcome the following problems:
1. Processors in parallel can create a high rate of access to the same item of state.
2. What happens if a given function needs to access multiple variables from the same address, i.e. needs to access and process a state record?
3. What if multiple functions executing in a processor on a single datagram each require access to different, independently addressable tables of state variables or records?
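The three problems listed above may be illustrated, purely as a software analogy, by the following sketch. It is not the state element or state engine of the invention, which are described later. It assumes that each independently addressable table is guarded by its own lock, so that a read-modify-write on a whole record is serialised while different tables remain concurrently accessible. The class names StateElement and StateEngine, the table names queue_stats and policer, the record fields and the address 7 are all hypothetical.

```python
import threading


class StateElement:
    """One independently addressable table of state records (hypothetical model).

    A single lock serialises every read-modify-write on this table.
    """

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()

    def update(self, address, modify):
        """Atomically apply modify(record) to the record at address."""
        with self._lock:
            record = dict(self._records.get(address, {}))
            new_record = modify(record)
            self._records[address] = new_record
            return dict(new_record)

    def read(self, address):
        with self._lock:
            return dict(self._records.get(address, {}))


class StateEngine:
    """A group of state elements, one per table, each with its own lock."""

    def __init__(self, table_names):
        self.elements = {name: StateElement() for name in table_names}

    def update(self, table, address, modify):
        return self.elements[table].update(address, modify)


def count_packet(record, length):
    """Update two fields of one record in a single access (problem 2)."""
    record["packets"] = record.get("packets", 0) + 1
    record["bytes"] = record.get("bytes", 0) + length
    return record


def add_token(record):
    record["tokens"] = record.get("tokens", 0) + 1
    return record


def worker(engine, n_packets):
    for _ in range(n_packets):
        # Every worker updates the same record (problem 1) ...
        engine.update("queue_stats", 7, lambda r: count_packet(r, 64))
        # ... and a second, independently addressable table (problem 3).
        engine.update("policer", 7, add_token)


def demo(n_workers=4, n_packets=1000):
    engine = StateEngine(["queue_stats", "policer"])
    threads = [
        threading.Thread(target=worker, args=(engine, n_packets))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return engine.elements["queue_stats"].read(7), engine.elements["policer"].read(7)
```

Calling demo() runs four workers against the same record. Because each read-modify-write is serialised within its table, no update is lost, and the final packet count equals the number of workers multiplied by the number of packets per worker. A software lock of this kind does not by itself address the bandwidth and latency concerns raised above; it only shows the serialisation requirement.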
In short, the fundamental problem being addressed is that of a high rate of state access. This problem must be solved in a flexible way which enables the easy scaling of both the quantity of state being stored and the rate of state access.