The Internet is, at the time of filing the present patent application, a well-known, publicly accessible communication network, and arguably the most robust information and communication source ever made available. The Internet is used in the present application as a prime example of a data-packet network that will benefit from the apparatus and methods taught herein, but it is just one such network, following a particular standardized protocol. As is also very well known, the Internet (and related networks) is always a work in progress. That is, many researchers and developers are competing at all times to provide new and better apparatus and methods, including software, for enhancing the operation of such networks.
In general the most sought-after improvements in data packet networks are those that provide higher speed in routing (more packets per unit time) and better reliability and fidelity in messaging. What are generally needed are router apparatus and methods increasing the rates at which packets may be processed in a router.
As is well known in the art, packet routers are computerized machines wherein data packets are received at any one or more of typically multiple ports, processed in some fashion, and sent out at the same or other ports of the router to continue on to downstream destinations. As an example of such computerized operations, keeping in mind that the Internet is a vast interconnected network of individual routers, each router has to keep track of the external routers to which it is connected by communication ports, and of which of the alternate routes through the network are the best routes for incoming packets. Individual routers must also accomplish flow accounting, with a flow generally meaning a stream of packets with a common source and end destination. A general desire is that all packets of an individual flow follow a common path. The skilled artisan will be aware of many such requirements for computerized processing.
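By way of illustration only, the flow accounting described above may be sketched as follows. The sketch is not taken from any cited application; the class name FlowTable, the use of a CRC-32 hash, and the port labels are assumptions made solely for the example. It shows one simple way in which packets sharing a source and end destination may be counted as one flow and consistently sent out at the same port.

```python
import zlib
from collections import Counter


class FlowTable:
    """Hypothetical per-router flow accounting (illustrative assumption)."""

    def __init__(self, ports):
        self.ports = list(ports)        # output ports of this router
        self.packet_counts = Counter()  # packets seen per flow

    def route(self, source, destination):
        """Count the packet against its flow and return the flow's output port."""
        key = f"{source}->{destination}"
        self.packet_counts[key] += 1
        # A deterministic hash of the flow key keeps every packet of a
        # flow on the same port, i.e. on a common path.
        index = zlib.crc32(key.encode("utf-8")) % len(self.ports)
        return self.ports[index]
```

Because the port is derived only from the flow key, two packets of the same flow always receive the same port, while distinct flows may be spread over the available ports.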
Typically a router in the Internet network will have one or more Central Processing Units (CPUs) as dedicated microprocessors for accomplishing the many computing tasks required. In the current art at the time of the present application, these are single-streaming processors; that is, each processor is capable of processing a single stream of instructions. In some cases developers are applying multiprocessor technology to such routing operations. The present inventors have been involved for some time in development of dynamic multistreaming (DMS) processors, which processors are capable of simultaneously processing multiple instruction streams. One preferred application for such processors is in the processing of packets in packet networks like the Internet.
In the provisional patent application listed in the Cross-Reference to Related Documents above there are descriptions and drawings for a preferred architecture for DMS application to packet processing. One of the functional areas in that architecture is a generic queue and related methods and circuitry, comprising a queuing system.
A processing core of a multi-streaming processor has functional hardware units provided therein for computation. Examples include multipliers, dividers, adders (also capable of subtraction), and other more specialized units dealing with higher-level computation. It is desired that resources allocated to the processing of data packets be utilized such that the functional units of those resources are not, singly or in combination, over- or under-utilized. That is, the pressure the units are under in terms of requests for service from the processing unit, termed herein the Streaming Processing Unit (SPU), should optimally be balanced across all of the resources.
Referring now to cross-referenced application Ser. No. 09/737,375, there are disclosed, under the heading Context States, hardware units responsible for packet management, context selection, and packet processing. These units are the PMU (packet management unit), the SPU (streaming processor unit), and the RTU (register transfer unit). The RTU is considered part of the PMU, and the SPU core actually processes data packets utilizing multi-streaming technology. A context, as was described, can be in one of two states: PMU-owned or SPU-owned. If a context is PMU-owned, no stream is running on it (stalled or not). It is then a candidate for the PMU to preload information of a packet for processing. If it is SPU-owned, a stream is actively processing packet information. In the case at hand, the prime examples of the invention pertain to a Dynamic Multi-Streaming (DMS) Processor having eight streams. Typically in this processor one or more contexts are SPU-owned while the rest are PMU-owned, the optimal case being the one in which all the functional units of the SPU are maximally utilized.
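The two context states described above may be modeled, for purposes of illustration only, as in the following sketch. The names Owner and ContextFile and the method names are hypothetical, and the count of eight contexts is an assumption chosen to match the eight-stream example; the sketch does not reproduce the hardware of Ser. No. 09/737,375.

```python
from enum import Enum


class Owner(Enum):
    PMU = "PMU-owned"  # no stream running; candidate for packet preload
    SPU = "SPU-owned"  # a stream is processing packet information


NUM_CONTEXTS = 8  # assumption: one context per stream in the example processor


class ContextFile:
    """Hypothetical record of which unit owns each context."""

    def __init__(self, num_contexts=NUM_CONTEXTS):
        self.owner = [Owner.PMU] * num_contexts

    def preload_candidates(self):
        """Contexts into which the PMU may preload packet information."""
        return [i for i, o in enumerate(self.owner) if o is Owner.PMU]

    def hand_to_spu(self, ctx):
        """Transfer a preloaded context to the SPU for processing."""
        if self.owner[ctx] is not Owner.PMU:
            raise ValueError(f"context {ctx} is not PMU-owned")
        self.owner[ctx] = Owner.SPU

    def release_to_pmu(self, ctx):
        """Return a context to the PMU once its stream has finished."""
        self.owner[ctx] = Owner.PMU
```

In this model, only PMU-owned contexts appear as preload candidates, and a context leaves that set as soon as it is handed to the SPU.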
One of the challenges to processing data packets at high speeds is to be able to implement functional resources within a processing core using less real estate (silicon/circuitry) than is typically used. Another challenge, at least in multi-streaming processors, is how to optimize (speed up) parallel processing of multiple data packets from separate packet flows while sharing resources in a processing core.
In a Dynamic Multi-Streaming (DMS) processor known to the inventor, available functional resources on the processor core are organized into clusters, each cluster having 4 functional units and 4 contexts, and being capable of supporting 4 simultaneous instruction threads from the SPU. In processing, the PMU pre-loads packet information via a Register Transfer Unit (RTU) into one context in one of the clusters, so that the SPU may process the information. It is important to note herein that the exact number of contexts, clusters, and functional units depends on design and hardware considerations and is not, by any means, fixed.
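The cluster organization described above may be pictured, as a rough sketch only, by the following parameterized layout. The name ClusterLayout, the linear numbering of contexts, and the default of two clusters are assumptions made for illustration; only the figure of four contexts per cluster comes from the description, and, as noted, none of these numbers is fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of how contexts are grouped into clusters."""

    num_clusters: int = 2          # assumption for the example only
    contexts_per_cluster: int = 4  # per the description; design dependent

    @property
    def total_contexts(self):
        return self.num_clusters * self.contexts_per_cluster

    def locate(self, context_id):
        """Map a linear context number to (cluster index, slot in cluster)."""
        if not 0 <= context_id < self.total_contexts:
            raise ValueError(f"no such context: {context_id}")
        return divmod(context_id, self.contexts_per_cluster)
```

Keeping the counts as parameters reflects the point that the numbers of clusters and contexts depend on design and hardware considerations.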
In general, the disclosure under the headings Pre-loading a Context and Selecting a PMU-Owned Context within priority document Ser. No. 09/737,375 describes the processes of pre-loading contexts at boot, selecting available contexts during processing, and pre-loading packet information before SPU processing. Referring now to a disclosed table inserted under the heading Selecting a PMU-Owned Context, context selection is performed by the RTU according to an algorithm supported by a truth table. In a case where more than one context is available for selection, a priority scheme is used to make the appropriate selection. The method disclosed, in conjunction with the known limitations regarding use of clusters and contexts, enables a somewhat better utilization of the functional resources provided within the clusters.
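The general idea of selecting among several available contexts by priority may be sketched as follows. The truth table of Ser. No. 09/737,375 is not reproduced here; the function name select_context and the fixed priority ordering passed to it are assumptions standing in for that table, for illustration only.

```python
def select_context(pmu_owned, priority_order):
    """Return the highest-priority PMU-owned context, or None if none is free.

    pmu_owned      -- list of booleans, True where the context is PMU-owned
    priority_order -- context numbers, most preferred first (hypothetical;
                      stands in for the truth table of the cited application)
    """
    for ctx in priority_order:
        if pmu_owned[ctx]:
            return ctx
    return None
```

For example, with contexts 1 and 2 available, an ordering of 0, 1, 2, 3 selects context 1, whereas the reverse ordering selects context 2.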
It has occurred to the inventor that further improvements to the context selection method discussed in Ser. No. 09/737,375 are required in order to further optimize the use of functional resources within the processing core. The present specification addresses such improvements.