1. Field of the Invention
The present invention relates to the field of computer system architecture. More specifically, the present invention relates to a multitasking computer system architecture supporting multiple independent, specialized, loosely coupled processors. The architecture provides a novel approach to scheduling processes for execution on one of the multiple processors, migrating processes between the processors, rescheduling processes upon a cache miss, and distributing memory along the pipeline stages of the processors. The computer system architecture is particularly optimized for data packet switching operations as may be performed by an International Organization for Standardization (ISO) Open Systems Interconnection (OSI) Layer 2 (i.e., media access control (MAC) sublayer) based network switching device, i.e., a switching hub, in a data communications network. The architecture is further applicable to routing operations as may be performed by an ISO Layer 3 (i.e., network layer) based network device.
2. Description of the Related Art
Introduction
In the prior art, a switching hub typically is designed around a computer system having a single processor. The computer system is controlled by software optimized to receive and transmit data packets between local or wide area network segments in a data communications network. Although optimized, the prior art switching hub generally comprises the same components as a general purpose computer system, including a central programmable processor, an internal control bus, a data bus, and shared memory controlled by the central programmable processor. Additionally, the prior art switching hub has a plurality of media access controllers (MACs), each having an associated port coupled to one of the local or wide area network segments.
A prior art switching hub may be further optimized, for example, by introducing memory subsystems and input/output (I/O) devices particularly suited for processing data packets. However, a general purpose computer system is, by definition, not designed with a particular application, such as data packet switching, as its primary application. As a result, a switching hub based on a general purpose computer system generally does not fully utilize the capabilities of the computer system, and the maximum data packet processing throughput of the switching hub is limited by the general purpose computer system architecture. In general, for a computer system to carry out a particular application as quickly, inexpensively, and efficiently as reasonably possible, what is needed is a computer system architecture designed to optimize the operations that the application requires. In particular, what is needed is an improved computer system architecture designed to support the extremely high data packet processing rates required of a high performance switching hub.
Overview of Switching Hub Functions vs. General Purpose Computer System Functions
The following brief overview contrasts some of the needs and functions of a switching hub with the functions generally performed by a general purpose computer system. The overview further identifies the need for an improved computer system for use in a switching hub.
Latency and Throughput
A switching hub primarily performs data packet processing: it "switches" data packets from one network segment to another. That is, the switching hub receives a data packet on a port coupled to one network segment, processes the data packet internally, and transmits it out a port coupled to a different network segment. Data packet processing is very I/O intensive relative to the processing performed by a general purpose computer system, and a switching hub may process data packets at very high rates. At these rates, there is no long or medium term temporal locality of data, because all the data (in the form of data packets) enter the switching hub and leave it shortly thereafter. Furthermore, data packets received by the hub are generally independent of each other. Thus, traditional parallel processing techniques can be applied more readily to data packet processing within the hub than to the processing performed by a general purpose computer system.
The volume of data packets switched by a switching hub is a very important factor when considering the performance of the hub. However, the time required to process a particular data packet is not as critical. In other words, latency, i.e., the delay in switching a data packet, is a less important consideration than overall data packet throughput. This factor, combined with the fact that data packets are generally independent of each other, means that it does not much matter which task the switching hub is performing, so long as at least some task is being performed at any given time.
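The throughput-versus-latency point can be made concrete with Little's law, a standard queueing result (not taken from this disclosure) relating the average number of packets being processed concurrently, L, the packet throughput, λ, and the per-packet latency, W. The figures below are hypothetical and serve only as an illustration:

```latex
L = \lambda W \quad\Longrightarrow\quad \lambda = \frac{L}{W}

L = 8,\; W = 800\,\text{ns}: \qquad \lambda = \frac{8}{800 \times 10^{-9}\,\text{s}} = 10^{7}\ \text{packets/s}

L = 16,\; W = 1600\,\text{ns}: \qquad \lambda = \frac{16}{1600 \times 10^{-9}\,\text{s}} = 10^{7}\ \text{packets/s}
```

Doubling the per-packet latency leaves throughput unchanged, provided that twice as many packets are in process at once, which is feasible precisely because the packets are independent of one another.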
By contrast, a primary goal in the design of a general purpose computer system is to reduce instruction latency. To this end, a general purpose computer system uses well known pipelining techniques to reduce the clock cycle time and thereby improve throughput. In a general purpose instruction stream, however, an instruction often requires the result of the immediately preceding instruction. As a result, such systems typically incorporate a bypassing, or feedforward, technique, by which an instruction at a given stage in the pipeline receives its arguments sooner than it otherwise would. However, introducing these techniques adds stages to the pipeline. While adding stages allows the clock cycle time of the system to be decreased, the bypass logic requires the clock cycle time to be increased to allow time for the bypass logic to operate. What is needed is a computer architecture in which each instruction in a pipeline is executed for a different, independent process, so that no instruction depends on the one or more instructions preceding it in the pipeline. Such an architecture obviates the need for bypass logic and allows simpler and deeper (i.e., longer) pipelines.
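Filling a pipeline with instructions from independent processes is commonly known as interleaved (or barrel) multithreading. The Python sketch below is illustrative only and does not model any particular processor: it assumes a hypothetical round-robin issue policy in which process `cycle mod N` issues one instruction per cycle, and it checks whether any two in-flight instructions belong to the same process. The function names are hypothetical.

```python
from collections import deque
from typing import List, Optional


def simulate_interleaved_pipeline(num_processes: int, depth: int,
                                  cycles: int) -> List[List[Optional[int]]]:
    """Return one snapshot per cycle listing, from the first stage to the
    last, the id of the process whose instruction occupies each stage."""
    # Index 0 is the first pipeline stage; None marks an empty stage.
    pipeline: deque = deque([None] * depth, maxlen=depth)
    snapshots = []
    for cycle in range(cycles):
        # Hypothetical round-robin issue: process (cycle mod N) issues next.
        # Pushing at the front advances every instruction by one stage; the
        # instruction in the last stage drops off the end (it completes).
        pipeline.appendleft(cycle % num_processes)
        snapshots.append(list(pipeline))
    return snapshots


def needs_bypass(snapshot: List[Optional[int]]) -> bool:
    """True if two in-flight instructions belong to the same process, so a
    younger one might need a result the older one has not yet written back."""
    in_flight = [p for p in snapshot if p is not None]
    return len(in_flight) != len(set(in_flight))
```

With 8 processes and an 8-stage pipeline, `needs_bypass` is false for every snapshot, because the instructions in flight always come from 8 different processes. With only 4 processes, two instructions from the same process eventually share the pipeline, and some form of bypassing or stalling would be required. Under this assumed policy, the condition for dropping the bypass logic is simply that the number of independent processes be at least the pipeline depth.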
Temporal Locality
Most general purpose computer systems assume that data exhibit several properties, including temporal and spatial locality. Temporal locality refers to the notion that once a data item is accessed, it will generally be accessed again relatively soon. Thus, most general purpose computer systems have general registers, which provide an extremely fast (and small) cache for recently accessed, i.e., important, data. Indeed, most general purpose computer systems require explicit instructions to load data into, and store data from, these general registers. While there is overhead associated with performing a load or store instruction, the overhead is minimal in most computing environments. In a switching hub environment, in which data packet processing is the primary function, however, load and store instructions can in many cases make up a large percentage of the overall instruction stream. What is needed, then, is a means of eliminating the need to load and store data in general registers.
Temporal locality strongly influences the design of cache memory in most general purpose computer systems. Increasing the size of the cache effectively lengthens the time scale over which temporal locality can be exploited. Larger caches allow a general purpose computer system to retain recently used data in the relatively faster cache for a longer period of time, thereby improving the average speed of retrieving and processing the data. In a switching hub environment, however, there is generally no long term temporal locality for a data packet, because once the data packet has been processed, it is no longer of interest. Thus, relatively large caches generally do not increase the performance of switching hubs. What is needed is an improved architecture for use in a switching hub in which only a small cache is provided.
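The absence of long term temporal locality can be illustrated with a small direct-mapped cache simulation. This is a sketch under hypothetical assumptions, not a model of any real hub: each packet occupies its own buffer at a new address (`buffer_stride` bytes apart), each header byte is read `passes` times while the packet is processed, and no buffer is revisited afterwards. The function name and parameters are illustrative.

```python
def count_hits(cache_lines: int, line_size: int, num_packets: int,
               header_bytes: int = 64, passes: int = 2,
               buffer_stride: int = 2048) -> int:
    """Count cache hits for a direct-mapped cache serving a stream of
    packet headers, under the hypothetical access pattern described above."""
    tags = [None] * cache_lines  # block number currently held by each line
    hits = 0
    for i in range(num_packets):
        base = i * buffer_stride  # every packet lives at a fresh address
        for _ in range(passes):
            for offset in range(header_bytes):
                block = (base + offset) // line_size
                index = block % cache_lines  # direct-mapped placement
                if tags[index] == block:
                    hits += 1
                else:
                    tags[index] = block  # miss: fill the line
    return hits
```

With 32-byte lines and 100 packets, `count_hits(4, 32, 100)` and `count_hits(1024, 32, 100)` both return 12600, i.e., 126 hits out of 128 accesses per packet. All reuse occurs while a packet is being processed, so a 128-byte cache performs exactly as well as a 32 KiB cache under these assumptions.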
Spatial Locality
As applied to general purpose computer systems, spatial locality refers to the notion that if a data item in memory is accessed, other data stored nearby in memory are also likely to be accessed. Based on this concept, computer system architectures have been designed with large cache lines and provide "move multiple" instructions. In prior art computer system architectures, however, cache line size has been limited, because beyond a certain point larger cache lines not only increase hit rates (i.e., the rate at which data is found in the cache) but also increase miss penalties (i.e., the costs associated with not finding the data in the cache). In a switching hub environment, however, there generally is an even stronger relationship between nearby data items. Thus, what is needed is a computer system architecture for use in a switching hub in which larger cache lines are utilized than would be reasonable in most general purpose computer system architectures.
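The line size trade-off can be sketched with a deliberately simple, hypothetical memory model in which every miss costs a fixed setup latency plus the time to transfer the whole line. The 20-cycle setup and the 8-bytes-per-cycle transfer rate are assumptions chosen for illustration, not figures from this disclosure, and the accessed bytes are assumed to start on a line boundary.

```python
def fill_cycles(bytes_used: int, line_size: int,
                setup_cycles: int = 20, bytes_per_cycle: int = 8) -> int:
    """Cycles spent servicing misses to read `bytes_used` contiguous,
    line-aligned bytes under the hypothetical model described above."""
    misses = -(-bytes_used // line_size)  # ceiling division: lines touched
    # Each miss pays the setup latency plus a transfer of the whole line,
    # even if only part of the line is actually used.
    return misses * (setup_cycles + line_size // bytes_per_cycle)
```

For a sparse, general purpose style access that uses only 8 bytes, `fill_cycles(8, n)` is 22, 28, and 52 cycles for 16-, 64-, and 256-byte lines, so larger lines only add miss penalty. For a 256-byte packet whose bytes are all used, `fill_cycles(256, n)` is 352, 112, and 52 cycles, so the largest line is cheapest. The stronger the correlation between nearby data items, the larger the line size that pays off.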
Program Size
The majority of general purpose computer system application programs require extremely large text spaces. In other words, the current application programs provided for prior art general purpose computers are large and generally getting larger with each new release or revision. Indeed, some commercially available programs comprise millions of lines of "code" or instructions. Thus, the computer architect must provide an extremely large memory address space in which to store the instructions to be executed.
It is well known that instructions, like data, exhibit temporal and spatial locality. General purpose computer systems therefore generally use an instruction cache rather than providing enough very fast memory to hold the entire program. A data packet processing application program, on the other hand, generally comprises a relatively small number of instructions, typically in the thousands. Further, an instruction cache is generally of no use in a data packet processing environment unless the cache is large enough to hold the entire program, because each data packet received by a switching hub is typically processed differently from the previously received data packet, especially in a multiprotocol, heterogeneous data communications networking environment. Thus, what is needed is a computer system architecture optimized for a data packet processing application program such as may be utilized by a switching hub, in which the entire data packet processing application program can reside within a single memory device, such as a static random access memory (SRAM) device, that is fast enough to supply the instructions of the program as they are fetched.
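As a rough sizing example, assume (hypothetically) a data packet processing program of 8,192 instructions, each 32 bits (4 bytes) wide. The whole program image then occupies:

```latex
8192 \times 4\ \text{bytes} = 32768\ \text{bytes} = 32\ \text{KiB}
```

A program image of this order is small enough to be held entirely in a single fast SRAM device. A general purpose program several orders of magnitude larger would instead need megabytes of memory and, in practice, an instruction cache.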
Performing Lookup Operations
In a switching hub, one of the most common operations is to look up data in a table. For example, in mapping a MAC address to a source or destination port number, a switching hub uses the MAC address as a key to index into a table of port numbers. Depending on the size of the network and the number of end stations coupled to the network in which the switching hub is installed, this table could be very large, conceivably containing hundreds of thousands of entries or records. Because of the frequency of the operation, and the possible number and size of tables in the switching hub, looking up, i.e., searching for, an entry in a table is a critically important function. Thus, what is needed is a method of searching for an entry in a table that minimizes even the smallest delay in performing the search, since any such delay is compounded each time the search operation is repeated.
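Table lookups of this kind are often implemented with hashing. The Python sketch below is not the search method of this disclosure. It is a generic, hypothetical MAC-to-port table (the class and method names are illustrative) that uses open addressing with linear probing and multiplicative (Fibonacci) hashing. Provided the table is kept well below full, a lookup typically finishes after a small number of probes, however many entries the table holds.

```python
from typing import List, Optional


class MacTable:
    """Hypothetical MAC-address-to-port table: open addressing, linear probing."""

    _MULTIPLIER = 0x9E3779B97F4A7C15  # about 2**64 / golden ratio (Fibonacci hashing)
    _MASK64 = (1 << 64) - 1

    def __init__(self, bits: int = 16) -> None:
        self._bits = bits
        self._capacity = 1 << bits  # power of two, so probing can wrap with a mask
        self._keys: List[Optional[int]] = [None] * self._capacity
        self._ports: List[int] = [0] * self._capacity

    @staticmethod
    def mac_to_int(mac: str) -> int:
        """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer key."""
        return int(mac.replace(":", ""), 16)

    def _home_slot(self, key: int) -> int:
        # Keep the top `bits` bits of the 64-bit product so that every
        # address bit can influence the slot.
        return ((key * self._MULTIPLIER) & self._MASK64) >> (64 - self._bits)

    def learn(self, mac: str, port: int) -> None:
        """Insert the MAC address, or update its port if already present."""
        key = self.mac_to_int(mac)
        slot = self._home_slot(key)
        for _ in range(self._capacity):
            if self._keys[slot] is None or self._keys[slot] == key:
                self._keys[slot] = key
                self._ports[slot] = port
                return
            slot = (slot + 1) & (self._capacity - 1)  # linear probing
        raise RuntimeError("MAC table is full")

    def lookup(self, mac: str) -> Optional[int]:
        """Return the learned port for `mac`, or None if it has not been learned."""
        key = self.mac_to_int(mac)
        slot = self._home_slot(key)
        for _ in range(self._capacity):
            stored = self._keys[slot]
            if stored is None:
                return None  # reached an empty slot: the key is absent
            if stored == key:
                return self._ports[slot]
            slot = (slot + 1) & (self._capacity - 1)
        return None
```

For example, after `t = MacTable(); t.learn("00:1a:2b:3c:4d:5e", 3)`, the call `t.lookup("00:1a:2b:3c:4d:5e")` returns 3, and an address that has not been learned returns `None`. The sketch omits entry deletion and aging. In a real hub these would be needed as end stations move or disappear.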